[go] Putting method/program names into the with field for ISS

Michelle Gwinn Giglio mlgwinn at tigr.ORG
Thu Sep 20 14:58:37 PDT 2007



Some thoughts....

I think it is important to be able to distinguish methods that are based purely on sequence analysis
from everything else, and I think ISS should be the code for them. Most methods that make a
prediction based on sequence, even if they include secondary structure prediction tools, have
somewhere made use of a set of "training" or "real" examples of the thing they are trying to predict
in order to build their models.  So I think elements of "similarity" are inseparable from the
creation of most of these modeling methods.  The only way to build a model is to take examples of
the thing you are trying to find in new sequences and figure out a way to model it.  How is that not
similarity?

If we try to start deciding which sequence-based methods are purely "similarity" methods, then I think
the only ones you will have in that category are Blast-type alignments.  I would argue that since
the full name of ISS is "inferred from sequence or structural similarity", things like TMHMM,
which predict structural similarity, should fall into that category.

I think all of these sequence-based tools are quite distinct from what should be captured with the 
ICA/RCA code.  The important concept that needs to be captured for ICA/RCA is that there is a 
combination of very different methods being used to reach a conclusion.

Remember, Blast is a "computational analysis" too: one can change the Blast parameters and get very
different results. We don't want to lump too many things under the RCA/ICA umbrella; it just
doesn't make sense.

I think we need at least 2 categories:
-one for all things sequence-based (ISS or whatever new name might be created)
-one for combinatorial analyses that bring together different types of information to reach a 
conclusion (ICA/RCA)

If people feel there should be a code for alignments and only alignments, then we will need to split
the sequence-based category into 2, which would then give us 3 total:
-orthology-based evidence
-all other sequence-based evidence
-combinatorial analyses that bring together different types of information to reach a conclusion 
(ICA/RCA)

I favor the first option (2 categories, not 3) as I think it is cleaner and easier for people to
understand.  If we feel the need to change the name of ISS to reflect this more encompassing
definition, then OK, but that opens a whole other can of worms (what about legacy data,
will the community have a cow, etc.).
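
To make the 2-category option concrete, here is a rough sketch of the decision rule as I read it. It is only an illustration, not an agreed GO guideline: the names Basis and two_category_code are made up for this example, and treating an analysis that uses no sequence data at all as ICA/RCA is my assumption.

```python
from enum import Enum


class Basis(Enum):
    """Kind of information a method relies on (hypothetical labels)."""
    SEQUENCE = "sequence"          # alignments, HMMs, TMHMM-style predictions
    NON_SEQUENCE = "non-sequence"  # any other kind of data


def two_category_code(bases):
    """Suggest an evidence code under the 2-category proposal.

    Everything based purely on sequence analysis is ISS, even when
    several sequence-based methods are combined. Anything that brings
    in other types of information is ICA/RCA (assumption: this also
    covers analyses with no sequence component).
    """
    kinds = set(bases)
    if not kinds:
        raise ValueError("at least one basis is required")
    if kinds == {Basis.SEQUENCE}:
        return "ISS"
    return "ICA/RCA"


if __name__ == "__main__":
    # Two sequence-based methods combined: still ISS.
    print(two_category_code([Basis.SEQUENCE, Basis.SEQUENCE]))
    # Sequence plus another type of information: ICA/RCA.
    print(two_category_code([Basis.SEQUENCE, Basis.NON_SEQUENCE]))
```

The 3-category option would only add one more branch inside the pure-sequence case, which is part of why I think the 2-category version is easier to explain.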

And I completely agree with Karen on the issues concerning the "with" column.

Michelle






Jim Hu wrote:
> On Sep 20, 2007, at 11:16 AM, Karen Christie wrote:
> 
>> <snip>
>>
>>> It seems that RCA has not been considered, because most of the 
>>> function predictions using RCA so far have some experimental 
>>> component (In fact the RCA code says non-sequence-based computational 
>>> method).
>>
>>
>> I think we did consider RCA for tRNAscan and snoRNAs (at the 2006 
>> Annot Camp), but then rejected it at the Jan 2007 GO meeting in 
>> response to Michelle Gwinn's argument that everything based purely on 
>> analysis of the sequence of the gene product should be ISS, even if 
>> multiple types of sequence analysis were combined.
>>
>> While it is true that the original RCA documentation did say 
>> non-sequence based method, at the St. Croix meeting, Sue Rhee brought 
>> up the point about analyses that combined non-sequence and sequence 
>> based data and we agreed that these could be RCA. Thus in the draft 
>> document, I've made a new section called ICA with proposed guidelines 
>> that I think are more in line with the idea that these analyses 
>> involve multiple data sets or even multiple kinds of data sets.
>>
>> http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml
>>
> 
> I know that I probably missed this being brought up many times in the 
> past, but as a newbie, when I see ISS, I key in on Sequence Similarity 
> not just Sequence.   I think that biologists seeing this read it as a 
> way of saying inferred from evolutionary relationship which is itself 
> inferred from the sequence similarity with X.  I don't think of 
> similarity of statistical profiles of periodic amino acid content.
> 
> So, while I'm not that familiar with the specifics of the programs, it 
> seems to me that profile HMMs are very different from TMHMM.  If the 
> protein folders get to the holy grail of predicting structure/function 
> from sequence based on chemistry, I would not call that an ISS, even 
> though the sequence is the key input.
> 
> I see where the RCA documentation doesn't fit with my view.  But it 
> seems to me that the fundamental thing you want to distinguish is the 
> inferences that are based on evolution/common descent and those that are 
> not. I would use ISS for the former and RCA for everything else.
> 
> Jim
> 
> <snip>
> 
> =====================================
> 
> Jim Hu
> 
> Associate Professor
> 
> Dept. of Biochemistry and Biophysics
> 
> 2128 TAMU
> 
> Texas A&M Univ.
> 
> College Station, TX 77843-2128
> 
> 979-862-4054
> 
> 
> 


