boundary between ISS and RCA Re: [go] Putting method/program names into the with field for ISS

Karen Christie kchris at genome.Stanford.EDU
Fri Sep 21 10:02:23 PDT 2007


Michelle has already responded to this and discussed how these types of 
tools work and that they do include a component of comparison. She's 
definitely the expert in this area and I'm certainly not, so I don't have 
anything to add to what she said in this area.

However, on the idea of using 'evolution/common descent' as the criterion, 
I do have some comments. I think 'evolution/common descent' is a very 
slippery and difficult judgement call, one that will make it very 
difficult to define a boundary between ISS and RCA that provides any sort 
of consistency, either between existing groups using GO or when we go out 
and try to teach our annotation procedures to new groups. Having been the 
discussion leader standing in front of the group for the first public 
Annotation Camp in 2005, several times I felt very keenly the lack of 
consistent procedures when several different groups would comment on how 
they used a particular code and every group did rather different things 
(ISS has been particularly bad).

So, if we try to use 'evolution/common descent' as a criterion, how 
would we define it simply in a way that provides clarity? Going back to 
the current examples, I would say that snoRNA consensus sequences are 
definitely due to common descent. However, due to the nature of what 
they do and the fact that they are RNA instead of protein, the constraints 
on how they evolve are somewhat different than those on the evolution of 
proteins. For example, areas involved in base pairing, either internal or 
with an external nucleic acid, can vary significantly in sequence provided 
that base pairing is maintained. Even for the protein motifs such as 
transmembrane domains, I don't see how you can say that evolution is not 
involved, but of course the evolutionary constraints are on a portion of 
the molecule rather than the whole thing. We already allow ISS matches to 
domain hits such as InterPro domains, so it doesn't seem inconsistent to 
allow motif hits too, especially considering that the proposed resulting 
annotations were appropriately general, i.e. 'integral to membrane'.

Using 'analysis based purely on the sequence of the gene product' as our 
criterion for the boundary between ISS and RCA seems much simpler. I think 
trying to use 'evolution/common descent' as our 
defining boundary will lead to continued confusion and debate in the 
future. At least using 'based only on the sequence of the gene product' as 
the defining line is a fairly simple guideline, one that I think is easy 
to define and explain. As we seek to improve consistency amongst existing 
groups and also to teach our annotation practice to new groups, I think 
this is an important idea to consider in deciding the boundary between ISS 
and RCA.

-Karen



On Thu, 20 Sep 2007, Jim Hu wrote:

> I know that I probably missed this being brought up many times in the past, 
> but as a newbie, when I see ISS, I key in on Sequence Similarity not just 
> Sequence.   I think that biologists seeing this read it as a way of saying 
> inferred from evolutionary relationship which is itself inferred from the 
> sequence similarity with X.  I don't think of similarity of statistical 
> profiles of periodic amino acid content.
>
> So, while I'm not that familiar with the specifics of the programs, it seems 
> to me that profile HMMs are very different from TMHMM.  If the protein 
> folders get to the holy grail of predicting structure/function from sequence 
> based on chemistry, I would not call that an ISS, even though the sequence is 
> the key input.
>
> I see where the RCA documentation doesn't fit with my view.  But it seems to 
> me that the fundamental thing you want to distinguish is the inferences that 
> are based on evolution/common descent and those that are not. I would use ISS 
> for the former and RCA for everything else.
>
> Jim


> On Sep 20, 2007, at 11:16 AM, Karen Christie wrote:
>
>> <snip>
>>> It seems that RCA has not been considered, because most of the function 
>>> predictions using RCA so far have some experimental component (In fact the 
>>> RCA code says non-sequence-based computational method).
>> 
>> I think we did consider RCA for tRNAscan and snoRNAs (at the 2006 Annot 
>> Camp), but then rejected it at the Jan 2007 GO meeting in response to 
>> Michelle Gwinn's argument that everything based purely on analysis of the 
>> sequence of the gene product should be ISS, even if multiple types of 
>> sequence analysis were combined.
>> 
>> While it is true that the original RCA documentation did say non-sequence-
>> based method, at the St. Croix meeting, Sue Rhee brought up the point about 
>> analyses that combined non-sequence-based and sequence-based data and we agreed 
>> that these could be RCA. Thus in the draft document, I've made a new 
>> section called ICA with proposed guidelines that I think are more in line 
>> with the idea that these analyses involve multiple data sets or even 
>> multiple kinds of data sets.
>> 
>> http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml
>> 
>
>
> <snip>
> =====================================
> Jim Hu
> Associate Professor
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4054
>
>


