boundary between ISS and RCA Re: [go] Putting method/program names into the with field for ISS
Karen Christie
kchris at genome.Stanford.EDU
Fri Sep 21 10:02:23 PDT 2007
Michelle has already responded to this, explaining how these types of
tools work and that they do include a component of comparison. She is
definitely the expert here and I'm certainly not, so I have nothing to add
to what she said on that point.
However, I do have some comments on the idea of using 'evolution/common
descent' as the criterion. I think 'evolution/common descent' is a slippery
judgement call. It would make it very difficult to define a boundary
between ISS and RCA that provides any consistency, whether among existing
groups using GO or when we go out and teach our annotation procedures to
new groups. I was the discussion leader at the first public Annotation Camp
in 2005. Several times there, I felt keenly the lack of consistent
procedures: several groups would describe how they used a particular code,
and every group did something rather different (ISS has been particularly
bad).
So, if we use 'evolution/common descent' as the criterion, how would we
define it simply and clearly? Going back to the current examples, I would
say that snoRNA consensus sequences are definitely due to common descent.
However, snoRNAs are RNA rather than protein, and they do a different kind
of job. As a result, the constraints on how they evolve are somewhat
different from those on proteins. For example, regions involved in base
pairing, whether internal or with an external nucleic acid, can vary
significantly in sequence as long as the base pairing is maintained. Even
for protein motifs such as transmembrane domains, I don't see how you can
say that evolution is not involved. Of course, the evolutionary constraints
there apply to a portion of the molecule rather than the whole thing. We
already allow ISS for hits to domains such as InterPro domains, so it does
not seem inconsistent to allow motif hits too. This is especially true
since the proposed annotations were appropriately general, i.e.
'integral to membrane'.
Using 'analysis based purely on the sequence of the gene product' as the
criterion for the boundary between ISS and RCA seems much simpler. I think
trying to use 'evolution/common descent' as the defining boundary will lead
to continued confusion and debate in the future. By contrast, 'based only
on the sequence of the gene product' is a fairly simple guideline that is
easy to define and explain. We are trying to improve consistency among
existing groups and to teach our annotation practice to new groups. With
that in mind, I think this is an important consideration in deciding the
boundary between ISS and RCA.
-Karen
On Thu, 20 Sep 2007, Jim Hu wrote:
> I know that I probably missed this being brought up many times in the past,
> but as a newbie, when I see ISS, I key in on Sequence Similarity not just
> Sequence. I think that biologists seeing this read it as a way of saying
> inferred from evolutionary relationship which is itself inferred from the
> sequence similarity with X. I don't think of similarity of statistical
> profiles of periodic amino acid content.
>
> So, while I'm not that familiar with the specifics of the programs, it seems
> to me that profile HMMs are very different from TMHMM. If the protein
> folders get to the holy grail of predicting structure/function from sequence
> based on chemistry, I would not call that an ISS, even though the sequence is
> the key input.
>
> I see where the RCA documentation doesn't fit with my view. But it seems to
> me that the fundamental thing you want to distinguish is the inferences that
> are based on evolution/common descent and those that are not. I would use ISS
> for the former and RCA for everything else.
>
> Jim
> On Sep 20, 2007, at 11:16 AM, Karen Christie wrote:
>
>> <snip>
>>> It seems that RCA has not been considered, because most of the function
>>> predictions using RCA so far have some experimental component (in fact, the
>>> RCA code says non-sequence-based computational method).
>>
>> I think we did consider RCA for tRNAscan and snoRNAs (at the 2006 Annot
>> Camp), but then rejected it at the Jan 2007 GO meeting in response to
>> Michelle Gwinn's argument that everything based purely on analysis of the
>> sequence of the gene product should be ISS, even if multiple types of
>> sequence analysis were combined.
>>
>> While it is true that the original RCA documentation did say non-sequence
>> based method, at the St. Croix meeting, Sue Rhee brought up the point about
>> analyses that combined non-sequence and sequence based data and we agreed
>> that these could be RCA. Thus in the draft document, I've made a new
>> section called ICA with proposed guidelines that I think are more in line
>> with the idea that these analyses involve multiple data sets or even
>> multiple kinds of data sets.
>>
>> http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml
>>
>
>
> <snip>
> =====================================
> Jim Hu
> Associate Professor
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4054