[go] Putting method/program names into the with field for ISS

Karen Christie kchris at genome.Stanford.EDU
Wed Sep 12 15:44:33 PDT 2007


Hi,

I have several comments on this thread, relating to:

1. clear distinction between ISS and RCA/ICA analyses
2. separating RCA/ICA from IEA
3. what should/should not be in the with field for ISS.

One of the reasons I think it might be good to rename the RCA evidence 
code to ICA is to obsolete the RCA evidence code and indicate that people 
will need to consider ISS, ICA, or IEA, depending on what type of method 
was used. Unfortunately, the RCA code has suffered from bad original 
documentation and also from being muddled with ISS due to a hasty decision 
at the 2006 Annotation Camp.

-Karen


1. For the revisions to the GO evidence code documentation, one of the 
guiding principles was that evidence codes should be statements of the 
type of evidence used, and NOTHING MORE. Thus I think that the decision of 
which evidence code to use should be based purely on the type of method 
used, not on what can be put into the with/from column.

Compare my comments on the sample papers for RCA (proposed to change
to ICA) here:
   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html

with my comments on the sample papers (first 7 only) for ISS where
it is not possible to put a sequence or domain ID in the with column
here:
   http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html

It is very clear that these are NOT the same type of analyses.

The ones for ISS where there is no specific sequence or domain ID for the 
with column are clearly based purely on analysis of the sequence of the 
gene product.

The ones for RCA are a very different type of analysis, typically based on 
experimental data (e.g. protein-protein interactions). The ones that 
included sequence data at all either looked at promoter sequence info or 
at structural predictions based on sequence data.

On the principle that the evidence codes should reflect the method, I am 
very opposed to lumping analyses based purely on the sequence of the gene 
product into the code (whether RCA or ICA) that will represent this very 
different type of integrated, combinatorial analysis. I think lumping 
analyses of the gene products into this code instead of into ISS will 
really muddle the two.

2. On the subject of separating RCA/ICA from IEA, I am already indicating 
that we should remove the language below from the RCA/ICA documentation:

   When using the RCA evidence code, it is recommended that the annotator
   check a sampling of the annotations resulting from the computational
   method, but it is not necessary to individually review each
   annotation. Note also that RCA may only be used for reviewed,
   published computational analyses.

I've also put in this statement:

   Annotations based on integrated computational analyses, if they have
   not been reviewed by a curator, should receive the IEA code.

The proposed new Ev Code documentation has a new section for what I think 
the RCA/ICA code should cover, url below. The original doc for RCA remains 
on the page for comparison.
   http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
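
As a rough illustration of points 1 and 2 (a sketch only, not part of the proposed documentation), the choice of code could be driven purely by the type of method plus whether a curator reviewed the annotation. The names `Method` and `choose_evidence_code` are hypothetical, and ICA is the proposed new name for RCA. The sketch deliberately does not decide what to do with unreviewed sequence-only analyses, since that case is not discussed here.

```python
from enum import Enum


class Method(Enum):
    """Hypothetical labels for the two kinds of analysis discussed above."""
    SEQUENCE_ONLY = "sequence_only"   # based purely on the gene product sequence
    INTEGRATED = "integrated"         # integrated, combinatorial analysis


def choose_evidence_code(method: Method, curator_reviewed: bool) -> str:
    """Pick an evidence code from the method type, and nothing more.

    Assumption: "ICA" stands for the proposed rename of RCA.
    """
    if method is Method.INTEGRATED:
        # Integrated analyses not reviewed by a curator receive IEA.
        return "ICA" if curator_reviewed else "IEA"
    if not curator_reviewed:
        # Not covered by the discussion above, so the sketch refuses to guess.
        raise ValueError("unreviewed sequence-only analysis is not covered here")
    return "ISS"
```

Note that nothing about the with/from column enters the decision, which is the point of the guiding principle in item 1.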

3. On the subject of what should go in the with column for ISS, I think 
Susan has the best idea, that sometimes there just isn't anything 
reasonable to put there.

I looked at 7 papers which had come up as things where the annotator felt 
the evidence was sequence based but there was nothing to put in the with 
column. Of the 5 that dealt with snoRNAs, only 2 described an 
algorithm. The other 3 merely stated that the new snoRNA gene matched the 
consensus for that family of snoRNA. For these 3, there is NO algorithm to 
put in the with field, nor a corresponding PMID for an algorithm. For the 
other 2 papers, both about hydrophobicity predictions on protein 
sequences, both mentioned a named algorithm, but in neither case is there 
an appropriate paper describing that algorithm easily at hand (or even 
findable via a little bit of searching). It was also really clear that in 
all 7 of these papers, the analysis dealt only with the sequence of the 
gene product. These 7 papers are not at all the same type of analysis as 
those I considered for the RCA/ICA code.

So, even having spent some time trying to come up with a system to name 
these methods so that we could put them in the with column, I think 
Susan's suggestion to NOT make with be mandatory for ISS is better. If we 
absolutely must fill the with column, and want something simpler than 
creating a way to track method names, I think that putting "not 
applicable", to indicate that the lack of an ID was intentional, would be 
more appropriate than kludging these types of gene product sequence 
analysis into an evidence code designed to cover a very different type of 
analysis.
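
To make the two options concrete, here is a minimal sketch of how a check on the with column for ISS annotations might behave under each policy. Everything in it is an assumption for illustration: the function name `classify_iss_with`, the status strings, and the exact placeholder text are hypothetical, not an agreed format.

```python
NOT_APPLICABLE = "not applicable"  # hypothetical placeholder text


def classify_iss_with(with_value: str, with_mandatory: bool) -> str:
    """Classify the with column of an ISS annotation.

    Returns one of three hypothetical status strings:
      "has_id"              - a sequence or domain ID (or other entry) is present
      "intentionally_blank" - no ID, and that is acceptable under the policy
      "missing"             - no ID, but the policy requires the column be filled
    """
    value = with_value.strip()
    if value.lower() == NOT_APPLICABLE:
        # Explicit marker that the lack of an ID was intentional.
        return "intentionally_blank"
    if value:
        return "has_id"
    # Empty column: fine only if with is NOT mandatory for ISS.
    return "missing" if with_mandatory else "intentionally_blank"
```

Under Susan's suggestion (`with_mandatory=False`) an empty column is simply accepted; under the mandatory policy, the placeholder is the only way to record that the gap was deliberate.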





On Wed, 12 Sep 2007, Susan Tweedie wrote:

> At the risk of returning us to square one on this... I'd like to take a
> step back and revisit why we decided it was vital to have something in
> the with column for ISS. I thought this stemmed from an attempt at
> enforcing quality annotations - we wanted to identify the similar
> 'thing' for which there is experimental evidence and to use ISS only
> where this was available. We then shifted ground a bit to acknowledge
> that there are cases where there is a strong case for ISS annotation but
> no single sequence can be identified for this column. So what do we
> actually achieve by filling-in the slot for these cases? It seems to me
> this is more to do with us saying 'yup I'm being stringent about my use
> of ISS so I've stuck something in this column to prove it' than actually
> helping users. The 'how they did it' is in the paper just like it is
> for other evidence codes. I'm not sure we 'gain' enough here to justify
> mixing methods and objects in the 'with' column and I am struggling to
> see the justification for making ISS a special case in this respect. If
> we show a method for ISS, do we set a precedent and run the risk of
> users wanting to know whether it was RNAi or knock-out for IMP etc?
>
> I guess I'd just like to know we haven't just made this column mandatory
> as a means of policing curators. I strongly agree that we should fill in
> a sequence where possible and do our best (within reason) to be sure
> there is an experiment there somewhere but, if we are going to accept
> that there are cases where we can't identify a suitable sequence, can't
> we just trust curator judgement i.e. leave the column blank and let
> people read the paper to see details of how it was done?
>
> If we stick with the plan to keep 'with' mandatory for ISS then Karen's
> system is very nice. But what do we do for cases like Michelle's example
> where a whole variety of similarity based methods are used? I find this
> crops up time after time and I wouldn't want to have to list all methods
> in this column and it doesn't seem very satisfactory to pick
> representative examples.
>
> Susan
>
> On Tue, 2007-09-11 at 19:03 +0000, Valerie Wood wrote:
>> That OK,
>>
>> I just think it's rather a trawl to have to create something to go in the 'with' field when the PMID of the published algorithm is sufficient.
>>
>> My other reasoning was that these aren't purely based on 'sequence similarity', they always include some 'other additional step' (although I agree they are 'sequence based')
>>
>> and thirdly, this could become hazy, if we got functional prediction methods which combined sequence data with some experimental data (like cellular localization), for example, which would be RCA (I presume). It therefore seemed that if the distinction was that ISS needed to have some 'object' which represented a sequence in the 'with' column (rather than allowing the with column to contain other types of things, referring to algorithms), it would be quite a nice distinction. If you can't locate this object then the method probably includes something else in addition to 'sequence similarity'.
>>
>> However, these were just for consideration, I really have no strong preference either way..... although I prefer easy :)
>>
>> Val
>>
>>
>> "Gwinn-Giglio, Michelle" <MLGwinn at jcvi.org> wrote:
>>>
>>>
>>> Ben,
>>>
>>> Yes, sorry to not be clear - I was disagreeing with Val's suggestion to use RCA for things like TMHMM and tRNAscan.  At least I think that was Val's suggestion and that is what I disagree with.
>>>
>>> Sorry to disagree with you Val.  :)
>>>
>>> Michelle
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Benjamin Hitz [mailto:hitz at genome.stanford.edu]
>>> Sent: Tue 9/11/2007 1:05 PM
>>> To: Gwinn-Giglio, Michelle
>>> Cc: GO mailing list
>>> Subject: Re: [go] Putting method/program names into the with field for ISS
>>>
>>>
>>> On Sep 11, 2007, at 7:55 AM, Gwinn-Giglio, Michelle wrote:
>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I disagree.  I think taking this approach would significantly muddy
>>>> the waters in terms of distinguishing between ISS and RCA.
>>>>
>>>> Anything that is based only on sequence analysis, be it simple
>>>> Blast or vastly more complicated modeling methods, should be ISS
>>>> because at their heart they are all comparing sequences of known
>>>> function to ones with unknown function.  Whether they do simple
>>>> alignments to make that comparison or more complicated models, it
>>>> is still a sequence based analysis.
>>>
>>> I did not suggest otherwise.
>>>
>>> Ben
>>>
>>> --
>>> Ben Hitz
>>> Senior Scientific Programmer ** Saccharomyces Genome Database ** GO
>>> Consortium
>>> Stanford University ** hitz at genome.stanford.edu
>>>
>>>
>>>
>>>
>>
>


