[Annotation] evidence code advice
Valerie Wood
val at sanger.ac.uk
Wed Apr 2 03:14:27 PDT 2008
Rama Balakrishnan wrote:
>> Anyway, in light of that history, I think it would make most sense
>> if the absolute requirement for the with column to be filled for IEA
>> was dropped in the short term, so that we can use the IEA code for
>> unreviewed annotations from RCA methods.
>>
>
> I think it is important to require the 'with' column for IEAs to
> prevent circular annotations.
> The other option is to revert the RCA code to its original version
> which required only the computational method to be reviewed and not
> every annotation.
>
Hi Rama,
I wonder about the value of RCA annotations as part of the body of GO
annotations if they are not reviewed.
This code usually provides the most tentative annotations, because they
are generally 'function predictions', i.e.
* Predictions based on computational analyses of large-scale
  experimental data sets
* Predictions based on computational analyses that integrate
  datasets of several types, including experimental data (e.g.
  expression data, protein-protein interaction data, genetic
  interaction data, etc.), sequence data (e.g. promoter sequence,
  sequence-based structural predictions, etc.), or mathematical models
They frequently seem to be
i) Obviously wrong, in a way which would easily be spotted by a curator
ii) Redundant with existing experimental, or other manually curated
annotations, or even IEA annotations
iii) Obvious annotation omissions (i.e. when there is an ISS to
transporter activity, but no ISS to transporter)
Several hundred doesn't seem so many to manually review (at least to make
sure they satisfy the criteria above). It would probably save time in
the long run. (I'm also amazed there are so many good 'predictions'
for S. cerevisiae which are unannotated already.)
For these reasons, pending any long term solution, I'd prefer RCA
annotations which were not reviewed by a curator to be classed as
'electronically inferred' because they are essentially "automated".
My 2p
Val
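
To make the interim position above concrete, here is a minimal sketch
(not SGD's or GO's actual tooling) of how it would interact with the
'with' rule that Rama describes further down the thread. The Annotation
record, the function names, and the gene and GO identifiers are all
hypothetical placeholders. The only rules encoded are the two discussed
in this thread: unreviewed RCA is treated as IEA, and an IEA without a
'with' value is stripped.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record (not the real GO file format)."""
    gene: str
    go_id: str
    evidence: str                     # e.g. "RCA", "IEA", "ISS"
    with_value: Optional[str] = None  # contents of the 'with' column, if any
    curator_reviewed: bool = False


def effective_evidence(a: Annotation) -> Annotation:
    """Treat RCA that no curator has reviewed as electronically inferred."""
    if a.evidence == "RCA" and not a.curator_reviewed:
        return replace(a, evidence="IEA")
    return a


def passes_with_rule(a: Annotation) -> bool:
    """IEA needs a non-empty 'with' value; other codes are not checked here."""
    return a.evidence != "IEA" or bool(a.with_value)


def load(annotations: List[Annotation]) -> List[Annotation]:
    """Reclassify first, then drop whatever fails the 'with' rule."""
    reclassified = [effective_evidence(a) for a in annotations]
    return [a for a in reclassified if passes_with_rule(a)]


if __name__ == "__main__":
    # Placeholder genes and GO IDs, for illustration only.
    batch = [
        Annotation("geneA", "GO:0000001", "RCA"),
        Annotation("geneB", "GO:0000002", "RCA", curator_reviewed=True),
        Annotation("geneC", "GO:0000003", "IEA", with_value="DB:placeholder"),
    ]
    print([a.gene for a in load(batch)])  # ['geneB', 'geneC']
```

Under these assumptions, an unreviewed RCA prediction with nothing in
the 'with' column is reclassified as IEA and then dropped, which is the
conflict this thread is trying to resolve.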
> I also really like Kara's proposal and hopefully this will be
> discussed at the upcoming GO meeting.
>
> Rama
>
>
>
>> In the long term, I think Kara's proposal is a better way to go.
>>
>> -Karen
>>
>>
>> On Sun, 30 Mar 2008, Suzanna Lewis wrote:
>>
>>
>>> This is very much along the lines that I've been trying to foster
>>> (remember the meeting in Cambridge at Jesus College). The bit-code
>>> (or bar-code) for evidence codes, with each bit indicating one of
>>> these flags for a different piece of information. Not only
>>> automated/manual, but also large-scale/small-scale, and other
>>> characteristics of the evidence.
>>>
>>> As Kara (and many others) have said, there is quite a bit of
>>> overloading of multiple pieces of information in the current evidence
>>> codes. It would be nice one day to see these distinguished into
>>> different constituent bits of information.
>>>
>>> -S
>>>
>>> p.s. I thought that IEA did not -require- the with column.
>>> p.p.s Was the decision tree a step in this direction?
>>>
>>> On Mar 26, 2008, at 1:59 PM, Kara Dolinski wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> The root of the problem, as I see it, is that we are mixing apples
>>>> and oranges with evidence codes. All but one of the evidence codes
>>>> indicate the type of experimental evidence for a GO annotation, but
>>>> we have one oddball, IEA, that indicates not what the experiment is,
>>>> but rather how the annotation was done. We keep running into
>>>> variations of the same problem: we have some evidence (whether
>>>> experimental or computational) for a GO annotation, but also want to
>>>> indicate whether a curator looked at it or not.
>>>>
>>>> My proposed (albeit radical) solution:
>>>>
>>>> Remove IEA as an evidence code.
>>>>
>>>> Create a new property for GO annotations (or add a new type of
>>>> qualifier) that captures how the annotation was done: manual or
>>>> automated.
>>>>
>>>> Everything that is currently IEA would be given the 'automated'
>>>> property/qualifier, and then would be given a new evidence code as
>>>> appropriate (mostly a flavor of ISS I would assume).
>>>> There can be a rule that all 'automated' annotations that are a
>>>> flavor of ISS must have a 'with' value.
>>>>
>>>> This would allow us to use 'RCA' as appropriate, in some cases
>>>> they'd be 'manual', in others, they'd be 'automated'. In Rama's
>>>> case, the annotations would be 'RCA' with an 'automated' qualifier.
>>>>
>>>> I realize the issues involved in making such a drastic change, so I
>>>> understand if we don't go there, but I do think that some approach
>>>> such as the one above is the best representation of the information
>>>> that we are trying to capture.
>>>>
>>>> Cheers,
>>>> Kara
>>>>
>>>> On Mar 26, 2008, at 4:30 PM, Rama Balakrishnan wrote:
>>>>
>>>>
>>>>> Hi All,
>>>>>
>>>>> SGD has come across a couple of computationally predicted GO
>>>>> annotation data sets for S. cerevisiae that we would like to add to
>>>>> our database. The GO annotations from these data sets are
>>>>> predictions based on multiple high-throughput data sets. The RCA
>>>>> evidence code came to mind, but according to the documentation,
>>>>> the annotations all have to be manually reviewed by a curator to
>>>>> use this evidence code. There are several hundred annotations of
>>>>> this kind and it is not feasible for us to manually review them.
>>>>>
>>>>> Hence, we thought these annotations can be bulk loaded with IEA
>>>>> evidence code. However, in the Jan 2007 (Cambridge) GO meeting, it
>>>>> was decided that the 'with' column information has to be filled in
>>>>> for all IEAs (else Mike's filtering script strips them out). But
>>>>> these GO annotations being predictions based on multiple high-
>>>>> throughput data sets, don't have any information for the with
>>>>> column. So, we are left with no choice.
>>>>>
>>>>> Which evidence code do people think should be used for these kinds
>>>>> of computational datasets when there is not an obvious "with"?
>>>>>
>>>>> Thanks for your input.
>>>>>
>>>>>
>>>>> Rama
>>>>>
>>>>>
>>>>> Rama Balakrishnan Ph.D
>>>>> Senior Scientific Curator
>>>>> Saccharomyces Genome Database
>>>>> Stanford University
>>>>> Stanford, CA 94305-5120
>>>>> Ph: 650.725.8956 Fax: 650.723.7016
>>>>> email: rama at genome.stanford.edu
>>>>> Website: http://www.yeastgenome.org
>>>>> SGD Wiki: http://wiki.yeastgenome.org
>>>> _______________________________________________
>>>> Annotation mailing list
>>>> Annotation at geneontology.org
>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.