[go] Scope of the RCA evidence code

E Dimmer edimmer at ebi.ac.uk
Thu Sep 13 09:50:41 PDT 2007


Karen Christie wrote:
> Hi,
>
> I have responses about each of these specific papers you mention, 
> inserted below.
>
> -Karen
>
>
> On Thu, 13 Sep 2007, Pascale Gaudet wrote:
>
>> Hi,
>>
>> When I read this it seems like this is quite analogous to IC, except 
>> inferred from a paper: you have multiple sources of information and 
>> you cannot easily say which you are using for making the annotation. 
>> In most cases you show, I think I would have been happy with existing 
>> evidence codes:
>
> These can't be IC because the curator isn't making the inference. It 
> was suggested at some point previously to widen IC to allow author 
> inferences as well, but this was soundly rejected.
Wasn't it agreed that inferences made by the author instead of the 
curator should be annotated with the NAS code (with a GO ID included 
in the 'with' column when possible)?
>
>> Paper 1: I think it should have IPI evidence (even if they grouped 
>> several IPI analyses)
>
> These people didn't do any of the experimental studies, so you can't 
> use IPI from this paper. In addition, they didn't make their 
> predictions on the basis of any single experiment, but on the basis of 
> a network analysis of many, many interactions for the whole genome. 
> The annotation is based on their analysis, which is not experimental 
> work, not on a single experiment.
>
>> Paper 2, 3: expression and promoters: might be IEP??
>
> If they made statements based only on expression, then IEP might be 
> fine. However, again, these authors did not do any experimental work, 
> so an experimental code cannot be used from this paper. Any 
> annotations made from this paper are from a combined analysis of 
> previous people's expression data and promoter sequence data.
>
>> Paper 5: I would have simply used IDA and ISS for group 2
>
> As I pointed out in my notes for the paper, url below, annotations 
> made on the basis of either the experimental work or the structural 
> analysis alone would receive the individual codes, IDA or ISS. 
> However, what they did was require both because they felt that it was 
> the combination of both types of evidence that really gave the 
> specificity.
>
> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html 
>
>
>
>> Pascale
>>
>>
>>
>> Karen Christie wrote:
>>> Scope of the RCA evidence code
>>> -------------------------------------------------------
>>>
>>> Here is my analysis of and recommendations for the future of the RCA
>>> evidence code:
>>>
>>> Having reviewed six papers of the type that originally prompted SGD to
>>> request the RCA evidence code, it is clear that all of these methods
>>> described within these papers include analysis of experimental data,
>>> e.g. expression data, two hybrid data, mass spec proteomic data,
>>> etc. Some also include sequence based data, but it is never the entire
>>> basis of the analysis. Two of the analyses (Troyanskaya et al, and
>>> Wade et al.) combined expression data with promoter sequence data, a
>>> type of sequence data not typically considered in analyses appropriate
>>> for the ISS code. Two other analyses (Baxter et al. and Alves et al.)
>>> combined structural analysis with either experimental results or with
>>> a mathematical model designed to test which mechanisms could reproduce
>>> existing published experimental results. Some RCA analyses also
>>> utilize existing functional annotations for characterized genes
>>> (Gat-Viks et al.).
>>>
>>> To summarize, all of these analyses combined multiple types of data,
>>> generally including experimental data, such as expression data or
>>> protein-protein interaction data. Some include sequence data, in this
>>> set either promoter sequence info or structural information, but none
>>> are based solely on sequence based information.
>>>
>>> Analyses based purely on sequence similarity based data, including
>>> sequence similarity with experimentally characterized gene products,
>>> as determined by pairwise or multiple alignment; prediction methods
>>> for non-coding RNA genes; recognized functional domains, as determined
>>> by tools such as InterPro, Pfam, SMART, etc.; predicted protein
>>> features, e.g., transmembrane regions, signal sequence, etc.;
>>> structural similarity with experimentally characterized gene products,
>>> as determined by crystallography, nuclear magnetic resonance, or
>>> computational prediction; should use the ISS evidence code (or the IEA
>>> code if it is not reviewed by a curator). The documentation does not
>>> currently list mapping files such as InterPro2GO, but I would include
>>> this as sequence-only based data since the basic analysis is all based
>>> on the sequence of the gene product and the hits by various sequence
>>> analysis methods.
>>>
>>> As a curator-reviewed code, annotations made with the RCA code must be
>>> reviewed/assigned by a curator.
>>>
>>> The documentation currently lists 'Text-based computation (e.g. text
>>> mining)' as acceptable for this evidence code. In the absence of
>>> specific examples of how this might be applied, I would suggest
>>> removing mention of 'Text-based computation' until we have an actual
>>> example or two to look at to see whether it fits into this evidence
>>> code or not.
>>>
>>> Accepting these recommendations would basically return the RCA code 
>>> to its original intent. It would also be consistent with the 
>>> recommendation of the Evidence Code Committee (ECC) to overturn the 
>>> 2006 Annotation Camp's recommendation to use RCA for sequence 
>>> similarity comparisons where you could not put an experimentally 
>>> characterized ortholog into the with column and also with the 
>>> January 2007 GOC meeting decision that all methods based on only 
>>> sequence-based info should use the ISS code.
>>>
>>> The GOC may not wish to consider renaming the evidence code, but 
>>> having reviewed this set of papers, I think the phrase "Integrated 
>>> Computational Analysis" would be a more descriptive name and more 
>>> consistent with how authors of these types of methods describe them 
>>> (the red highlighting in the sample papers page, url below, shows 
>>> where the authors used that word). I'm not sure this is sufficient 
>>> to make clear the distinction between these methods and 
>>> sequence-only based methods, but it is better than "Reviewed 
>>> Computational Analysis". In addition, right now the RCA 
>>> documentation would exclude an analysis of this type if it was 
>>> performed internally by a database group and not published. Thus, if 
>>> the GOC is amenable to the idea of changing the name of the evidence 
>>> code, I would suggest that we call it "Integrated Computational 
>>> Analysis" with the abbreviation ICA.
>>>
>>>
>>> Here are links to supplemental information regarding this evidence
>>> code:
>>>
>>> Examples of the types of analyses the RCA code was intended to cover:
>>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html
>>>
>>> History of the RCA code:
>>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html
>>>
>>> Summary of controversy over RCA vs ISS in Evidence Code Committee:
>>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html
>>>
>>> Proposed draft of new documentation for this code:
>>>   http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
>>>   (note that original RCA doc is still present for comparison)
>>>
>>>
>>


-- 
 ************************************
    Emily Dimmer
    GOA Coordinator
    EMBL-EBI
    Wellcome Trust Genome Campus
    Hinxton
    Cambridge CB10 1SD, U.K.
    Tel:     +44 1223 494654
    Fax:    +44 1223 494468
    email:  edimmer at ebi.ac.uk
 ************************************



