[go] Scope of the RCA evidence code

Pascale Gaudet pgaudet at northwestern.edu
Thu Sep 13 05:42:51 PDT 2007


Hi,

When I read this, it seems quite analogous to IC, except that it is 
inferred from a paper: you have multiple sources of information and 
cannot easily say which one you are using to make the annotation. In 
most of the cases you show, I think I would have been happy with 
existing evidence codes:

Paper 1: I think it should have IPI evidence (even if they grouped 
several IPI analyses)
Paper 2, 3: expression and promoters: might be IEP??
Paper 5: I would have simply used IDA and ISS for group 2

Pascale



Karen Christie wrote:
> Scope of the RCA evidence code
> -------------------------------------------------------
>
> Here is my analysis of and recommendations for the future of the RCA
> evidence code:
>
> Having reviewed six papers of the type that originally prompted SGD to
> request the RCA evidence code, it is clear that all of the methods
> described in these papers include analysis of experimental data,
> e.g. expression data, two-hybrid data, mass spec proteomic data,
> etc. Some also include sequence-based data, but it is never the entire
> basis of the analysis. Two of the analyses (Troyanskaya et al. and
> Wade et al.) combined expression data with promoter sequence data, a
> type of sequence data not typically considered in analyses appropriate
> for the ISS code. Two other analyses (Baxter et al. and Alves et al.)
> combined structural analysis with either experimental results or with
> a mathematical model designed to test which mechanisms could reproduce
> existing published experimental results. Some RCA analyses also
> utilize existing functional annotations for characterized genes
> (Gat-Viks et al.).
>
> To summarize, all of these analyses combined multiple types of data,
> generally including experimental data, such as expression data or
> protein-protein interaction data. Some include sequence data, in this
> set either promoter sequence info or structural information, but none
> are based solely on sequence based information.
>
> Analyses based purely on sequence similarity based data should use
> the ISS evidence code (or the IEA code if it is not reviewed by a
> curator). This includes:
>
>   - sequence similarity with experimentally characterized gene
>     products, as determined by pairwise or multiple alignment;
>   - prediction methods for non-coding RNA genes;
>   - recognized functional domains, as determined by tools such as
>     InterPro, Pfam, SMART, etc.;
>   - predicted protein features, e.g., transmembrane regions, signal
>     sequence, etc.;
>   - structural similarity with experimentally characterized gene
>     products, as determined by crystallography, nuclear magnetic
>     resonance, or computational prediction.
>
> The documentation does not
> currently list mapping files such as InterPro2GO, but I would include
> this as sequence-only based data since the basic analysis is all based
> on the sequence of the gene product and the hits by various sequence
> analysis methods.
>
> As a curator-reviewed code, annotations made with the RCA code must be
> reviewed/assigned by a curator.
>
> The documentation currently lists 'Text-based computation (e.g. text
> mining)' as acceptable for this evidence code. In the absence of
> specific examples of how this might be applied, I would suggest
> removing mention of 'Text-based computation' until we have an actual
> example or two to look at to see whether it fits into this evidence
> code or not.
>
> Accepting these recommendations would basically return the RCA code to 
> its original intent. It would also be consistent with the 
> recommendation of the Evidence Code Committee (ECC) to overturn the 
> 2006 Annotation Camp's recommendation to use RCA for sequence 
> similarity comparisons where you could not put an experimentally 
> characterized ortholog into the with column and also with the January 
> 2007 GOC meeting decision that all methods based on only 
> sequence-based info should use the ISS code.
>
> The GOC may not wish to consider renaming the evidence code, but 
> having reviewed this set of papers, I think the phrase "Integrated 
> Computational Analysis" would be a more descriptive name and more 
> consistent with how authors of these types of methods describe them 
> (the red highlighting in the sample papers page, url below, shows 
> where the authors used that word). I'm not sure this is sufficient to 
> make clear the distinction between these methods and sequence-only 
> based methods, but it is better than "Reviewed Computational 
> Analysis". In addition, right now the RCA documentation would exclude 
> an analysis of this type if it was performed internally by a database 
> group and not published. Thus, if the GOC is amenable to the idea of 
> changing the name of the evidence code, I would suggest that we call 
> it "Integrated Computational Analysis" with the abbreviation ICA.
>
>
> Here are links to supplemental information regarding this evidence
> code:
>
> Examples of the types of analyses the RCA code was intended to cover:
>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html
>
>
> History of the RCA code:
>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html
>
> Summary of controversy over RCA vs ISS in Evidence Code Committee:
>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html
>
>
> Proposed draft of new documentation for this code:
>   http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
>   (note that original RCA doc is still present for comparison)
>
>
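
The rule proposed in the quoted message (sequence-only analyses get ISS,
or IEA when not curator-reviewed; curator-reviewed analyses that
integrate several kinds of data get RCA) could be sketched roughly as
below. This is only an illustration under stated assumptions: the
data-type labels, the SEQUENCE_ONLY set, and the function name are
hypothetical and not part of any GO tool or documentation, and cases
the message does not address simply return None.

```python
# Hypothetical labels for sequence-only kinds of data, loosely following
# the categories listed in the quoted message. They are NOT GO vocabulary.
SEQUENCE_ONLY = frozenset({
    "sequence_similarity",
    "ncrna_prediction",
    "functional_domain",
    "predicted_protein_feature",
    "structural_similarity",
    "interpro2go_mapping",
})


def suggest_evidence_code(data_types, curator_reviewed):
    """Suggest an evidence code, or None if the proposal does not say.

    data_types: iterable of labels for the kinds of data an analysis used.
    curator_reviewed: True if a curator reviewed/assigned the annotation.
    """
    kinds = set(data_types)
    if not kinds:
        raise ValueError("at least one data type is required")

    # Purely sequence-based analyses: ISS, or IEA if no curator review.
    if kinds <= SEQUENCE_ONLY:
        return "ISS" if curator_reviewed else "IEA"

    # Integrated analyses: several kinds of data, not all sequence-based.
    # RCA is a curator-reviewed code, so review is required here.
    if len(kinds) >= 2 and curator_reviewed:
        return "RCA"

    # Anything else (e.g. a single experimental data type) is outside
    # the scope of the proposal, so no code is suggested.
    return None
```

For example, suggest_evidence_code(["expression", "promoter_sequence"],
True) yields "RCA", while suggest_evidence_code(["interpro2go_mapping"],
False) yields "IEA".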



