[go] Scope of the RCA evidence code
Pascale Gaudet
pgaudet at northwestern.edu
Thu Sep 13 05:42:51 PDT 2007
Hi,
When I read this it seems like this is quite analogous to IC, except
inferred from a paper: you have multiple sources of information and you
cannot easily say which you are using for making the annotation. In most
cases you show, I think I would have been happy with existing evidence
codes:
Paper 1: I think it should have IPI evidence (even if they grouped
several IPI analyses)
Paper 2, 3: expression and promoters: might be IEP??
Paper 5: I would have simply used IDA and ISS for group 2
Pascale
Karen Christie wrote:
> Scope of the RCA evidence code
> -------------------------------------------------------
>
> Here is my analysis of and recommendations for the future of the RCA
> evidence code:
>
> Having reviewed six papers of the type that originally prompted SGD to
> request the RCA evidence code, it is clear that all of the methods
> described within these papers include analysis of experimental data,
> e.g. expression data, two hybrid data, mass spec proteomic data,
> etc. Some also include sequence based data, but it is never the entire
> basis of the analysis. Two of the analyses (Troyanskaya et al, and
> Wade et al.) combined expression data with promoter sequence data, a
> type of sequence data not typically considered in analyses appropriate
> for the ISS code. Two other analyses (Baxter et al. and Alves et al.)
> combined structural analysis with either experimental results or with
> a mathematical model designed to test which mechanisms could reproduce
> existing published experimental results. Some RCA analyses also
> utilize existing functional annotations for characterized genes
> (Gat-Viks et al.).
>
> To summarize, all of these analyses combined multiple types of data,
> generally including experimental data, such as expression data or
> protein-protein interaction data. Some include sequence data, in this
> set either promoter sequence info or structural information, but none
> are based solely on sequence based information.
>
> Analyses based purely on sequence similarity based data, including
> sequence similarity with experimentally characterized gene products,
> as determined by pairwise or multiple alignment; prediction methods
> for non-coding RNA genes; recognized functional domains, as determined
> by tools such as InterPro, Pfam, SMART, etc.; predicted protein
> features, e.g., transmembrane regions, signal sequence, etc.;
> structural similarity with experimentally characterized gene products,
> as determined by crystallography, nuclear magnetic resonance, or
> computational prediction; should use the ISS evidence code (or the IEA
> code if it is not reviewed by a curator). The documentation does not
> currently list mapping files such as InterPro2GO, but I would include
> this as sequence-only based data since the basic analysis is all based
> on the sequence of the gene product and the hits by various sequence
> analysis methods.
>
> As a curator-reviewed code, annotations made with the RCA code must be
> reviewed/assigned by a curator.
>
> The documentation currently lists 'Text-based computation (e.g. text
> mining)' as acceptable for this evidence code. In the absence of
> specific examples of how this might be applied, I would suggest
> removing mention of 'Text-based computation' until we have an actual
> example or two to look at to see whether it fits into this evidence
> code or not.
>
> Accepting these recommendations would basically return the RCA code to
> its original intent. It would also be consistent with the
> recommendation of the Evidence Code Committee (ECC) to overturn the
> 2006 Annotation Camp's recommendation to use RCA for sequence
> similarity comparisons where you could not put an experimentally
> characterized ortholog into the with column and also with the January
> 2007 GOC meeting decision that all methods based on only
> sequence-based info should use the ISS code.
>
> The GOC may not wish to consider renaming the evidence code, but
> having reviewed this set of papers, I think the phrase "Integrated
> Computational Analysis" would be a more descriptive name and more
> consistent with how authors of these types of methods describe them
> (the red highlighting in the sample papers page, url below, shows
> where the authors used that word). I'm not sure this is sufficient to
> make clear the distinction between these methods and sequence-only
> based methods, but it is better than "Reviewed Computational
> Analysis". In addition, right now the RCA documentation would exclude
> an analysis of this type if it was performed internally by a database
> group and not published. Thus, if the GOC is amenable to the idea of
> changing the name of the evidence code, I would suggest that we call
> it "Integrated Computational Analysis" with the abbreviation ICA.
>
>
> Here are links to supplemental information regarding this evidence
> code:
>
> Examples of the types of analyses the RCA code was intended to cover:
>
> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html
>
>
> History of the RCA code:
> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html
>
> Summary of controversy over RCA vs ISS in Evidence Code Committee:
>
> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html
>
>
> Proposed draft of new documentation for this code:
> http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
> (note that original RCA doc is still present for comparison)
>
>