[go] Scope of the RCA evidence code

Karen Christie kchris at genome.Stanford.EDU
Mon Sep 10 16:22:36 PDT 2007



Here is my analysis of and recommendations for the future of the RCA
evidence code:

I reviewed six papers of the type that originally prompted SGD to
request the RCA evidence code. All of the methods described in these
papers include analysis of experimental data, e.g. expression data,
two-hybrid data, mass spectrometry proteomic data, etc. Some also
include sequence-based data, but it is never the entire basis of the
analysis. Two of the analyses (Troyanskaya et al. and Wade et al.)
combined expression data with promoter sequence data, a type of
sequence data not typically considered in analyses appropriate for the
ISS code. Two other analyses (Baxter et al. and Alves et al.) combined
structural analysis either with experimental results or with a
mathematical model designed to test which mechanisms could reproduce
existing published experimental results. Some RCA analyses also use
existing functional annotations for characterized genes (Gat-Viks et
al.).

To summarize, all of these analyses combined multiple types of data,
generally including experimental data such as expression data or
protein-protein interaction data. Some include sequence data (in this
set, either promoter sequence information or structural information),
but none are based solely on sequence-based information.

Analyses based purely on sequence-based data should use the ISS
evidence code (or the IEA code if the annotation is not reviewed by a
curator). Such data include: sequence similarity with experimentally
characterized gene products, as determined by pairwise or multiple
alignment; prediction methods for non-coding RNA genes; recognized
functional domains, as determined by tools such as InterPro, Pfam,
SMART, etc.; predicted protein features, e.g. transmembrane regions,
signal sequences, etc.; and structural similarity with experimentally
characterized gene products, as determined by crystallography, nuclear
magnetic resonance, or computational prediction. The documentation
does not currently list mapping files such as InterPro2GO, but I would
classify these as sequence-only data as well, since the underlying
analysis is based entirely on the sequence of the gene product and the
hits returned by various sequence analysis methods.

Because RCA is a curator-reviewed code, annotations made with it must
be reviewed or assigned by a curator.

The documentation currently lists 'Text-based computation (e.g. text
mining)' as acceptable for this evidence code. In the absence of
specific examples of how this might be applied, I would suggest
removing the mention of 'Text-based computation' until we have an
actual example or two to evaluate whether it fits this evidence code.

Accepting these recommendations would basically return the RCA code to
its original intent. It would also be consistent with two earlier
decisions. The first is the recommendation of the Evidence Code
Committee (ECC) to overturn the 2006 Annotation Camp's recommendation
to use RCA for sequence similarity comparisons in which no
experimentally characterized ortholog could be put into the "with"
column. The second is the January 2007 GOC meeting decision that all
methods based only on sequence-based information should use the ISS
code.

The GOC may not wish to consider renaming the evidence code, but
having reviewed this set of papers, I think the phrase "Integrated
Computational Analysis" would be a more descriptive name and more
consistent with how authors of these types of methods describe them
(the red highlighting on the sample papers page, URL below, shows where
the authors used that word). I'm not sure this name is sufficient to
make clear the distinction between these methods and sequence-only
methods, but it is better than "Reviewed Computational Analysis". In
addition, the current RCA documentation would exclude an analysis of
this type if it were performed internally by a database group and not
published. Thus, if the GOC is amenable to changing the name of the
evidence code, I would suggest that we call it "Integrated
Computational Analysis", with the abbreviation ICA.


Here are links to supplemental information regarding this evidence
code:

Examples of the types of analyses the RCA code was intended to cover:
   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html

History of the RCA code:
   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html

Summary of controversy over RCA vs ISS in Evidence Code Committee:
   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html

Proposed draft of new documentation for this code:
   http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
   (note that original RCA doc is still present for comparison)
