[go] Scope of the RCA evidence code

Karen Christie kchris at genome.Stanford.EDU
Thu Sep 13 09:43:42 PDT 2007


Hi,

I have inserted my responses to each of the specific papers you mention 
below.

-Karen


On Thu, 13 Sep 2007, Pascale Gaudet wrote:

> Hi,
>
> When I read this it seems like this is quite analogous to IC, except inferred 
> from a paper: you have multiple sources of information and you cannot easily 
> say which you are using for making the annotation. In most cases you show, I 
> think I would have been happy with existing evidence codes:

These can't be IC because the curator isn't making the inference. It was 
suggested at some point previously to widen IC to allow author inferences 
as well, but this was soundly rejected.

> Paper 1: I think it should have IPI evidence (even if they grouped several 
> IPI analyses)

These authors didn't do any of the experimental studies, so you can't use 
IPI from this paper. In addition, they didn't base their predictions on 
any single experiment, but on a network analysis of many, many 
interactions across the whole genome. The annotation is based on their 
analysis, which is not experimental work, rather than on any single 
experiment.

> Paper 2, 3: expression and promoters: might be IEP??

If they made statements based only on expression, then IEP might be fine. 
However, again, these authors did not do any experimental work, so an 
experimental code cannot be used for annotations from this paper. Any 
annotations made from this paper are based on a combined analysis of other 
groups' previously published expression data and promoter sequence data.

> Paper 5: I would have simply used IDA and ISS for group 2

As I pointed out in my notes for the paper (URL below), annotations made 
on the basis of either the experimental work or the structural analysis 
alone would receive the individual codes, IDA or ISS. However, the authors 
required both, because they felt that it was the combination of both types 
of evidence that really gave the specificity.

http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html


> Pascale
>
>
>
> Karen Christie wrote:
>> Scope of the RCA evidence code
>> -------------------------------------------------------
>> 
>> Here is my analysis of and recommendations for the future of the RCA
>> evidence code:
>> 
>> Having reviewed six papers of the type that originally prompted SGD to
>> request the RCA evidence code, it is clear that all of the methods
>> described in these papers include analysis of experimental data,
>> e.g., expression data, two-hybrid data, mass spec proteomic data,
>> etc. Some also include sequence-based data, but it is never the entire
>> basis of the analysis. Two of the analyses (Troyanskaya et al. and
>> Wade et al.) combined expression data with promoter sequence data, a
>> type of sequence data not typically considered in analyses appropriate
>> for the ISS code. Two other analyses (Baxter et al. and Alves et al.)
>> combined structural analysis with either experimental results or with
>> a mathematical model designed to test which mechanisms could reproduce
>> existing published experimental results. Some RCA analyses also
>> utilize existing functional annotations for characterized genes
>> (Gat-Viks et al.).
>> 
>> To summarize, all of these analyses combined multiple types of data,
>> generally including experimental data, such as expression data or
>> protein-protein interaction data. Some include sequence data, in this
>> set either promoter sequence info or structural information, but none
>> are based solely on sequence based information.
>> 
>> Analyses based purely on sequence similarity-based data should use the
>> ISS evidence code (or the IEA code if they are not reviewed by a
>> curator). This includes:
>>   - sequence similarity with experimentally characterized gene
>>     products, as determined by pairwise or multiple alignment;
>>   - prediction methods for non-coding RNA genes;
>>   - recognized functional domains, as determined by tools such as
>>     InterPro, Pfam, SMART, etc.;
>>   - predicted protein features, e.g., transmembrane regions, signal
>>     sequence, etc.;
>>   - structural similarity with experimentally characterized gene
>>     products, as determined by crystallography, nuclear magnetic
>>     resonance, or computational prediction.
>> The documentation does not currently list mapping files such as
>> InterPro2GO, but I would include these as sequence-only based data,
>> since the underlying analysis is based entirely on the sequence of the
>> gene product and the hits found by various sequence analysis methods.
>> 
>> Because RCA is a curator-reviewed code, annotations made with it must
>> be reviewed or assigned by a curator.
>> 
>> The documentation currently lists 'Text-based computation (e.g. text
>> mining)' as acceptable for this evidence code. In the absence of
>> specific examples of how this might be applied, I would suggest
>> removing mention of 'Text-based computation' until we have an actual
>> example or two to evaluate whether it fits this evidence code.
>> 
>> Accepting these recommendations would basically return the RCA code to its 
>> original intent. It would also be consistent with two earlier decisions: 
>> the recommendation of the Evidence Code Committee (ECC) to overturn the 
>> 2006 Annotation Camp's recommendation to use RCA for sequence similarity 
>> comparisons where no experimentally characterized ortholog could be put 
>> into the 'with' column, and the January 2007 GOC meeting decision that all 
>> methods based only on sequence-based information should use the ISS code.
>> 
>> The GOC may not wish to consider renaming the evidence code, but having 
>> reviewed this set of papers, I think the phrase "Integrated Computational 
>> Analysis" would be a more descriptive name and more consistent with how 
>> authors of these types of methods describe them (the red highlighting in 
>> the sample papers page, URL below, shows where the authors used that word). 
>> I'm not sure this is sufficient to make clear the distinction between these 
>> methods and sequence-only based methods, but it is better than "Reviewed 
>> Computational Analysis". In addition, right now the RCA documentation would 
>> exclude an analysis of this type if it were performed internally by a 
>> database group and not published. Thus, if the GOC is amenable to the idea 
>> of changing the name of the evidence code, I would suggest that we call it 
>> "Integrated Computational Analysis" with the abbreviation ICA.
>> 
>> 
>> Here are links to supplemental information regarding this evidence
>> code:
>> 
>> Examples of the types of analyses the RCA code was intended to cover:
>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html 
>> 
>> History of the RCA code:
>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html
>> 
>> Summary of controversy over RCA vs ISS in Evidence Code Committee:
>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html 
>> 
>> Proposed draft of new documentation for this code:
>>   http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
>>   (note that original RCA doc is still present for comparison)
>> 
>> 
>


