[go] Scope of the RCA evidence code

Karen Christie kchris at genome.Stanford.EDU
Thu Sep 13 14:48:37 PDT 2007


Using NAS for author statements is a different situation from this one. We 
have agreed to use NAS for cases where the author either makes 
statements about a gene without any source information, or makes 
statements like "As a member of the blah complex, this gene is involved 
in process x."

In these papers, the results being reported are integrated analyses of 
various types of data. The methods sections describe the algorithms and 
statistical methods used. The annotations from these papers are based on 
the results of the work reported in them, i.e. the authors' computational 
analyses, not on unsupported author statements.
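
The distinction drawn in this thread (sequence-only analyses go to ISS or 
IEA, while curator-reviewed analyses that integrate several data types go 
to RCA) can be sketched roughly as follows. This is only an illustration 
of the rules as discussed here, not official GO tooling: the data-kind 
labels, the Analysis class and the suggest_code function are all 
hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical labels for sequence-only data, following the list in the
# quoted proposal. Promoter sequence data is deliberately left out, since
# the proposal treats it as not typical of ISS-style analyses.
SEQUENCE_ONLY_KINDS = frozenset({
    "sequence_similarity",
    "ncrna_prediction",
    "functional_domain",
    "predicted_feature",
    "structural_similarity",
})


@dataclass(frozen=True)
class Analysis:
    """A computational analysis that an annotation would be based on."""
    data_kinds: FrozenSet[str]   # kinds of data the analysis draws on
    curator_reviewed: bool       # has a curator reviewed/assigned it?


def suggest_code(analysis: Analysis) -> Optional[str]:
    """Suggest an evidence code, or None if these rules do not cover the case."""
    if not analysis.data_kinds:
        raise ValueError("an analysis must use at least one kind of data")

    # Purely sequence-based: ISS if curator-reviewed, otherwise IEA.
    if analysis.data_kinds <= SEQUENCE_ONLY_KINDS:
        return "ISS" if analysis.curator_reviewed else "IEA"

    # Integrated analysis of several data types, not solely sequence-based:
    # RCA, but only when a curator has reviewed it.
    if len(analysis.data_kinds) >= 2:
        return "RCA" if analysis.curator_reviewed else None

    # Anything else is outside what this sketch tries to decide.
    return None
```

For example, an analysis combining "expression" and "promoter_sequence" 
data that a curator has reviewed would come out as RCA, while an 
unreviewed InterPro-style domain hit would come out as IEA.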

-Karen

P.S. I actually don't think we have come to final consensus on the 
proposed use of the with column for NAS. I'll start a separate thread on 
that.



On Thu, 13 Sep 2007, E Dimmer wrote:

> Karen Christie wrote:
>> Hi,
>> 
>> I have responses about each of these specific papers you mention, inserted 
>> below.
>> 
>> -Karen
>> 
>> 
>> On Thu, 13 Sep 2007, Pascale Gaudet wrote:
>> 
>>> Hi,
>>> 
>>> When I read this it seems like this is quite analogous to IC, except 
>>> inferred from a paper: you have multiple sources of information and you 
>>> cannot easily say which you are using for making the annotation. In most 
>>> cases you show, I think I would have been happy with existing evidence 
>>> codes:
>> 
>> These can't be IC because the curator isn't making the inference. It was 
>> suggested at some point previously to widen IC to allow author inferences 
>> as well, but this was soundly rejected.

> Wasn't it agreed that inferences made by the author instead of the curator 
> should be annotated with the NAS code (with a GO ID, when possible, included 
> in the 'with' column)?

>>> Paper 1: I think it should have IPI evidence (even if they grouped several 
>>> IPI analyses)
>> 
>> These people didn't do any of the experimental studies, so you can't use 
>> IPI from this paper. In addition, they didn't make their predictions on the 
>> basis of any single experiment, but on the basis of a network analysis of 
>> many, many interactions for the whole genome. The annotation is based on 
>> their analysis, which is not experimental work, not on a single experiment.
>> 
>>> Paper 2, 3: expression and promoters: might be IEP??
>> 
>> If they made statements based only on expression, then IEP might be fine. 
>> However, again, these authors did not do any experimental work, so an 
>> experimental code cannot be used from this paper. Any annotations made 
>> from this paper are from a combined analysis of previous people's 
>> expression data and promoter sequence data.
>> 
>>> Paper 5: I would have simply used IDA and ISS for group 2
>> 
>> As I pointed out in my notes for the paper, url below, annotations made on 
>> the basis of either the experimental work or the structural analysis alone 
>> would receive the individual codes, IDA or ISS. However, what they did was 
>> require both because they felt that it was the combination of both types of 
>> evidence that really gave the specificity.
>> 
>> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html 
>> 
>> 
>>> Pascale
>>> 
>>> 
>>> 
>>> Karen Christie wrote:
>>>> Scope of the RCA evidence code
>>>> -------------------------------------------------------
>>>> 
>>>> Here is my analysis of and recommendations for the future of the RCA
>>>> evidence code:
>>>> 
>>>> Having reviewed six papers of the type that originally prompted SGD to
>>>> request the RCA evidence code, it is clear that all of the methods
>>>> described in these papers include analysis of experimental data,
>>>> e.g. expression data, two hybrid data, mass spec proteomic data,
>>>> etc. Some also include sequence based data, but it is never the entire
>>>> basis of the analysis. Two of the analyses (Troyanskaya et al, and
>>>> Wade et al.) combined expression data with promoter sequence data, a
>>>> type of sequence data not typically considered in analyses appropriate
>>>> for the ISS code. Two other analyses (Baxter et al. and Alves et al.)
>>>> combined structural analysis with either experimental results or with
>>>> a mathematical model designed to test which mechanisms could reproduce
>>>> existing published experimental results. Some RCA analyses also
>>>> utilize existing functional annotations for characterized genes
>>>> (Gat-Viks et al.).
>>>> 
>>>> To summarize, all of these analyses combined multiple types of data,
>>>> generally including experimental data, such as expression data or
>>>> protein-protein interaction data. Some include sequence data, in this
>>>> set either promoter sequence info or structural information, but none
>>>> are based solely on sequence based information.
>>>> 
>>>> Analyses based purely on sequence-based data should use the ISS
>>>> evidence code (or the IEA code if not reviewed by a curator). This
>>>> covers sequence similarity with experimentally characterized gene
>>>> products, as determined by pairwise or multiple alignment; prediction
>>>> methods for non-coding RNA genes; recognized functional domains, as
>>>> determined by tools such as InterPro, Pfam, SMART, etc.; predicted
>>>> protein features, e.g., transmembrane regions, signal sequence, etc.;
>>>> and structural similarity with experimentally characterized gene
>>>> products, as determined by crystallography, nuclear magnetic
>>>> resonance, or computational prediction. The documentation does not
>>>> currently list mapping files such as InterPro2GO, but I would include
>>>> these as sequence-only data, since the underlying analysis is based
>>>> entirely on the sequence of the gene product and the hits from
>>>> various sequence analysis methods.
>>>> 
>>>> As a curator-reviewed code, annotations made with the RCA code must be
>>>> reviewed/assigned by a curator.
>>>> 
>>>> The documentation currently lists 'Text-based computation (e.g. text
>>>> mining)' as acceptable for this evidence code. In the absence of
>>>> specific examples of how this might be applied, I would suggest
>>>> removing mention of 'Text-based computation' until we have an actual
>>>> example or two to look at to see whether it fits into this evidence
>>>> code or not.
>>>> 
>>>> Accepting these recommendations would basically return the RCA code to 
>>>> its original intent. It would also be consistent with the recommendation 
>>>> of the Evidence Code Committee (ECC) to overturn the 2006 Annotation 
>>>> Camp's recommendation to use RCA for sequence similarity comparisons 
>>>> where you could not put an experimentally characterized ortholog into the 
>>>> with column and also with the January 2007 GOC meeting decision that all 
>>>> methods based on only sequence-based info should use the ISS code.
>>>> 
>>>> The GOC may not wish to consider renaming the evidence code, but having 
>>>> reviewed this set of papers, I think the phrase "Integrated Computational 
>>>> Analysis" would be a more descriptive name and more consistent with how 
>>>> authors of these types of methods describe them (the red highlighting in 
>>>> the sample papers page, url below, shows where the authors used that 
>>>> word). I'm not sure this is sufficient to make clear the distinction 
>>>> between these methods and sequence-only based methods, but it is better 
>>>> than "Reviewed Computational Analysis". In addition, right now the RCA 
>>>> documentation would exclude an analysis of this type if it was performed 
>>>> internally by a database group and not published. Thus, if the GOC is 
>>>> amenable to the idea of changing the name of the evidence code, I would 
>>>> suggest that we call it "Integrated Computational Analysis" with the 
>>>> abbreviation ICA.
>>>> 
>>>> 
>>>> Here are links to supplemental information regarding this evidence
>>>> code:
>>>> 
>>>> Examples of the types of analyses the RCA code was intended to cover:
>>>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html 
>>>> History of the RCA code:
>>>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html
>>>> 
>>>> Summary of controversy over RCA vs ISS in Evidence Code Committee:
>>>>   http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html 
>>>> Proposed draft of new documentation for this code:
>>>>   http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
>>>>   (note that original RCA doc is still present for comparison)
>>>> 
>>>> 
>>> 
>
>
> -- 
> ************************************
>   Emily Dimmer
>   GOA Coordinator
>   EMBL-EBI
>   Wellcome Trust Genome Campus
>   Hinxton
>   Cambridge CB10 1SD, U.K.
>   Tel:     +44 1223 494654
>   Fax:    +44 1223 494468
>   email:  edimmer at ebi.ac.uk
> ************************************
>


