[go] Scope of the RCA evidence code
E Dimmer
edimmer at ebi.ac.uk
Thu Sep 13 09:50:41 PDT 2007
Karen Christie wrote:
> Hi,
>
> I have responses about each of these specific papers you mention,
> inserted below.
>
> -Karen
>
>
> On Thu, 13 Sep 2007, Pascale Gaudet wrote:
>
>> Hi,
>>
>> When I read this, it seems quite analogous to IC, except that the
>> inference comes from a paper: you have multiple sources of information
>> and you cannot easily say which one you are using to make the annotation.
>> In most cases you show, I think I would have been happy with existing
>> evidence codes:
>
> These can't be IC because the curator isn't making the inference. It
> was suggested at some point previously to widen IC to allow author
> inferences as well, but this was soundly rejected.
Wasn't it agreed that inferences made by the author rather than the
curator should be annotated with the NAS code (with a GO ID included in
the 'with' column when possible)?
>
>> Paper 1: I think it should have IPI evidence (even if they grouped
>> several IPI analyses)
>
> These people didn't do any of the experimental studies, so you can't
> use IPI from this paper. In addition, they didn't make their
> predictions on the basis of any single experiment, but on the basis of
> a network analysis of many, many interactions for the whole genome.
> The annotation is based on their analysis, which is not experimental
> work, rather than on any single experiment.
>
>> Paper 2, 3: expression and promoters: might be IEP??
>
> If they made statements based only on expression, then IEP might be
> fine. However, again, these authors did not do any experimental work,
> so an experimental code cannot be used for this paper. Any
> annotations made from this paper are based on a combined analysis of
> other researchers' expression data and promoter sequence data.
>
>> Paper 5: I would have simply used IDA and ISS for group 2
>
> As I pointed out in my notes on the paper (URL below), annotations
> made on the basis of either the experimental work or the structural
> analysis alone would receive the individual codes, IDA or ISS.
> However, the authors required both, because they felt that it was
> the combination of both types of evidence that really gave the
> specificity.
>
> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html
>
>
>
>> Pascale
>>
>>
>>
>> Karen Christie wrote:
>>> Scope of the RCA evidence code
>>> -------------------------------------------------------
>>>
>>> Here is my analysis of and recommendations for the future of the RCA
>>> evidence code:
>>>
>>> Having reviewed six papers of the type that originally prompted SGD to
>>> request the RCA evidence code, I find that all of the methods
>>> described in these papers include analysis of experimental data,
>>> e.g. expression data, two-hybrid data, mass spec proteomic data,
>>> etc. Some also include sequence-based data, but it is never the entire
>>> basis of the analysis. Two of the analyses (Troyanskaya et al. and
>>> Wade et al.) combined expression data with promoter sequence data, a
>>> type of sequence data not typically considered in analyses appropriate
>>> for the ISS code. Two other analyses (Baxter et al. and Alves et al.)
>>> combined structural analysis either with experimental results or with
>>> a mathematical model designed to test which mechanisms could reproduce
>>> existing published experimental results. Some RCA analyses also
>>> use existing functional annotations for characterized genes
>>> (Gat-Viks et al.).
>>>
>>> To summarize, all of these analyses combined multiple types of data,
>>> generally including experimental data such as expression data or
>>> protein-protein interaction data. Some also include sequence data (in
>>> this set, either promoter sequence or structural information), but
>>> none are based solely on sequence information.
>>>
>>> Analyses based purely on sequence-based data should use the ISS
>>> evidence code (or the IEA code if they are not reviewed by a
>>> curator). This covers sequence similarity with experimentally
>>> characterized gene products, as determined by pairwise or multiple
>>> alignment; prediction methods for non-coding RNA genes; recognized
>>> functional domains, as determined by tools such as InterPro, Pfam,
>>> SMART, etc.; predicted protein features, e.g., transmembrane regions,
>>> signal sequences, etc.; and structural similarity with experimentally
>>> characterized gene products, as determined by crystallography, nuclear
>>> magnetic resonance, or computational prediction. The documentation
>>> does not currently list mapping files such as InterPro2GO, but I would
>>> also count these as sequence-only data, since the underlying analysis
>>> is based entirely on the sequence of the gene product and its hits
>>> from various sequence analysis methods.
>>>
>>> Because RCA is a curator-reviewed code, annotations made with it must
>>> be reviewed or assigned by a curator.
>>>
>>> The documentation currently lists 'Text-based computation (e.g. text
>>> mining)' as acceptable for this evidence code. In the absence of
>>> specific examples of how this might be applied, I would suggest
>>> removing mention of 'Text-based computation' until we have an actual
>>> example or two to examine, so that we can decide whether it fits this
>>> evidence code.
>>>
>>> Accepting these recommendations would basically return the RCA code
>>> to its original intent. It would also be consistent with two earlier
>>> decisions: the recommendation of the Evidence Code Committee (ECC) to
>>> overturn the 2006 Annotation Camp's recommendation to use RCA for
>>> sequence similarity comparisons where no experimentally characterized
>>> ortholog could be put into the 'with' column, and the January 2007
>>> GOC meeting decision that all methods based only on sequence-based
>>> information should use the ISS code.
>>>
>>> The GOC may not wish to consider renaming the evidence code, but
>>> having reviewed this set of papers, I think the phrase "Integrated
>>> Computational Analysis" would be a more descriptive name and more
>>> consistent with how authors of these types of methods describe them
>>> (the red highlighting on the sample papers page, URL below, shows
>>> where the authors used the word "integrated"). I'm not sure this is
>>> sufficient to make clear the distinction between these methods and
>>> sequence-only based methods, but it is better than "Reviewed
>>> Computational Analysis". In addition, the current RCA documentation
>>> would exclude an analysis of this type if it were performed
>>> internally by a database group and not published. Thus, if
>>> the GOC is amenable to the idea of changing the name of the evidence
>>> code, I would suggest that we call it "Integrated Computational
>>> Analysis" with the abbreviation ICA.
>>>
>>>
>>> Here are links to supplemental information regarding this evidence
>>> code:
>>>
>>> Examples of the types of analyses the RCA code was intended to cover:
>>>
>>> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html
>>>
>>> History of the RCA code:
>>> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html
>>>
>>> Summary of controversy over RCA vs ISS in Evidence Code Committee:
>>>
>>> http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html
>>>
>>> Proposed draft of new documentation for this code:
>>>
>>> http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
>>> (note that original RCA doc is still present for comparison)
>>>
>>>
>>
--
************************************
Emily Dimmer
GOA Coordinator
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD, U.K.
Tel: +44 1223 494654
Fax: +44 1223 494468
email: edimmer at ebi.ac.uk
************************************