[annotation] [Fwd:What evidence code to use?]
Sue Rhee
rhee at acoma.Stanford.EDU
Thu Nov 29 09:32:08 PST 2007
Tanya: I suggest that you leave it ISS for now. In the new evidence
ontology, Reviewed by Computational Analysis or some generic version of
RCA is likely to be a parent of the generic version of ISS. I haven't
gotten much feedback from the evidence committee on the updated evidence
ontology and will send out the ontology to the whole GO group sometime
next week.
Sue
Judith Blake wrote:
> I shouldn't have jumped into this. But....
>
> ISS for MGI requires that the ISS be backed up with experimental
> data. Clearly, the analysis brought forward does not do that.
>
> RCA, from the SGD perspective, requires experimental data sets. From the
> MGI perspective, it was used (only) for the FANTOM analysis, when the
> sequence analysis was part of expert annotation. MGI has not had much
> occasion to use RCA since FANTOM, and we are gradually removing these.
>
> The argument about ISS was whether it was to be restricted to use with
> orthologs that had experiments or whether it was to include sequence
> analysis and HMM type studies done in the individual organisms.
> We resolved that, I thought, by moving toward ISS with subcodes of ISO
> (for orthology sets) and IS- (I don't remember) for HMMs and other
> supervised sequence analysis. The study brought forward by Tanya
> could be either the ISS (generic sequence analysis) or the other one,
> but certainly these are not backed by experimental data, so with the
> current RCA, these could best, perhaps, be
>
> ISS (generic) but we don't have this implemented yet
> IEA.....why not? well, it's not just an electronic analysis...
>
> Again, these reflect only predictive analysis; there is no
> experimental data. MGI would prefer ISS only be used when backed by
> experimental data (or the new category) and SGD would prefer that RCA
> be restricted to experiment +/- computational analysis using sequence.
>
> In the end, I would like to express my thoughts again that we should
> not drown ourselves in this discussion. By going to the reference or
> by reading the MOD-supplied abstract, users can determine the predictive
> algorithm source if they want to. One could argue that we spend too
> much time on sorting this out when we do have group consensus that
> evidence codes are mostly to provide clues to users as to the generic
> classes of assay that support the annotation. The reference is
> really the source, and we toe a fine line between just using
> 'experimental' and 'predicted', and providing all the gory details of
> the analysis.
> Cheers,
> Judy
>
>
>
> Pascale Gaudet wrote:
>> But, I thought RCA required experimental data??
>>
>> From documentation: http://www.geneontology.org/GO.evidence.shtml#ica
>>
>> * Predictions based on computational analyses of large-scale
>> experimental data sets
>> * Predictions based on computational analyses that integrate
>> datasets of several types, including experimental data (e.g.
>> expression data, protein-protein interaction data, genetic
>> interaction data, etc.), sequence data (e.g. promoter sequence,
>> sequence-based structural predictions, etc.), or mathematical
>> models
>>
>> Pascale
>>
>> Judith Blake wrote:
>>> ok with me if we need to make the distinction. I took it to mean
>>> the difference between a simple alignment report and a more
>>> comprehensive analysis. Phylogenetic analyses employ powerful
>>> algorithms, but at the core of the analysis are manually curated
>>> multiple alignments from hundreds of species. These could be RCA
>>> for me. At the end of the day, I think it doesn't matter :) since
>>> all these measures are predictive and not experimental determinations.
>>>
>>> Judy
>>>
>>>
>>> Karen Christie wrote:
>>>> My recollection is that RCA was proposed by SGD to handle papers
>>>> such as Samanta and Liang 2003 (url below) where they did
>>>> computational analysis of large-scale protein interaction data.
>>>>
>>>> http://db.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000074191
>>>>
>>>>
>>>> The original documentation for RCA explicitly stated that it was
>>>> not to be used for sequence data. At the St. Croix meeting, Sue
>>>> Rhee brought up the point that some computational analyses combined
>>>> sequence data into the types of analyses done by Samanta and Liang.
>>>> On that basis, it was agreed that RCA could include sequence data,
>>>> but was not intended for analyses that were entirely sequence based.
>>>>
>>>> -Karen
>>>>
>>>>
>>>> On Wed, 28 Nov 2007, Mike Cherry wrote:
>>>>
>>>>> I believe RCA was proposed by SGD to use with analyses like Biopixie.
>>>>>
>>>>> Cheers, Mike
>>>>>
>>>>>
>>>>> On Nov 27, 2007, at 9:00 PM, Judith Blake
>>>>> <jblake at informatics.jax.org> wrote:
>>>>>
>>>>>> This is exactly what RCA was originally used for. With the
>>>>>> FANTOM project [mouse full-length cDNA annotations], participants
>>>>>> employed a series of algorithmic approaches combined with manual
>>>>>> inspection and evaluation to provide annotations. Actually, I
>>>>>> think RCA was created as a result of the FANTOM project.
>>>>>>
>>>>>> Judy
>>>>>>
>>>>>> Tanya Berardini wrote:
>>>>>>> Forwarding this from the evidence code discussion group.
>>>>>>> Apologies to those who are on both lists. I've sorted the
>>>>>>> emails from top to bottom in chronological order for easier
>>>>>>> reading:
>>>>>>>
>>>>>>> ----------
>>>>>>> My original email:
>>>>>>>
>>>>>>>> Ah, the eternal question: Is it ISS, is it RCA?
>>>>>>>>
>>>>>>>> I've got a paper that describes the identification of a nice big
>>>>>>>> set of transcription factors in Arabidopsis.
>>>>>>>>
>>>>>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=11118137&dopt=AbstractPlus
>>>>>>>>
>>>>>>>> The authors use a combination of motif searches + BLAST + sequence
>>>>>>>> alignment, review those by eye, and come up with 1500 or so genes
>>>>>>>> that they call 'transcription factors.'
>>>>>>>>
>>>>>>>> Right now, we've got these annotated to 'transcription factor
>>>>>>>> activity' with the evidence code ISS but nothing in the
>>>>>>>> evidence_with column. If I leave these as ISS, I'd like to put
>>>>>>>> something in the with column, but what? Does this type of
>>>>>>>> combination of sequence analysis methods that's reviewed manually
>>>>>>>> make it RCA? Not according to the current RCA documentation:
>>>>>>>>
>>>>>>>> "Examples where the RCA evidence code should not be used:
>>>>>>>>
>>>>>>>> * Annotations based on more than one type of gene product sequence
>>>>>>>> based evidence, including such things as BLAST, profile HMMs, TMHMM,
>>>>>>>> SignalP, PROSITE, InterPro, mapping files such as interpro2go etc.
>>>>>>>> should use the ISS code."
>>>>>>>>
>>>>>>>> Should I wait till ISS comes to a resolution?
>>>>>>>>
>>>>>>>> Help!
>>>>>>>
>>>>>>> ---------
>>>>>>> Ben's reply:
>>>>>>>
>>>>>>> If you can't put something USEFUL in the WITH column, I think
>>>>>>> this has to be RCA.
>>>>>>> I guess under the new, non-documented system, this would be
>>>>>>> ISS with no "With"; ISA/ISO/ISM would require withs (either seq ids
>>>>>>> or model, aka InterPro, ids).
>>>>>>>
>>>>>>>
>>>>>>> Ben
>>>>>>>
>>>>>>> ----------
>>>>>>>
>>>>>>> Val's reply:
>>>>>>>
>>>>>>> This is *exactly* the type of data for which I was originally
>>>>>>> suggesting that RCA should not be restricted to analyses which
>>>>>>> include some experimental component. Unfortunately I couldn't
>>>>>>> come up with any good examples at the time.
>>>>>>>
>>>>>>> These would surely be better as RCA, even though they are
>>>>>>> sequence based.
>>>>>>>
>>>>>>> Val
>>>>>>>
>>>>>>> ----------
>>>>>>>
>>>>>>> Susan's reply:
>>>>>>>
>>>>>>> I've just hit another example...
>>>>>>>
>>>>>>> Enhanced function annotations for Drosophila serine proteases: A case
>>>>>>> study for systematic annotation of multi-member gene families.
>>>>>>>
>>>>>>> Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE,
>>>>>>> Rodrigues V, White KP, Bork P, Sowdhamini R.
>>>>>>>
>>>>>>> PMID: 17996400
>>>>>>>
>>>>>>> This is a functional classification of serine proteases based on a
>>>>>>> 'function residue clustering' algorithm. The algorithm incorporates
>>>>>>> info from sequence alignments, hydrophobicity plots and info about
>>>>>>> key residues from 3D structures - all sequence based but no one
>>>>>>> thing to put in the 'with'.
>>>>>>>
>>>>>>> Susan
>>>>>>>
>>>>>>> -----------
>>>>>>>
>>>>>>> Pascale's reply:
>>>>>>>
>>>>>>> Tanya,
>>>>>>>
>>>>>>> I thought we agreed that BLAST and InterPro were ISS, as you
>>>>>>> point out. I don't think ISS + ISS = RCA?? That is, I would say
>>>>>>> using InterPro or the BLAST result should be enough to make the
>>>>>>> annotation; we don't need to capture both? In this case, the
>>>>>>> easiest might be using ISS with an InterPro domain ID in the
>>>>>>> 'with'.
>>>>>>>
>>>>>>> Similarly in the paper Susan cites, they mention several domains
>>>>>>> and also they have compared to several proteins whose 3D
>>>>>>> structure has been determined and which hence can be used in the
>>>>>>> 'with' - I would pick one of those example proteins and ISS to that.
>>>>>>>
>>>>>>> Pascale
>>>>>>>
>>>>>>> ---------
>>>>>>>
>>>>>>> Any other thoughts?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Tanya
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [evidence] What evidence code to use?
>>>>>>> Date: Wed, 21 Nov 2007 08:43:16 -0500
>>>>>>> From: Pascale Gaudet <pgaudet at northwestern.edu>
>>>>>>> Reply-To: pgaudet at northwestern.edu
>>>>>>> Organization: Northwestern University
>>>>>>> To: tberardi at acoma.stanford.edu
>>>>>>> CC: evidence at genome.stanford.edu
>>>>>>> References: <47437C88.5070204 at acoma.stanford.edu>
>>>>>>>
>>>>>>> Tanya,
>>>>>>>
>>>>>>> I thought we agreed that BLAST and InterPro were ISS, as you
>>>>>>> point out. I don't think ISS + ISS = RCA?? That is, I would say
>>>>>>> using InterPro or the BLAST result should be enough to make the
>>>>>>> annotation; we don't need to capture both? In this case, the
>>>>>>> easiest might be using ISS with an InterPro domain ID in the
>>>>>>> 'with'.
>>>>>>>
>>>>>>> Similarly in the paper Susan cites, they mention several domains
>>>>>>> and also they have compared to several proteins whose 3D
>>>>>>> structure has been determined and which hence can be used in the
>>>>>>> 'with' - I would pick one of those example proteins and ISS to that.
>>>>>>>
>>>>>>> Pascale
>>>>>>>
>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu
>>>>>>>> The Arabidopsis Information Resource FAX: (650) 325-6857
>>>>>>>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325
>>>>>>>> Department of Plant Biology URL: http://arabidopsis.org/
>>>>>>>> 260 Panama St.
>>>>>>>> Stanford, CA 94305
>>>>>>>> ------------------------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>>>
>>
>> --
>> ~~~~~~~~~~~~~~~~~~~
>> Pascale Gaudet, PhD
>> Scientific Curator, dictyBase
>> Northwestern University, Chicago, IL
>> pgaudet at northwestern.edu
>> www.dictybase.org
>> ~~~~~~~~~~~~~~~~~~
--
Sue Rhee
Staff Scientist
Carnegie Institution, Department of Plant Biology
260 Panama Street, Stanford, CA 94305
Tel: (650) 325-1521 x251
Fax: (650) 325-6857