[annotation] [Fwd:What evidence code to use?]

Gwinn Giglio, Michelle mgiglio at som.umaryland.edu
Wed Dec 5 11:44:12 PST 2007




Hi all,

Chiming in REALLY late here (I just started a new job this week and am finally getting settled in and back to work).

I'm happy to say that I agree with the resolution that was reached here.  

There seems to be a desire for an evidence code that covers combinatorial, sequence-based approaches; some call this RCA, some don't. There is also a need for an evidence code that covers combinatorial approaches in which sequence-based analysis is just one type of information used and experimental data may or may not be required; again, some people call this RCA and some don't.

Clearly we need to resolve this, and it's something that needs to go to the evidence code committee.

In the meantime, I think we should go with ISS as the most generic and inclusive code for sequence-based evidence; it does not require anything in the "with" column. This is consistent with the new structure decided on in Princeton.

Harold and I signed up to work on drafts for the new ISS subcodes. So far I haven't been able to get started on that, but now that I'm at my new job location I plan to get moving on ISA and ISM. I think we will have to tackle the RCA question at the same time.

Hopefully you'll be hearing something out of the evidence code committee soon.  :)

Michelle




-----Original Message-----
From: owner-annotation at genome.stanford.edu on behalf of Tanya Berardini
Sent: Thu 11/29/2007 12:41 PM
To: Sue Rhee
Cc: Judith Blake; GO Annotation list
Subject: Re: [annotation] [Fwd:What evidence code to use?]
 
Ok, they'll stay as ISS without anything in the evidence_with field for now.

Thanks everyone.

Tanya


Sue Rhee wrote:
> Tanya: I suggest that you leave it ISS for now. In the new evidence 
> ontology, Reviewed by Computational Analysis or some generic version of 
> RCA is likely to be a parent of the generic version of ISS. I haven't 
> gotten much feedback from the evidence committee on the updated evidence 
> ontology and will send out the ontology to the whole GO group sometime 
> next week.
> 
> Sue
> 
> Judith Blake wrote:
>> I shouldn't have jumped into this.  But....
>>
>> For MGI, ISS requires that the annotation be backed up with 
>> experimental data. Clearly, the analysis brought forward does not do that.
>>
>> From the SGD perspective, RCA requires experimental data sets. From the 
>> MGI perspective, it was used (only) for the FANTOM analysis, where the 
>> sequence analysis was part of expert annotation. MGI has not had much 
>> occasion to use RCA since FANTOM, and we are gradually removing these 
>> annotations.
>>
>> The argument about ISS was whether it should be restricted to use with 
>> orthologs that had experimental data or whether it should also include 
>> sequence analysis and HMM-type studies done in the individual organisms. 
>> We resolved that, I thought, by moving toward ISS with subcodes of ISO 
>> (for orthology sets) and IS- (I don't remember) for HMMs and other 
>> supervised sequence analysis. The study brought forward by Tanya could 
>> fall under either ISS (generic sequence analysis) or the other one, but 
>> these annotations are certainly not backed by experimental data, so 
>> given the current RCA definition, the best options are perhaps:
>>
>> ISS (generic), but we don't have this implemented yet
>> IEA... why not? Well, it's not just an electronic analysis.
>>
>> Again, these reflect only predictive analysis; there is no 
>> experimental data. MGI would prefer that ISS be used only when backed 
>> by experimental data (or the new category), and SGD would prefer that 
>> RCA be restricted to experiment +/- computational analysis using sequence.
>>
>> In the end, I would like to repeat my view that we should not drown 
>> ourselves in this discussion. By going to the reference or by reading 
>> the MOD-supplied abstract, users can determine the source of the 
>> predictive algorithm if they want to. One could argue that we spend far 
>> too much time sorting this out, when we do have group consensus that 
>> evidence codes are mostly meant to give users clues about the generic 
>> class of assay that supports the annotation. The reference is really 
>> the source, and we tread a fine line between just using 
>> 'experimental' and 'predicted' and providing all the gory details of 
>> the analysis.
>> Cheers,
>> Judy
>>
>>
>>
>> Pascale Gaudet wrote:
>>> But I thought RCA required experimental data??
>>>
>>> From the documentation: http://www.geneontology.org/GO.evidence.shtml#ica
>>>
>>>     * Predictions based on computational analyses of large-scale
>>>       experimental data sets
>>>     * Predictions based on computational analyses that integrate
>>>       datasets of several types, including experimental data (e.g.
>>>       expression data, protein-protein interaction data, genetic
>>>       interaction data, etc.), sequence data (e.g. promoter sequence,
>>>       sequence-based structural predictions, etc.), or mathematical
>>>       models
>>>
>>> Pascale
>>>
>>> Judith Blake wrote:
>>>> OK with me if we need to make the distinction. I took it to mean 
>>>> the difference between a simple alignment report and a more 
>>>> comprehensive analysis. Phylogenetic analyses employ powerful 
>>>> algorithms, but at the core of the analysis are manually curated 
>>>> multiple alignments from hundreds of species. These could be RCA 
>>>> for me. At the end of the day, I think it doesn't matter :) since 
>>>> all these measures are predictive and not experimental determinations.
>>>>
>>>> Judy
>>>>
>>>>
>>>> Karen Christie wrote:
>>>>> My recollection is that RCA was proposed by SGD to handle papers 
>>>>> such as Samanta and Liang 2003 (url below) where they did 
>>>>> computational analysis of large-scale protein interaction data.
>>>>>
>>>>> http://db.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000074191 
>>>>>
>>>>>
>>>>> The original documentation for RCA explicitly stated that it was 
>>>>> not to be used for sequence data. At the St. Croix meeting, Sue 
>>>>> Rhee brought up the point that some computational analyses combined 
>>>>> sequence data into the types of analyses done by Samanta and Liang. 
>>>>> On that basis, it was agreed that RCA could include sequence data, 
>>>>> but was not intended for analyses that were entirely sequence based.
>>>>>
>>>>> -Karen
>>>>>
>>>>>
>>>>> On Wed, 28 Nov 2007, Mike Cherry wrote:
>>>>>
>>>>>> I believe RCA was proposed by SGD for use with analyses like Biopixie.
>>>>>>
>>>>>> Cheers, Mike
>>>>>>
>>>>>>
>>>>>> On Nov 27, 2007, at 9:00 PM, Judith Blake 
>>>>>> <jblake at informatics.jax.org> wrote:
>>>>>>
>>>>>>> This is exactly what RCA was originally used for. In the 
>>>>>>> FANTOM project [mouse full-length cDNA annotations], participants 
>>>>>>> employed a series of algorithmic approaches combined with manual 
>>>>>>> inspection and evaluation to provide annotations.  Actually, I 
>>>>>>> think RCA was created as a result of the FANTOM project.
>>>>>>>
>>>>>>> Judy
>>>>>>>
>>>>>>> Tanya Berardini wrote:
>>>>>>>> Forwarding this from the evidence code discussion group. 
>>>>>>>> Apologies to those who are on both lists.  I've sorted the 
>>>>>>>> emails from top to bottom in chronological order for easier 
>>>>>>>> reading:
>>>>>>>>
>>>>>>>> ----------
>>>>>>>> My original email:
>>>>>>>>
>>>>>>>>> Ah, the eternal question:  Is it ISS, is it RCA?
>>>>>>>>>
>>>>>>>>> I've got a paper that describes the identification of a nice big set
>>>>>>>>> of transcription factors in Arabidopsis.
>>>>>>>>>
>>>>>>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=11118137&dopt=AbstractPlus 
>>>>>>>>>
>>>>>>>>> The authors used a combination of motif searches + BLAST + sequence
>>>>>>>>> alignment, reviewed the results by eye, and came up with 1500 or so
>>>>>>>>> genes that they call 'transcription factors.'
>>>>>>>>>
>>>>>>>>> Right now, we've got these annotated to 'transcription factor
>>>>>>>>> activity' with the evidence code ISS but nothing in the evidence_with
>>>>>>>>> column. If I leave these as ISS, I'd like to put something in the
>>>>>>>>> with column, but what? Does this type of combination of sequence
>>>>>>>>> analysis methods, reviewed manually, make it RCA? Not according
>>>>>>>>> to the current RCA documentation:
>>>>>>>>>
>>>>>>>>> "Examples where the RCA evidence code should not be used:
>>>>>>>>>
>>>>>>>>>     * Annotations based on more than one type of gene product sequence
>>>>>>>>> based evidence, including such things as BLAST, profile HMMs, TMHMM,
>>>>>>>>> SignalP, PROSITE, InterPro, mapping files such as interpro2go etc.
>>>>>>>>> should use the ISS code. "
>>>>>>>>>
>>>>>>>>> Should I wait till ISS comes to a resolution?
>>>>>>>>>
>>>>>>>>> Help!
>>>>>>>>
>>>>>>>> ---------
>>>>>>>> Ben's reply:
>>>>>>>>
>>>>>>>> If you can't put something USEFUL in the WITH column, I think 
>>>>>>>> this has to be RCA.
>>>>>>>> I guess under the new, not-yet-documented system, this would be 
>>>>>>>> ISS with no "with"; ISA/ISO/ISM would require withs (either 
>>>>>>>> sequence IDs or model IDs, aka InterPro IDs).
>>>>>>>>
>>>>>>>>
>>>>>>>> Ben
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Val's reply:
>>>>>>>>
>>>>>>>> This is *exactly* the type of data that made me originally suggest 
>>>>>>>> that RCA should not be restricted to analyses that include some 
>>>>>>>> experimental component. Unfortunately, I couldn't come up with any 
>>>>>>>> good examples at the time.
>>>>>>>>
>>>>>>>> These would surely be better as RCA, even though they are 
>>>>>>>> sequence based.
>>>>>>>>
>>>>>>>> Val
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Susan's reply:
>>>>>>>>
>>>>>>>> I've just hit another example...
>>>>>>>>
>>>>>>>> Enhanced function annotations for Drosophila serine proteases: A case
>>>>>>>> study for systematic annotation of multi-member gene families.
>>>>>>>>
>>>>>>>> Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE,
>>>>>>>> Rodrigues V, White KP, Bork P, Sowdhamini R.
>>>>>>>>
>>>>>>>> PMID: 17996400
>>>>>>>>
>>>>>>>> This is a functional classification of serine proteases based on a
>>>>>>>> 'function residue clustering' algorithm. The algorithm incorporates
>>>>>>>> information from sequence alignments, hydrophobicity plots, and key
>>>>>>>> residues from 3D structures. It is all sequence based, but there is
>>>>>>>> no one thing to put in the 'with'.
>>>>>>>>
>>>>>>>> Susan
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> Pascale's reply:
>>>>>>>>
>>>>>>>> Tanya,
>>>>>>>>
>>>>>>>> I thought we agreed that BLAST and InterPro were ISS, as you 
>>>>>>>> point out. I don't think ISS + ISS = RCA?? That is, I would say 
>>>>>>>> using InterPro or the BLAST result should be enough to make the 
>>>>>>>> annotation; we don't need to capture both? In this case, the 
>>>>>>>> easiest option might be using ISS with an InterPro domain ID in the 
>>>>>>>> 'with'.
>>>>>>>>
>>>>>>>> Similarly, in the paper Susan cites, they mention several domains 
>>>>>>>> and they also compared against several proteins whose 3D 
>>>>>>>> structures have been determined, which can therefore be used in the 
>>>>>>>> 'with'. I would pick one of those example proteins and ISS to that.
>>>>>>>>
>>>>>>>> Pascale
>>>>>>>>
>>>>>>>> ---------
>>>>>>>>
>>>>>>>> Any other thoughts?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Tanya
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [evidence] What evidence code to use?
>>>>>>>> Date: Wed, 21 Nov 2007 08:43:16 -0500
>>>>>>>> From: Pascale Gaudet <pgaudet at northwestern.edu>
>>>>>>>> Reply-To: pgaudet at northwestern.edu
>>>>>>>> Organization: Northwestern University
>>>>>>>> To: tberardi at acoma.stanford.edu
>>>>>>>> CC: evidence at genome.stanford.edu
>>>>>>>> References: <47437C88.5070204 at acoma.stanford.edu>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------------------ 
>>>>>>>>>
>>>>>>>>> Tanya Berardini, Ph.D.            tberardi at acoma.stanford.edu
>>>>>>>>> The Arabidopsis Information Resource    FAX: (650) 325-6857
>>>>>>>>> Carnegie Institution of Washington    Tel: (650) 325-1521 ext. 325
>>>>>>>>> Department of Plant Biology        URL: http://arabidopsis.org/
>>>>>>>>> 260 Panama St.
>>>>>>>>> Stanford, CA 94305
>>>>>>>>> ------------------------------------------------------------------------------------------ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> ~~~~~~~~~~~~~~~~~~~
>>> Pascale Gaudet, PhD
>>> Scientific Curator, dictyBase
>>> Northwestern University, Chicago, IL
>>> pgaudet at northwestern.edu
>>> www.dictybase.org
>>> ~~~~~~~~~~~~~~~~~~
> 

-- 
------------------------------------------------------------------------------------------
Tanya Berardini, Ph.D.            tberardi at acoma.stanford.edu
The Arabidopsis Information Resource    FAX: (650) 325-6857
Carnegie Institution of Washington    Tel: (650) 325-1521 ext. 325
Department of Plant Biology        URL: http://arabidopsis.org/
260 Panama St.
Stanford, CA 94305
------------------------------------------------------------------------------------------

