From midori at ebi.ac.uk Sun Dec 2 22:00:04 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Mon, 3 Dec 2007 06:00:04 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200712030600.lB3605P1317487@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071203/08ed5b16/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071203/08ed5b16/attachment.pl

From mgiglio at som.umaryland.edu Wed Dec 5 11:44:12 2007
From: mgiglio at som.umaryland.edu (Gwinn Giglio, Michelle)
Date: Wed, 5 Dec 2007 14:44:12 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
References: <47447B6B.9040502@acoma.stanford.edu>
 <474CCBD8.6080608@informatics.jax.org>
 <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu>
 <474DC073.7040301@informatics.jax.org>
 <474EEAE0.7030506@northwestern.edu>
 <474EF274.6010101@informatics.jax.org>
 <474EF798.6020105@acoma.stanford.edu>
 <474EF9BD.6000308@acoma.stanford.edu>
Message-ID: <124A5002E66E5647AEB8D0146B9B2E2E077B0F27@someven.som.umaryland.edu>

Hi all,

Chiming in REAL late here..... (just started new job this week, finally
getting settled in and back to work)...

I'm happy to say that I agree with the resolution that was reached here.
There seems to be a desire for an evidence code that encompasses a
combinatorial sequence-based approach - some call this RCA, some don't.
There is also a need for an evidence code that encompasses combinatorial
approaches where sequence-based approaches are just one type of
information used and experimental data may or may not be required -
again, some people call this RCA, some don't. Clearly we need to resolve
this, and it's something that needs to go to the evidence code committee.

In the meantime - I think we should go with ISS as the most generic and
inclusive code to use for sequence-based evidence, and it does not
require anything in "with". This is consistent with the new structure
decided on in Princeton. Harold and I signed up to work on drafts for
the new ISS subcodes - so far I haven't been able to get started on
that - but now that I'm at my new job location I plan to get moving on
ISA and ISM. I think we will have to tackle the RCA thing at the same
time as well. Hopefully you'll be hearing something out of the evidence
code committee soon. :)

Michelle

-----Original Message-----
From: owner-annotation at genome.stanford.edu on behalf of Tanya Berardini
Sent: Thu 11/29/2007 12:41 PM
To: Sue Rhee
Cc: Judith Blake; GO Annotation list
Subject: Re: [annotation] [Fwd:What evidence code to use?]

Ok, they'll stay as ISS without anything in the evidence_with field for
now.

Thanks everyone.

Tanya

Sue Rhee wrote:
> Tanya: I suggest that you leave it ISS for now. In the new evidence
> ontology, Reviewed by Computational Analysis or some generic version of
> RCA is likely to be a parent of the generic version of ISS. I haven't
> gotten much feedback from the evidence committee on the updated evidence
> ontology and will send out the ontology to the whole GO group sometime
> next week.
>
> Sue
>
> Judith Blake wrote:
>> I shouldn't have jumped into this. But....
>>
>> ISS for MGI requires that the ISS be backed up with experimental
>> data. Clearly, the analysis brought forward does not do that.
>>
>> RCA from the SGD perspective requires experimental data sets. From the
>> MGI perspective, it was used for the FANTOM analysis (only) when the
>> sequence analysis was part of expert annotation. MGI has not had much
>> occasion to use RCA since FANTOM, and we are gradually removing these.
>>
>> The argument about ISS was whether it was to be restricted to use with
>> orthologs that had experiments or whether it was to include sequence
>> analysis and HMM type studies done in the individual organisms. We
>> resolved that, I thought, by moving toward ISS with subcodes of ISO
>> (for orthology sets) and IS- (I don't remember) for HMMs and other
>> supervised sequence analysis. The study brought forward by Tanya
>> could be either the ISS (generic sequence analysis) or the other one,
>> but certainly these are not backed by experimental data, so with the
>> current RCA, these could best, perhaps, be
>>
>> ISS (generic) but we don't have this implemented yet
>> IEA.....why not? well, it's not just an electronic analysis...
>>
>> Again, these reflect only predictive analysis; there is no
>> experimental data. MGI would prefer ISS only be used when backed by
>> experimental data (or the new category) and SGD would prefer that RCA
>> be restricted to experiment +/- computational analysis using sequence.
>>
>> In the end, I would like to express my thoughts again that we should
>> not drown ourselves in this discussion. By going to the reference or
>> by reading the MOD-supplied abstract, users can determine the
>> predictive algorithm source if they want to. One could argue that we
>> spend too much time on sorting this out when we do have group
>> consensus that evidence codes are mostly to provide clues to users as
>> to the generic assay classes that the annotation is supported by. The
>> reference is really the source, and we toe a fine line between just
>> using 'experimental' and 'predicted', and providing all the gory
>> details of the analysis.
>> Cheers,
>> Judy
>>
>> Pascale Gaudet wrote:
>>> But, I thought RCA required experimental data??
>>>
>>> From documentation: http://www.geneontology.org/GO.evidence.shtml#ica
>>>
>>>     * Predictions based on computational analyses of large-scale
>>>       experimental data sets
>>>     * Predictions based on computational analyses that integrate
>>>       datasets of several types, including experimental data (e.g.
>>>       expression data, protein-protein interaction data, genetic
>>>       interaction data, etc.), sequence data (e.g. promoter sequence,
>>>       sequence-based structural predictions, etc.), or mathematical
>>>       models
>>>
>>> Pascale
>>>
>>> Judith Blake wrote:
>>>> ok with me if we need to make the distinction. I took it to mean
>>>> the difference between a simple alignment report and a more
>>>> comprehensive analysis. Phylogenetic analyses employ powerful
>>>> algorithms, but at the core of the analysis are manually curated
>>>> multiple alignments from hundreds of species. These could be RCA
>>>> for me. At the end of the day, I think it doesn't matter :) since
>>>> all these measures are predictive and not experimental determinations.
>>>>
>>>> Judy
>>>>
>>>> Karen Christie wrote:
>>>>> My recollection is that RCA was proposed by SGD to handle papers
>>>>> such as Samanta and Liang 2003 (url below) where they did
>>>>> computational analysis of large-scale protein interaction data.
>>>>>
>>>>> http://db.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000074191
>>>>>
>>>>> The original documentation for RCA explicitly stated that it was
>>>>> not to be used for sequence data. At the St. Croix meeting, Sue
>>>>> Rhee brought up the point that some computational analyses combined
>>>>> sequence data into the types of analyses done by Samanta and Liang.
>>>>> On that basis, it was agreed that RCA could include sequence data,
>>>>> but was not intended for analyses that were entirely sequence based.
>>>>>
>>>>> -Karen
>>>>>
>>>>> On Wed, 28 Nov 2007, Mike Cherry wrote:
>>>>>
>>>>>> I believe RCA was proposed by SGD to use with analyses like Biopixie.
>>>>>>
>>>>>> Cheers, Mike
>>>>>>
>>>>>> On Nov 27, 2007, at 9:00 PM, Judith Blake wrote:
>>>>>>
>>>>>>> This is exactly what RCA was originally used for. With the
>>>>>>> FANTOM project [mouse full length cDNA annotations], participants
>>>>>>> employed a series of algorithmic approaches combined with manual
>>>>>>> inspection and evaluation to provide annotations. Actually, I
>>>>>>> think RCA was created as a result of the FANTOM project.
>>>>>>>
>>>>>>> Judy
>>>>>>>
>>>>>>> Tanya Berardini wrote:
>>>>>>>> Forwarding this from the evidence code discussion group.
>>>>>>>> Apologies to those who are on both lists. I've sorted the
>>>>>>>> emails from top to bottom in chronological order for easier
>>>>>>>> reading:
>>>>>>>>
>>>>>>>> ----------
>>>>>>>> My original email:
>>>>>>>>
>>>>>>>>> Ah, the eternal question: Is it ISS, is it RCA?
>>>>>>>>>
>>>>>>>>> I've got a paper that describes the identification of a nice
>>>>>>>>> big set of transcription factors in Arabidopsis.
>>>>>>>>>
>>>>>>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=11118137&dopt=AbstractPlus
>>>>>>>>>
>>>>>>>>> The authors use a combination of motif searches + BLAST + sequence
>>>>>>>>> alignment and review those by eye and came up with 1500 or so
>>>>>>>>> genes that they call 'transcription factors.'
>>>>>>>>>
>>>>>>>>> Right now, we've got these annotated to 'transcription factor
>>>>>>>>> activity' with the evidence code ISS but nothing in the
>>>>>>>>> evidence_with column. If I leave these as ISS, I'd like to put
>>>>>>>>> something in the with column, but what? Does this type of a
>>>>>>>>> combination of sequence analysis methods that's reviewed
>>>>>>>>> manually make it RCA?
>>>>>>>>> Not according to the current RCA documentation:
>>>>>>>>>
>>>>>>>>> "Examples where the RCA evidence code should not be used:
>>>>>>>>>
>>>>>>>>> * Annotations based on more than one type of gene product
>>>>>>>>> sequence based evidence, including such things as BLAST,
>>>>>>>>> profile HMMs, TMHMM, SignalP, PROSITE, InterPro, mapping files
>>>>>>>>> such as interpro2go etc. should use the ISS code."
>>>>>>>>>
>>>>>>>>> Should I wait till ISS comes to a resolution?
>>>>>>>>>
>>>>>>>>> Help!
>>>>>>>>
>>>>>>>> ---------
>>>>>>>> Ben's reply:
>>>>>>>>
>>>>>>>> If you can't put something USEFUL in the WITH column, I think
>>>>>>>> this has to be RCA.
>>>>>>>> I guess under the new, non-documented system, this would be
>>>>>>>> ISS/no "With"; ISA/ISO/ISM would require withs... (either seq ids
>>>>>>>> or model aka interpro ids).
>>>>>>>>
>>>>>>>> Ben
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Val's reply:
>>>>>>>>
>>>>>>>> This is *exactly* the type of data why I was originally
>>>>>>>> suggesting that RCA should not be restricted to analyses which
>>>>>>>> include some experimental component. Unfortunately I couldn't
>>>>>>>> come up with any good examples at the time.
>>>>>>>>
>>>>>>>> These would surely be better as RCA, even though they are
>>>>>>>> sequence based
>>>>>>>>
>>>>>>>> Val
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Susan's reply:
>>>>>>>>
>>>>>>>> I've just hit another example...
>>>>>>>>
>>>>>>>> Enhanced function annotations for Drosophila serine proteases: A
>>>>>>>> case study for systematic annotation of multi-member gene
>>>>>>>> families.
>>>>>>>>
>>>>>>>> Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE,
>>>>>>>> Rodrigues V, White KP, Bork P, Sowdhamini R.
>>>>>>>>
>>>>>>>> PMID: 17996400
>>>>>>>>
>>>>>>>> This is a functional classification of serine proteases based on a
>>>>>>>> 'function residue clustering' algorithm.
>>>>>>>> The algorithm incorporates info from sequence alignments,
>>>>>>>> hydrophobicity plots and info about key residues from 3D
>>>>>>>> structures - all sequence based but no one thing to put in the
>>>>>>>> 'with'.
>>>>>>>>
>>>>>>>> Susan
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> Pascale's reply:
>>>>>>>>
>>>>>>>> Tanya,
>>>>>>>>
>>>>>>>> I thought we agreed that BLAST and InterPro were ISS, as you
>>>>>>>> point out. I don't think ISS + ISS = RCA?? That is, I would say
>>>>>>>> using InterPro or the BLAST result should be enough to make the
>>>>>>>> annotation; we don't need to capture both? In this case, the
>>>>>>>> easiest might be using ISS with an InterPro domain ID in the
>>>>>>>> 'with'.
>>>>>>>>
>>>>>>>> Similarly in the paper Susan cites, they mention several domains
>>>>>>>> and also they have compared to several proteins whose 3D
>>>>>>>> structure has been determined, hence can be used in the 'with' -
>>>>>>>> I would pick one of those example proteins and ISS to that.
>>>>>>>>
>>>>>>>> Pascale
>>>>>>>>
>>>>>>>> ---------
>>>>>>>>
>>>>>>>> Any other thoughts?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Tanya
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [evidence] What evidence code to use?
>>>>>>>> Date: Wed, 21 Nov 2007 08:43:16 -0500
>>>>>>>> From: Pascale Gaudet
>>>>>>>> Reply-To: pgaudet at northwestern.edu
>>>>>>>> Organization: Northwestern University
>>>>>>>> To: tberardi at acoma.stanford.edu
>>>>>>>> CC: evidence at genome.stanford.edu
>>>>>>>> References: <47437C88.5070204 at acoma.stanford.edu>
>>>>>>>>
>>>>>>>> Tanya,
>>>>>>>>
>>>>>>>> I thought we agreed that BLAST and InterPro were ISS, as you
>>>>>>>> point out.
>>>>>>>> I don't think ISS + ISS = RCA?? That is, I would say using
>>>>>>>> InterPro or the BLAST result should be enough to make the
>>>>>>>> annotation; we don't need to capture both? In this case, the
>>>>>>>> easiest might be using ISS with an InterPro domain ID in the
>>>>>>>> 'with'.
>>>>>>>>
>>>>>>>> Similarly in the paper Susan cites, they mention several domains
>>>>>>>> and also they have compared to several proteins whose 3D
>>>>>>>> structure has been determined, hence can be used in the 'with' -
>>>>>>>> I would pick one of those example proteins and ISS to that.
>>>>>>>>
>>>>>>>> Pascale
>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Tanya Berardini, Ph.D.                tberardi at acoma.stanford.edu
>>>>>>>>> The Arabidopsis Information Resource  FAX: (650) 325-6857
>>>>>>>>> Carnegie Institution of Washington    Tel: (650) 325-1521 ext. 325
>>>>>>>>> Department of Plant Biology           URL: http://arabidopsis.org/
>>>>>>>>> 260 Panama St.
>>>>>>>>> Stanford, CA 94305
>>>>>>>>> ------------------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>> --
>>> ~~~~~~~~~~~~~~~~~~~
>>> Pascale Gaudet, PhD
>>> Scientific Curator, dictyBase
>>> Northwestern University, Chicago, IL
>>> pgaudet at northwestern.edu
>>> www.dictybase.org
>>> ~~~~~~~~~~~~~~~~~~
>

--
------------------------------------------------------------------------------------------
Tanya Berardini, Ph.D.                tberardi at acoma.stanford.edu
The Arabidopsis Information Resource  FAX: (650) 325-6857
Carnegie Institution of Washington    Tel: (650) 325-1521 ext. 325
Department of Plant Biology           URL: http://arabidopsis.org/
260 Panama St.
Stanford, CA 94305
------------------------------------------------------------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071205/0ca755ca/attachment.html

From midori at ebi.ac.uk Mon Dec 10 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Tue, 11 Dec 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200712110600.lBB60531265095@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071211/6a27987e/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071211/6a27987e/attachment.pl

From midori at ebi.ac.uk Tue Dec 18 22:00:04 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Wed, 19 Dec 2007 06:00:04 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200712190600.lBJ605L1227568@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071219/8ecc7c9e/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071219/8ecc7c9e/attachment.pl

From rama at genome.Stanford.EDU Wed Dec 19 12:50:35 2007
From: rama at genome.Stanford.EDU (Rama Balakrishnan)
Date: Wed, 19 Dec 2007 12:50:35 -0800
Subject: [annotation] question about use of 'colocalizes with'
Message-ID:

Hi All,

We have a question about the use of the 'colocalizes with' qualifier.
We are curating:

PMID: 16713564
Downregulation of PP2A(Cdc55) phosphatase by separase initiates mitotic
exit in budding yeast.
Queralt E, Lehane C, Novak B, Uhlmann F.
Cell. 2006 May 19;125(4):719-32.

In the section titled "Separase-Dependent Downregulation of PP2ACdc55 at
Anaphase Onset" the authors say that 'Colocalization with Net1 revealed
nucleolar enrichment of Cdc55 in metaphase'.... (I have highlighted this
section in the attached pdf). The figure legend for Fig 5A says 'Cdc55
localization in the nucleolus'.

Should Cdc55 be annotated to 'nucleolus' directly or to 'colocalizes
with nucleolus'?

The documentation on how to use this qualifier should be updated with
more examples.
http://www.geneontology.org/GO.annotation.conventions.shtml#colocalizes_with

Thanks for your help,
Rama

From midori at ebi.ac.uk Mon Dec 31 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Tue, 1 Jan 2008 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200801010600.m01605l1194130@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080101/87c7de9f/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20080101/87c7de9f/attachment.pl