From jdeegan at ebi.ac.uk Thu Nov 1 04:05:13 2007
From: jdeegan at ebi.ac.uk (Jennifer Deegan (nee Clark))
Date: Thu, 01 Nov 2007 11:05:13 +0000
Subject: [annotation] transport anntations
Message-ID: <4729B2E9.1040202@ebi.ac.uk>

Hi,

During the transport work we made, and then fixed, a small incorrect edit that altered some annotations. As a consequence, we think you might possibly want to adjust your annotations to make them more granular. If you do not make the adjustment, the annotations will still be correct, they will just be less granular.

In more detail, during the transport work we erroneously merged a child term into its parent, and subsequently made a new child term that exactly replaced the term that had been subsumed in the merge. The annotations that were moved as a consequence of the merge are not now wrong, but they are no longer as granular as they were before.

Specifically, the old term 'connexon channel activity' ; GO:0015285 has been subsumed into 'gap junction channel activity' ; GO:0005243, and then 'connexon channel activity' ; GO:0015285 was replaced with the new term 'gap junction hemi-channel activity' ; GO:0055077, which is a part_of child of 'gap junction channel activity' ; GO:0005243.

If you would like to take the annotations that were once to 'connexon channel activity' ; GO:0015285 and move them to 'gap junction hemi-channel activity' ; GO:0055077 then that would put them back to their more granular state. If you leave them on 'gap junction channel activity' ; GO:0005243 they will still be correct but will be less granular than they once were.

I hope we have not caused too much inconvenience with this edit. Please write back if you have any questions.
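For anyone who would rather script the move than edit by hand, here is a minimal sketch in Python. It is only an illustration under several assumptions that are not part of the note above: that your annotations are in a tab-delimited gene association file with the GO ID in column 5, the reference in column 6 and the evidence code in column 7; that you still have a copy of the file from before the merge; and that matching on database, object ID, reference and evidence code is enough to recognise the same annotation in both files. The function and variable names are hypothetical.

```python
# Sketch only: re-point annotations that used to be on the merged child term.
OLD_CHILD = "GO:0015285"  # 'connexon channel activity' (merged into the parent)
PARENT = "GO:0005243"     # 'gap junction channel activity'
NEW_CHILD = "GO:0055077"  # 'gap junction hemi-channel activity'


def _key(cols):
    """Identify an annotation by DB, object ID, reference and evidence code (an assumption)."""
    return (cols[0], cols[1], cols[5], cols[6])


def restore_granularity(pre_merge_lines, current_lines):
    """Return current lines with formerly-child annotations moved to NEW_CHILD."""
    formerly_child = set()
    for line in pre_merge_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip '!' comment lines and anything too short to be an annotation row.
        if not line.startswith("!") and len(cols) >= 7 and cols[4] == OLD_CHILD:
            formerly_child.add(_key(cols))

    updated = []
    for line in current_lines:
        text = line.rstrip("\n")
        cols = text.split("\t")
        if (not text.startswith("!") and len(cols) >= 7
                and cols[4] == PARENT and _key(cols) in formerly_child):
            cols[4] = NEW_CHILD
            text = "\t".join(cols)
        updated.append(text)
    return updated
```

Annotations that were already on the parent term before the merge are left alone, since they have no matching row on GO:0015285 in the older file. If a gene product had both the parent and the child annotation with the same reference and evidence code, the sketch cannot tell them apart, so those rows are worth checking by eye.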
Best wishes,

Jennifer

--
Jennifer Deegan (nee Clark)
EMBL-European Bioinformatics Institute
Gene Ontology Consortium

From pgaudet at northwestern.edu Thu Nov 1 07:09:38 2007
From: pgaudet at northwestern.edu (Pascale Gaudet)
Date: Thu, 01 Nov 2007 10:09:38 -0400
Subject: [annotation] transport anntations
In-Reply-To: <4729B2E9.1040202@ebi.ac.uk>
References: <4729B2E9.1040202@ebi.ac.uk>
Message-ID: <4729DE22.3040403@northwestern.edu>

Jen,

Does this information get stored anywhere? I suppose it's too specific to go in the ontology's 'comment' section, but perhaps that's the kind of stuff for the GO wiki?
http://dimer.tamu.edu/GO/wiki/index.php/Main_Page

Pascale

--
~~~~~~~~~~~~~~~~~~~
Pascale Gaudet, PhD
Scientific Curator, dictyBase
Northwestern University, Chicago, IL
pgaudet at northwestern.edu
www.dictybase.org
~~~~~~~~~~~~~~~~~~

From jdeegan at ebi.ac.uk Thu Nov 1 07:17:15 2007
From: jdeegan at ebi.ac.uk (Jennifer Deegan (nee Clark))
Date: Thu, 01 Nov 2007 14:17:15 +0000
Subject: [annotation] transport anntations
In-Reply-To: <4729DE22.3040403@northwestern.edu>
References: <4729B2E9.1040202@ebi.ac.uk> <4729DE22.3040403@northwestern.edu>
Message-ID: <4729DFEB.8020705@ebi.ac.uk>

Hi Pascale,

Good point. I'll put it on the transport meeting minutes page of the GO wiki.

Jen

--
Jennifer Deegan nee Clark
EMBL-European Bioinformatics Institute
Gene Ontology Consortium

From jdeegan at ebi.ac.uk Thu Nov 1 07:21:59 2007
From: jdeegan at ebi.ac.uk (Jennifer Deegan (nee Clark))
Date: Thu, 01 Nov 2007 14:21:59 +0000
Subject: [annotation] transport anntations
In-Reply-To: <4729DFEB.8020705@ebi.ac.uk>
References: <4729B2E9.1040202@ebi.ac.uk> <4729DE22.3040403@northwestern.edu> <4729DFEB.8020705@ebi.ac.uk>
Message-ID: <4729E107.9080109@ebi.ac.uk>

Hi Pascale,

I have added the note here on the public wiki:
http://wiki.geneontology.org/index.php/Meeting_Notes#Bug_fixing_meeting:_31th_Nov.2C_2007

Thanks,

Jen

--
Jennifer Deegan nee Clark
EMBL-European Bioinformatics Institute
Gene Ontology Consortium

From midori at ebi.ac.uk Mon Nov 5 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Tue, 6 Nov 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711060600.lA6606K1314965@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071106/7e26e736/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071106/7e26e736/attachment.pl

From midori at ebi.ac.uk Fri Nov 9 22:00:04 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Sat, 10 Nov 2007 06:00:04 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711100600.lAA605F1085055@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071110/51b33455/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071110/51b33455/attachment.pl

From midori at ebi.ac.uk Mon Nov 12 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Tue, 13 Nov 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711130600.lAD605Q1221293@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071113/afa4389a/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071113/afa4389a/attachment.pl

From val at sanger.ac.uk Tue Nov 13 03:38:37 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Tue, 13 Nov 2007 11:38:37 +0000
Subject: [annotation] annotation heads up pseudouridylate synthase
Message-ID: <47398CBD.1030905@sanger.ac.uk>

A number of gene products annotated to this term

pseudouridylate synthase activity
"Note that this term should not be confused with 'pseudouridine synthase activity ; GO:0009982', which refers to the intramolecular isomerization of uridine to pseudouridine."

should be annotated to

pseudouridine synthase activity GO:0009982

or its child

tRNA-pseudouridine synthase activity

(The SGD annotation that a number of groups ISS'd to was updated yesterday.)

You may want to check and fix.....

Val

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=assoc&search_constraint=terms&query=GO:0004730&session_id=7288b1194953466

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From midori at ebi.ac.uk Tue Nov 13 22:00:07 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Wed, 14 Nov 2007 06:00:07 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711140600.lAE607s1279499@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071114/31cfdda6/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071114/31cfdda6/attachment.pl

From midori at ebi.ac.uk Wed Nov 14 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Thu, 15 Nov 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711150600.lAF606W1349335@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071115/47d6a6e8/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071115/47d6a6e8/attachment.pl

From midori at ebi.ac.uk Thu Nov 15 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Fri, 16 Nov 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711160600.lAG605s1399819@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071116/2df17316/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071116/2df17316/attachment.pl

From midori at ebi.ac.uk Sun Nov 18 22:00:06 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Mon, 19 Nov 2007 06:00:06 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711190600.lAJ606L1544505@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071119/bdf9c10a/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071119/bdf9c10a/attachment.pl

From jdeegan at ebi.ac.uk Tue Nov 20 02:14:52 2007
From: jdeegan at ebi.ac.uk (Jennifer Deegan (nee Clark))
Date: Tue, 20 Nov 2007 10:14:52 +0000
Subject: [annotation] nutrient import ; GO:0009935
Message-ID: <4742B39C.8060701@ebi.ac.uk>

Hi,

In SF1834028
http://tinyurl.com/355o8o
we are thinking about obsoleting nutrient import ; GO:0009935.

Does anybody think that this term should be kept on for any reason? We are keen to know people's thoughts before we decide what to do.

Thanks,

Jen

--
Jennifer Deegan (nee Clark)
EMBL-European Bioinformatics Institute
Gene Ontology Consortium

From midori at ebi.ac.uk Tue Nov 20 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Wed, 21 Nov 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711210600.lAL605Q1131712@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071121/93c92d53/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071121/93c92d53/attachment.pl

From tberardi at acoma.Stanford.EDU Wed Nov 21 10:39:39 2007
From: tberardi at acoma.Stanford.EDU (Tanya Berardini)
Date: Wed, 21 Nov 2007 10:39:39 -0800
Subject: [annotation] [Fwd:What evidence code to use?]
Message-ID: <47447B6B.9040502@acoma.stanford.edu>

Forwarding this from the evidence code discussion group. Apologies to those who are on both lists. I've sorted the emails from top to bottom in chronological order for easier reading:

----------
My original email:

> Ah, the eternal question: Is it ISS, is it RCA?
>
> I've got a paper that describes the identification of a nice big set
> of transcription factors in Arabidopsis.
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=11118137&dopt=AbstractPlus
>
> The authors use a combination of motif searches + BLAST + sequence
> alignment and review those by eye and came up with 1500 or so genes
> that they call 'transcription factors.'
>
> Right now, we've got these annotated to 'transcription factor
> activity' with the evidence code ISS but nothing in the evidence_with
> column. If I leave these as ISS, I'd like to put something in the
> with column, but what? Does this type of a combination of sequence
> analysis methods that's reviewed manually make it RCA? Not according
> to the current RCA documentation:
>
> "Examples where the RCA evidence code should not be used:
>
> * Annotations based on more than one type of gene product sequence
> based evidence, including such things as BLAST, profile HMMs, TMHMM,
> SignalP, PROSITE, InterPro, mapping files such as interpro2go etc.
> should use the ISS code."
>
> Should I wait till ISS comes to a resolution?
>
> Help!

---------
Ben's reply:

If you can't put something USEFUL in the WITH column, I think this has to be RCA.
I guess under the new, non-documented system, this would be ISS with no "With"; ISA/ISO/ISM would require withs (either seq ids or model, aka InterPro, ids).

Ben

----------
Val's reply:

This is *exactly* the type of data for which I was originally suggesting that RCA should not be restricted to analyses which include some experimental component. Unfortunately I couldn't come up with any good examples at the time.

These would surely be better as RCA, even though they are sequence based.

Val

----------
Susan's reply:

I've just hit another example...

Enhanced function annotations for Drosophila serine proteases: A case study for systematic annotation of multi-member gene families.

Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE, Rodrigues V, White KP, Bork P, Sowdhamini R.
PMID: 17996400

This is a functional classification of serine proteases based on a 'function residue clustering' algorithm. The algorithm incorporates info from sequence alignments, hydrophobicity plots and info about key residues from 3D structures - all sequence based but no one thing to put in the 'with'.

Susan

-----------
Pascale's reply:

Tanya,

I thought we agreed that BLAST and InterPro were ISS, as you point out. I don't think ISS + ISS = RCA?? That is, I would say using InterPro or the BLAST result should be enough to make the annotation; we don't need to capture both? In this case, the easiest might be using ISS with an InterPro domain ID in the 'with'.

Similarly, in the paper Susan cites, they mention several domains, and they have also compared to several proteins whose 3D structure has been determined, which can hence be used in the 'with' - I would pick one of those example proteins and ISS to that.

Pascale

---------

Any other thoughts?

Thanks,

Tanya

-------- Original Message --------
Subject: Re: [evidence] What evidence code to use?
Date: Wed, 21 Nov 2007 08:43:16 -0500
From: Pascale Gaudet
Reply-To: pgaudet at northwestern.edu
Organization: Northwestern University
To: tberardi at acoma.stanford.edu
CC: evidence at genome.stanford.edu
References: <47437C88.5070204 at acoma.stanford.edu>

Tanya,

I thought we agreed that BLAST and InterPro were ISS, as you point out. I don't think ISS + ISS = RCA?? That is, I would say using InterPro or the BLAST result should be enough to make the annotation; we don't need to capture both? In this case, the easiest might be using ISS with an InterPro domain ID in the 'with'.

Similarly, in the paper Susan cites, they mention several domains, and they have also compared to several proteins whose 3D structure has been determined, which can hence be used in the 'with' - I would pick one of those example proteins and ISS to that.
Pascale

> ------------------------------------------------------------------------------------------
> Tanya Berardini, Ph.D.                tberardi at acoma.stanford.edu
> The Arabidopsis Information Resource  FAX: (650) 325-6857
> Carnegie Institution of Washington    Tel: (650) 325-1521 ext. 325
> Department of Plant Biology           URL: http://arabidopsis.org/
> 260 Panama St.
> Stanford, CA 94305
> ------------------------------------------------------------------------------------------

--
------------------------------------------------------------------------------------------
Tanya Berardini, Ph.D.                tberardi at acoma.stanford.edu
The Arabidopsis Information Resource  FAX: (650) 325-6857
Carnegie Institution of Washington    Tel: (650) 325-1521 ext. 325
Department of Plant Biology           URL: http://arabidopsis.org/
260 Panama St.
Stanford, CA 94305
------------------------------------------------------------------------------------------

From hjd at informatics.jax.org Wed Nov 21 11:08:33 2007
From: hjd at informatics.jax.org (Harold Drabkin)
Date: Wed, 21 Nov 2007 14:08:33 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <47447B6B.9040502@acoma.stanford.edu>
References: <47447B6B.9040502@acoma.stanford.edu>
Message-ID: <47448231.2000906@informatics.jax.org>

We are in the same boat; our RCA set (we just have one or two from Riken) has the same problem. Basically they used a bunch of things which were then examined by "experts". So our set has things in the WITH field that were used to make an assignment; sometimes a domain, sometimes a sequence. The matching and alignments were done and then bucketed by various means (the reference for the paper has some details). There was no attempt to determine, when a sequence was used, whether the organism that owned it had an experiment done with it to support the GO term. Definitely not an ISS as we use it (backed by experiment in the comparison organism). It is based on a computational method (motifs, domains, alignments) to point to something from which one of the translation tables spewed out a GO term. Then a curator looked at it to see if it was reasonable. This is unlike our current IEAs, where everything is done without any monitoring (which is why some of our IEAs come up with such informative terms as "catalytic activity").

We USED to call the Rikens ISS long ago, then changed because there was no insistence on a link to anything experimental. We are uncomfortable changing them to TAS. Presently, since they are static (we kill several a month because what's in the WITH is no longer valid, e.g. the domain is no longer in a translation table), we are even tempted to "archive" them.

hjd

From midori at ebi.ac.uk Fri Nov 23 22:00:07 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Sat, 24 Nov 2007 06:00:07 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711240600.lAO60701287199@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071124/21d8674d/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071124/21d8674d/attachment.pl

From midori at ebi.ac.uk Mon Nov 26 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Tue, 27 Nov 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711270600.lAR605e1436031@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071127/86e62c8d/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071127/86e62c8d/attachment.pl

From jblake at informatics.jax.org Tue Nov 27 18:00:56 2007
From: jblake at informatics.jax.org (Judith Blake)
Date: Tue, 27 Nov 2007 21:00:56 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <47447B6B.9040502@acoma.stanford.edu>
References: <47447B6B.9040502@acoma.stanford.edu>
Message-ID: <474CCBD8.6080608@informatics.jax.org>

This is exactly what RCA was originally used for. With the FANTOM project [mouse full length cDNA annotations], participants employed a series of algorithmic approaches combined with manual inspection and evaluation to provide annotations. Actually, I think RCA was created as a result of the FANTOM project.
Judy Tanya Berardini wrote: > Forwarding this from the evidence code discussion group. Apologies to > those who are on both lists. I've sorted the emails from top to > bottom in chronological order for easier reading: > > ---------- > My original email: > > > Ah, the eternal question: Is it ISS, is it RCA? > > > > I've got a paper that describes the identification of a nice big set > > of transcription factors in Arabidopsis. > > > > > http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=11118137&dopt=AbstractPlus > > > > > > > > > The authors use a combination of motif searches + BLAST + sequence > > alignment and review those by eye and came up with 1500 or so genes > > that they call 'transcription factors.' > > > > Right now, we've got these annotated to 'transcription factor > > activity' with the evidence code ISS but nothing in the evidence_with > > column. If I leave these as ISS, I'd like to put something in the > > with column, but what? Does this type of a combination of sequence > > analysis methods that's reviewed manually make it RCA? Not according > > to the current RCA documentation: > > > > "Examples where the RCA evidence code should not be used: > > > > * Annotations based on more than one type of gene product sequence > > based evidence, including such things as BLAST, profile HMMs, TMHMM, > > SignalP, PROSITE, InterPro, mapping files such as interpro2go etc. > > should use the ISS code. " > > > > Should I wait till ISS comes to a resolution? > > > > Help! > > --------- > Ben's reply: > > If you can't put something USEFUL in the WITH column, I think this has > to be RCA. > I guess under the new, non-documented system, this would be ISS/no > "With" ISA/ISO/ISM would require withs... (either seq ids or model > aka interpro ids). 
> > > Ben > > ---------- > > Val's reply: > > This is *exactly* the type of data why I was orginally suggesting that > RCA should not be restricted to analysis which include some > experimental component. Unfortunately I couldn't come up with any > good examples at the time. > > These would surely be better as RCA, even though they are sequence based > > Val > > ---------- > > Susan's reply: > > I've just hit another example... > > Enhanced function annotations for Drosophila serine proteases: A case > study for > systematic annotation of multi-member gene families. > > Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE, > Rodrigues V, > White KP, Bork P, Sowdhamini R. > > PMID: 17996400 > > This is a functional classification of serine proteases based on a > 'function residue clustering' algorithm. The algorithm incorporates info > from sequence alignments, hydrophobicity plots and info about key > residues from 3D structures - all sequence based but no one thing to put > in the 'with'. > > Susan > > ----------- > > Pascale's reply: > > Tanya, > > I thought we agreed that BLAST and InterPro were ISS, as you point > out. I don't think ISS + ISS = RCA?? That is, I would say using > InterPro or the BLAST result should be enough to make the annotation; > we dont need to capture both? In this case, the easiest might be using > ISS with an InterPro domain ID in the 'with', > > Similarly in the paper Susan cites, they mention several domains and > also they have compared to several proteins whose 3D structure has > been determined hence can be used in the 'with' - I would pick one of > those example proteins and ISS to that. > > Pascale > > --------- > > Any other thoughts? > > > Thanks, > > Tanya > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------- Original Message -------- > Subject: Re: [evidence] What evidence code to use? 
> Date: Wed, 21 Nov 2007 08:43:16 -0500
> From: Pascale Gaudet
> Reply-To: pgaudet at northwestern.edu
> Organization: Northwestern University
> To: tberardi at acoma.stanford.edu
> CC: evidence at genome.stanford.edu
> References: <47437C88.5070204 at acoma.stanford.edu>
>
> Tanya,
>
> I thought we agreed that BLAST and InterPro were ISS, as you point out.
> I don't think ISS + ISS = RCA?? That is, I would say using InterPro or
> the BLAST result should be enough to make the annotation; we don't need
> to capture both? In this case, the easiest might be using ISS with an
> InterPro domain ID in the 'with'.
>
> Similarly in the paper Susan cites, they mention several domains and
> also they have compared to several proteins whose 3D structure has been
> determined hence can be used in the 'with' - I would pick one of those
> example proteins and ISS to that.
>
> Pascale
>
>> ------------------------------------------------------------------------------------------
>> Tanya Berardini, Ph.D.                    tberardi at acoma.stanford.edu
>> The Arabidopsis Information Resource      FAX: (650) 325-6857
>> Carnegie Institution of Washington        Tel: (650) 325-1521 ext. 325
>> Department of Plant Biology               URL: http://arabidopsis.org/
>> 260 Panama St.
>> Stanford, CA 94305
>> ------------------------------------------------------------------------------------------

From midori at ebi.ac.uk Tue Nov 27 22:00:05 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Wed, 28 Nov 2007 06:00:05 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711280600.lAS60531401455@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071128/38d6b445/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071128/38d6b445/attachment.pl

From cherry at stanford.edu Wed Nov 28 04:47:09 2007
From: cherry at stanford.edu (Mike Cherry)
Date: Wed, 28 Nov 2007 07:47:09 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <474CCBD8.6080608@informatics.jax.org>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org>
Message-ID: <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu>

I believe RCA was proposed by SGD to use with analyses like Biopixie.

Cheers, Mike

From tberardi at acoma.Stanford.EDU Wed Nov 28 10:36:59 2007
From: tberardi at acoma.Stanford.EDU (Tanya Berardini)
Date: Wed, 28 Nov 2007 10:36:59 -0800
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu>
Message-ID: <474DB54B.50203@acoma.stanford.edu>

Thanks, everyone, for your replies. To come back around to the
original question, then, should I use:

1. RCA

or

2. ISS and pick one of the domain identifiers/Genbank sequences from
the paper to put into the 'with' field

?

I've heard opinions supporting both options.

Thanks,

Tanya

From jblake at informatics.jax.org Wed Nov 28 10:42:35 2007
From: jblake at informatics.jax.org (Judith Blake)
Date: Wed, 28 Nov 2007 13:42:35 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <474DB54B.50203@acoma.stanford.edu>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu> <474DB54B.50203@acoma.stanford.edu>
Message-ID: <474DB69B.7080202@informatics.jax.org>

RCA I think,

judy

From kchris at genome.Stanford.EDU Wed Nov 28 11:08:35 2007
From: kchris at genome.Stanford.EDU (Karen Christie)
Date: Wed, 28 Nov 2007 11:08:35 -0800 (PST)
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu>
Message-ID:

My recollection is that RCA was proposed by SGD to handle papers such
as Samanta and Liang 2003 (url below) where they did computational
analysis of large-scale protein interaction data.

http://db.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000074191

The original documentation for RCA explicitly stated that it was not
to be used for sequence data. At the St. Croix meeting, Sue Rhee
brought up the point that some computational analyses combined
sequence data into the types of analyses done by Samanta and Liang. On
that basis, it was agreed that RCA could include sequence data, but
was not intended for analyses that were entirely sequence based.

-Karen

From jblake at informatics.jax.org Wed Nov 28 11:24:35 2007
From: jblake at informatics.jax.org (Judith Blake)
Date: Wed, 28 Nov 2007 14:24:35 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To:
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu>
Message-ID: <474DC073.7040301@informatics.jax.org>

ok with me if we need to make the distinction. I took it to mean the
difference between a simple alignment report and a more comprehensive
analysis. Phylogenetic analyses employ powerful algorithms, but at the
core of the analysis are manually curated multiple alignments from
hundreds of species. These could be RCA for me.

At the end of the day, I think it doesn't matter :) since all these
measures are predictive and not experimental determinations.

Judy

From aji at ebi.ac.uk Wed Nov 28 15:01:41 2007
From: aji at ebi.ac.uk (Amelia Ireland)
Date: Wed, 28 Nov 2007 23:01:41 +0000 (GMT)
Subject: [annotation] GO Annotation Conventions
Message-ID:

Hi annotators,

Could someone have a look at the GO annotation conventions here:

http://www.geneontology.org/GO.annotation.shtml#conventions

and tell me if there is anything that needs to be updated?

Thanks,
Amelia.

--
Amelia Ireland
GO Editorial Office, European Bioinformatics Institute, UK.
Carbon neutral driving: http://www.targetneutral.com/TONIC/index.jsp

From val at sanger.ac.uk Wed Nov 28 15:11:31 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Wed, 28 Nov 2007 23:11:31 UT
Subject: [annotation] GO Annotation Conventions
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071128/95abaf53/attachment.pl

From aji at ebi.ac.uk Wed Nov 28 15:36:35 2007
From: aji at ebi.ac.uk (Amelia Ireland)
Date: Wed, 28 Nov 2007 23:36:35 +0000 (GMT)
Subject: [annotation] GO Annotation Conventions
In-Reply-To:
Message-ID:

Back in Gotham City, Valerie Wood wrote:

>The 'annotating to unknown' section needs updating to reflect that these should now be made to the root node (not unknown terms) and that they should always use ND (not ISS and TAS).

So the bit about the exceptions can just be deleted?

Thanks Val!

A.

--
Amelia Ireland
GO Editorial Office, European Bioinformatics Institute, UK.
Carbon neutral driving: http://www.targetneutral.com/TONIC/index.jsp

From val at sanger.ac.uk Thu Nov 29 00:13:11 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Thu, 29 Nov 2007 08:13:11 UT
Subject: [annotation] GO Annotation Conventions
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071129/325a2b9b/attachment.pl

From pgaudet at northwestern.edu Thu Nov 29 08:37:52 2007
From: pgaudet at northwestern.edu (Pascale Gaudet)
Date: Thu, 29 Nov 2007 11:37:52 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <474DC073.7040301@informatics.jax.org>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu> <474DC073.7040301@informatics.jax.org>
Message-ID: <474EEAE0.7030506@northwestern.edu>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071129/276d2f39/attachment.html

From jblake at informatics.jax.org Thu Nov 29 09:10:12 2007
From: jblake at informatics.jax.org (Judith Blake)
Date: Thu, 29 Nov 2007 12:10:12 -0500
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <474EEAE0.7030506@northwestern.edu>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu> <474DC073.7040301@informatics.jax.org> <474EEAE0.7030506@northwestern.edu>
Message-ID: <474EF274.6010101@informatics.jax.org>

I shouldn't have jumped into this. But....

ISS for MGI requires that the ISS be backed up with experimental data. Clearly, the analysis brought forward does not do that.

From the SGD perspective, RCA requires experimental data sets. From the MGI perspective, it was used for the FANTOM analysis (only), when the sequence analysis was part of expert annotation. MGI has not had much occasion to use RCA since FANTOM, and we are gradually removing these annotations.

The argument about ISS was whether it was to be restricted to use with orthologs that had experiments or whether it was to include sequence analysis and HMM type studies done in the individual organisms. We resolved that, I thought, by moving toward ISS with subcodes of ISO (for orthology sets) and IS- (I don't remember) for HMMs and other supervised sequence analysis. The study brought forward by Tanya could be either the ISS (generic sequence analysis) or the other one, but certainly these are not backed by experimental data, so with the current RCA, these could best, perhaps, be

ISS (generic) but we don't have this implemented yet
IEA.....why not? well, it's not just an electronic analysis...

Again, this reflects only predictive analysis; there is no experimental data. MGI would prefer that ISS only be used when backed by experimental data (or the new category), and SGD would prefer that RCA be restricted to experiment +/- computational analysis using sequence.

In the end, I would like to express my thoughts again that we should not drown ourselves in this discussion. By going to the reference or by reading the MOD-supplied abstract, users can determine the predictive algorithm source if they want to. One could argue that we spend too much time on sorting this out when we do have group consensus that evidence codes are mostly there to give users clues as to the generic classes of assay that support the annotation. The reference is really the source, and we walk a fine line between just using 'experimental' and 'predicted', and providing all the gory details of the analysis.

Cheers,
Judy

Pascale Gaudet wrote:
> But, I thought RCA required experimental data??
>
> From documentation: http://www.geneontology.org/GO.evidence.shtml#ica
>
> * Predictions based on computational analyses of large-scale
> experimental data sets
> * Predictions based on computational analyses that integrate
> datasets of several types, including experimental data (e.g.
> expression data, protein-protein interaction data, genetic
> interaction data, etc.), sequence data (e.g. promoter sequence,
> sequence-based structural predictions, etc.), or mathematical
> models
>
> Pascale
>
> Judith Blake wrote:
>> ok with me if we need to make the distinction. I took it to mean the
>> difference between a simple alignment report and a more
>> comprehensive analysis. Phylogenetic analyses employ powerful
>> algorithms, but at the core of the analysis are manually curated
>> multiple alignments from hundreds of species. These could be RCA for
>> me. At the end of the day, I think it doesn't matter :) since all
>> these measures are predictive and not experimental determinations.
>>
>> Judy
>
> --
> ~~~~~~~~~~~~~~~~~~~
> Pascale Gaudet, PhD
> Scientific Curator, dictyBase
> Northwestern University, Chicago, IL
> pgaudet at northwestern.edu
> www.dictybase.org
> ~~~~~~~~~~~~~~~~~~

From rhee at acoma.Stanford.EDU Thu Nov 29 09:32:08 2007
From: rhee at acoma.Stanford.EDU (Sue Rhee)
Date: Thu, 29 Nov 2007 09:32:08 -0800
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <474EF274.6010101@informatics.jax.org>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu> <474DC073.7040301@informatics.jax.org> <474EEAE0.7030506@northwestern.edu> <474EF274.6010101@informatics.jax.org>
Message-ID: <474EF798.6020105@acoma.stanford.edu>

Tanya: I suggest that you leave it ISS for now. In the new evidence ontology, Reviewed by Computational Analysis or some generic version of RCA is likely to be a parent of the generic version of ISS. I haven't gotten much feedback from the evidence committee on the updated evidence ontology and will send out the ontology to the whole GO group sometime next week.

Sue

Judith Blake wrote:
> I shouldn't have jumped into this. But....
>
> ISS for MGI requires that the ISS be backed up with experimental
> data. Clearly, the analysis brought forward does not do that.
>
> RCA from SGD perspective requires experimental data sets. From MGI
> perspective, was used for the FANTOM analysis (only) when the sequence
> analysis was part of expert annotation. MGI has not had much occasion
> to use RCA since the Fantom, and we are gradually removing these.
>
> The argument about ISS was whether it was to be restricted to use with
> orthologs that had experiments or whether it was to include sequence
> analysis and HMM type studies done in the individual organisms.
> We resolved that, I thought, by moving toward ISS with subcodes of ISO
> (for orthology sets) and IS- (I don't remember) for HMMs and other
> supervised sequence analysis. The study brought forward by Tanya
> could be either the ISS (generic sequence analysis) or the other one,
> but certainly these are not backed by experimental data, so with the
> current RCA, these could best, perhaps, be
>
> ISS (generic) but we don't have this implemented yet
> IEA.....why not? well, it's not just an electronic analysis...
>
> Again, these reflects only predictive analysis, there is no
> experimental data, MGI would prefer ISS only be used when backed by
> experimental data (or the new category) and SGD would prefer that RCA
> be restricted to experiment +/- computational analysis using sequence.
>
> In the end, I would like to express my thoughts again that we should
> not drown ourselves in this discussion. By going to the reference or
> by reading MOD supplied abstract, users can determine the predictive
> algorithm source if they want too. One could argue that we spend too
> too much time on sorting this out when we do have group consensus that
> evidence codes are mostly to provide clues to users as to the assay
> generic classes that the annotation is supported by. The reference is
> really the source, and we toe a fine line between just using
> 'experimental' and 'predicted', and providing all the gory details of
> the analysis.
> Cheers,
> Judy

--
Sue Rhee
Staff Scientist
Carnegie Institution, Department of Plant Biology
260 Panama Street, Stanford, CA 94305
Email: (650) 325-1521 x251
Fax: (650) 325-6857

From tberardi at acoma.Stanford.EDU Thu Nov 29 09:41:17 2007
From: tberardi at acoma.Stanford.EDU (Tanya Berardini)
Date: Thu, 29 Nov 2007 09:41:17 -0800
Subject: [annotation] [Fwd:What evidence code to use?]
In-Reply-To: <474EF798.6020105@acoma.stanford.edu>
References: <47447B6B.9040502@acoma.stanford.edu> <474CCBD8.6080608@informatics.jax.org> <5B307BFC-A18D-4A40-98CF-6D0D6198A87F@stanford.edu> <474DC073.7040301@informatics.jax.org> <474EEAE0.7030506@northwestern.edu> <474EF274.6010101@informatics.jax.org> <474EF798.6020105@acoma.stanford.edu>
Message-ID: <474EF9BD.6000308@acoma.stanford.edu>

Ok, they'll stay as ISS without anything in the evidence_with field for now.

Thanks everyone.

Tanya

Sue Rhee wrote:
> Tanya: I suggest that you leave it ISS for now. In the new evidence
> ontology, Reviewed by Computational Analysis or some generic version of
> RCA is likely to be a parent of the generic version of ISS.
I haven't > gotten much feedback from the evidence committee on the updated evidence > ontology and will send out the ontology to the whole GO group sometime > next week. > > Sue > > Judith Blake wrote: >> I shouldn't have jumped into this. But.... >> >> ISS for MGI requires that the ISS be backed up with experimental >> data. Clearly, the analysis brought forward does not do that. >> >> RCA from SGD perspective requires experimental data sets. From MGI >> perspective, was used for the FANTOM analysis (only) when the sequence >> analysis was part of expert annotation. MGI has not had much occasion >> to use RCA since the Fantom, and we are gradually removing these. >> >> The argument about ISS was whether it was to be restricted to use with >> orthologs that had experiments or whether it was to include sequence >> analysis and HMM type studies done in the individual organisms. We >> resolved that, I thought, by moving toward ISS with subcodes of ISO >> (for orthology sets) and IS- (I don't remember) for HMMs and other >> supervised sequence analysis. The study brought forward by Tanya >> could be either the ISS (generic sequence analysis) or the other one, >> but certainly these are not backed by experimental data, so with the >> current RCA, these could best, perhaps, be >> >> ISS (generic) but we don't have this implemented yet >> IEA.....why not? well, it's not just an electronic analysis... >> >> Again, these reflects only predictive analysis, there is no >> experimental data, MGI would prefer ISS only be used when backed by >> experimental data (or the new category) and SGD would prefer that RCA >> be restricted to experiment +/- computational analysis using sequence. >> >> In the end, I would like to express my thoughts again that we should >> not drown ourselves in this discussion. By going to the reference or >> by reading MOD supplied abstract, users can determine the predictive >> algorithm source if they want too. 
>> One could argue that we spend too much time on sorting this out when we
>> do have group consensus that evidence codes are mostly to provide clues
>> to users as to the generic classes of assay that support the
>> annotation. The reference is really the source, and we toe a fine
>> line between just using 'experimental' and 'predicted', and providing
>> all the gory details of the analysis.
>> Cheers,
>> Judy
>>
>> Pascale Gaudet wrote:
>>> But, I thought RCA required experimental data??
>>>
>>> From documentation: http://www.geneontology.org/GO.evidence.shtml#ica
>>>
>>> * Predictions based on computational analyses of large-scale
>>>   experimental data sets
>>> * Predictions based on computational analyses that integrate
>>>   datasets of several types, including experimental data (e.g.
>>>   expression data, protein-protein interaction data, genetic
>>>   interaction data, etc.), sequence data (e.g. promoter sequence,
>>>   sequence-based structural predictions, etc.), or mathematical
>>>   models
>>>
>>> Pascale
>>>
>>> Judith Blake wrote:
>>>> ok with me if we need to make the distinction. I took it to mean
>>>> the difference between a simple alignment report and a more
>>>> comprehensive analysis. Phylogenetic analyses employ powerful
>>>> algorithms, but at the core of the analysis are manually curated
>>>> multiple alignments from hundreds of species. These could be RCA
>>>> for me. At the end of the day, I think it doesn't matter :) since
>>>> all these measures are predictive and not experimental determinations.
>>>>
>>>> Judy
>>>>
>>>> Karen Christie wrote:
>>>>> My recollection is that RCA was proposed by SGD to handle papers
>>>>> such as Samanta and Liang 2003 (url below) where they did
>>>>> computational analysis of large-scale protein interaction data.
>>>>>
>>>>> http://db.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000074191
>>>>>
>>>>> The original documentation for RCA explicitly stated that it was
>>>>> not to be used for sequence data. At the St. Croix meeting, Sue
>>>>> Rhee brought up the point that some computational analyses combined
>>>>> sequence data into the types of analyses done by Samanta and Liang.
>>>>> On that basis, it was agreed that RCA could include sequence data,
>>>>> but was not intended for analyses that were entirely sequence based.
>>>>>
>>>>> -Karen
>>>>>
>>>>> On Wed, 28 Nov 2007, Mike Cherry wrote:
>>>>>
>>>>>> I believe RCA was proposed by SGD to use with analyses like Biopixie.
>>>>>>
>>>>>> Cheers, Mike
>>>>>>
>>>>>> On Nov 27, 2007, at 9:00 PM, Judith Blake wrote:
>>>>>>
>>>>>>> This is exactly what RCA was originally used for. With the
>>>>>>> FANTOM project [mouse full length cDNA annotations], participants
>>>>>>> employed a series of algorithmic approaches combined with manual
>>>>>>> inspection and evaluation to provide annotations. Actually, I
>>>>>>> think RCA was created as a result of the FANTOM project.
>>>>>>>
>>>>>>> Judy
>>>>>>>
>>>>>>> Tanya Berardini wrote:
>>>>>>>> Forwarding this from the evidence code discussion group.
>>>>>>>> Apologies to those who are on both lists. I've sorted the
>>>>>>>> emails from top to bottom in chronological order for easier
>>>>>>>> reading:
>>>>>>>>
>>>>>>>> ----------
>>>>>>>> My original email:
>>>>>>>>
>>>>>>>>> Ah, the eternal question: Is it ISS, is it RCA?
>>>>>>>>>
>>>>>>>>> I've got a paper that describes the identification of a nice
>>>>>>>>> big set of transcription factors in Arabidopsis.
>>>>>>>>>
>>>>>>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=11118137&dopt=AbstractPlus
>>>>>>>>>
>>>>>>>>> The authors used a combination of motif searches + BLAST + sequence
>>>>>>>>> alignment, reviewed those by eye, and came up with 1500 or so
>>>>>>>>> genes that they call 'transcription factors.'
>>>>>>>>>
>>>>>>>>> Right now, we've got these annotated to 'transcription factor
>>>>>>>>> activity' with the evidence code ISS but nothing in the
>>>>>>>>> evidence_with column. If I leave these as ISS, I'd like to put
>>>>>>>>> something in the with column, but what? Does this type of a
>>>>>>>>> combination of sequence analysis methods that's reviewed manually
>>>>>>>>> make it RCA? Not according to the current RCA documentation:
>>>>>>>>>
>>>>>>>>> "Examples where the RCA evidence code should not be used:
>>>>>>>>>
>>>>>>>>> * Annotations based on more than one type of gene product
>>>>>>>>>   sequence based evidence, including such things as BLAST,
>>>>>>>>>   profile HMMs, TMHMM, SignalP, PROSITE, InterPro, mapping files
>>>>>>>>>   such as interpro2go etc. should use the ISS code."
>>>>>>>>>
>>>>>>>>> Should I wait till ISS comes to a resolution?
>>>>>>>>>
>>>>>>>>> Help!
>>>>>>>>
>>>>>>>> ---------
>>>>>>>> Ben's reply:
>>>>>>>>
>>>>>>>> If you can't put something USEFUL in the WITH column, I think
>>>>>>>> this has to be RCA.
>>>>>>>> I guess under the new, non-documented system, this would be
>>>>>>>> ISS/no "With". ISA/ISO/ISM would require withs... (either seq ids
>>>>>>>> or model aka interpro ids).
>>>>>>>>
>>>>>>>> Ben
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Val's reply:
>>>>>>>>
>>>>>>>> This is *exactly* the type of data why I was originally
>>>>>>>> suggesting that RCA should not be restricted to analyses which
>>>>>>>> include some experimental component.
>>>>>>>> Unfortunately I couldn't come up with any good examples at the time.
>>>>>>>>
>>>>>>>> These would surely be better as RCA, even though they are
>>>>>>>> sequence based.
>>>>>>>>
>>>>>>>> Val
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Susan's reply:
>>>>>>>>
>>>>>>>> I've just hit another example...
>>>>>>>>
>>>>>>>> Enhanced function annotations for Drosophila serine proteases: A
>>>>>>>> case study for systematic annotation of multi-member gene families.
>>>>>>>>
>>>>>>>> Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE,
>>>>>>>> Rodrigues V, White KP, Bork P, Sowdhamini R.
>>>>>>>>
>>>>>>>> PMID: 17996400
>>>>>>>>
>>>>>>>> This is a functional classification of serine proteases based on a
>>>>>>>> 'function residue clustering' algorithm. The algorithm
>>>>>>>> incorporates info from sequence alignments, hydrophobicity plots
>>>>>>>> and info about key residues from 3D structures - all sequence
>>>>>>>> based but no one thing to put in the 'with'.
>>>>>>>>
>>>>>>>> Susan
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> Pascale's reply:
>>>>>>>>
>>>>>>>> Tanya,
>>>>>>>>
>>>>>>>> I thought we agreed that BLAST and InterPro were ISS, as you
>>>>>>>> point out. I don't think ISS + ISS = RCA?? That is, I would say
>>>>>>>> using InterPro or the BLAST result should be enough to make the
>>>>>>>> annotation; we don't need to capture both? In this case, the
>>>>>>>> easiest might be using ISS with an InterPro domain ID in the
>>>>>>>> 'with'.
>>>>>>>>
>>>>>>>> Similarly in the paper Susan cites, they mention several domains
>>>>>>>> and also they have compared to several proteins whose 3D
>>>>>>>> structure has been determined hence can be used in the 'with' -
>>>>>>>> I would pick one of those example proteins and ISS to that.
>>>>>>>>
>>>>>>>> Pascale
>>>>>>>>
>>>>>>>> ---------
>>>>>>>>
>>>>>>>> Any other thoughts?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Tanya
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [evidence] What evidence code to use?
>>>>>>>> Date: Wed, 21 Nov 2007 08:43:16 -0500
>>>>>>>> From: Pascale Gaudet
>>>>>>>> Reply-To: pgaudet at northwestern.edu
>>>>>>>> Organization: Northwestern University
>>>>>>>> To: tberardi at acoma.stanford.edu
>>>>>>>> CC: evidence at genome.stanford.edu
>>>>>>>> References: <47437C88.5070204 at acoma.stanford.edu>
>>>>>>>>
>>>>>>>> Tanya,
>>>>>>>>
>>>>>>>> I thought we agreed that BLAST and InterPro were ISS, as you
>>>>>>>> point out. I don't think ISS + ISS = RCA?? That is, I would say
>>>>>>>> using InterPro or the BLAST result should be enough to make the
>>>>>>>> annotation; we don't need to capture both? In this case, the
>>>>>>>> easiest might be using ISS with an InterPro domain ID in the
>>>>>>>> 'with'.
>>>>>>>>
>>>>>>>> Similarly in the paper Susan cites, they mention several domains
>>>>>>>> and also they have compared to several proteins whose 3D
>>>>>>>> structure has been determined hence can be used in the 'with' -
>>>>>>>> I would pick one of those example proteins and ISS to that.
>>>>>>>>
>>>>>>>> Pascale
>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu
>>>>>>>>> The Arabidopsis Information Resource FAX: (650) 325-6857
>>>>>>>>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325
>>>>>>>>> Department of Plant Biology URL: http://arabidopsis.org/
>>>>>>>>> 260 Panama St.
>>>>>>>>> Stanford, CA 94305
>>>>>>>>> ------------------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>> --
>>> ~~~~~~~~~~~~~~~~~~~
>>> Pascale Gaudet, PhD
>>> Scientific Curator, dictyBase
>>> Northwestern University, Chicago, IL
>>> pgaudet at northwestern.edu
>>> www.dictybase.org
>>> ~~~~~~~~~~~~~~~~~~
>

--
------------------------------------------------------------------------------------------
Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu
The Arabidopsis Information Resource FAX: (650) 325-6857
Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325
Department of Plant Biology URL: http://arabidopsis.org/
260 Panama St.
Stanford, CA 94305
------------------------------------------------------------------------------------------

From midori at ebi.ac.uk Thu Nov 29 22:00:04 2007
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Fri, 30 Nov 2007 06:00:04 UT
Subject: [annotation] SourceForge Annotation Tracker Update
Message-ID: <200711300600.lAU605v1276766@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20071130/6b385bdc/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20071130/6b385bdc/attachment.pl