From cherry at stanford.edu Tue Apr 1 10:48:02 2008
From: cherry at stanford.edu (Mike Cherry)
Date: Tue, 1 Apr 2008 10:48:02 -0700
Subject: [Annotation] Scheduling 2nd Curator Discussion
Message-ID: <2D1A025E-D2A7-4DE2-86FD-20B07E258209@stanford.edu>

Hello,

I have created a doodle at: http://www.doodle.ch/2yf4pzvphgaum7tkykwz9hu2/admin

This is to schedule a Curator Discussion for the week of April 14th.

Potential topics include the discussion of a selected paper; I believe WormBase or MGI were thinking of proposing one. Two other topics that I am interested in discussing have to do with how the various projects communicate with their communities: for example, what information is put on home pages, whether there is a newsletter or wiki, and what is announced. The latter topic would be the beginning of a discussion of how the biocuration groups conduct business. I think this is a good place to start before we get into annotation procedures and requirements.

I'm calling these calls "Curator Discussions"; does anyone have a better name?

-Mike

From cherry at stanford.edu Tue Apr 1 10:56:30 2008
From: cherry at stanford.edu (Mike Cherry)
Date: Tue, 1 Apr 2008 10:56:30 -0700
Subject: [Annotation] Scheduling 2nd Curator Discussion
In-Reply-To: <2D1A025E-D2A7-4DE2-86FD-20B07E258209@stanford.edu>
References: <2D1A025E-D2A7-4DE2-86FD-20B07E258209@stanford.edu>
Message-ID: <4026A2A4-E83A-4A4A-BBBF-92800623F730@stanford.edu>

Sorry, the correct doodle hyperlink is: http://www.doodle.ch/vibfa5m58hgb3gma

-Mike

From kchris at genome.stanford.edu Tue Apr 1 15:24:30 2008
From: kchris at genome.stanford.edu (Karen Christie)
Date: Tue, 1 Apr 2008 15:24:30 -0700 (PDT)
Subject: [Annotation] evidence code advice
In-Reply-To: <3CC10808-17BB-45BF-9963-B8075045E3B8@fruitfly.org>
References: <7EA8D90D-C57F-4F76-A060-3D28A470865D@genome.stanford.edu> <26C47A9C-74CD-4033-BE4E-086D6015713D@genomics.princeton.edu> <3CC10808-17BB-45BF-9963-B8075045E3B8@fruitfly.org>
Message-ID:

Hi,

I think Kara's suggestion is a great idea, though more of a long-term possibility and not an immediate solution to the issue Rama brought up of what evidence code to use for annotations that are based on an RCA type of method but not reviewed by a curator.

As for the with column and IEA, yes, at the Jan 2007 meeting in Cambridge, we agreed to make the with column mandatory for IEA. At the time, we considered whether or not to make the with column mandatory based only on sequence-based methods and keyword-based methods. The need to make IEA annotations for non-reviewed RCA methods was not considered.

The initial documentation for the RCA code allowed the RCA code to be used for both curator-reviewed and non-curator-reviewed annotations. However, at that meeting, the RCA code was sent back to committee for further review.
During subsequent review of the RCA code by the evidence code committee, it was agreed that the RCA code should be more like the other codes and thus it was limited to be a curator-reviewed code, with the thought that when annotations were made based on RCA methods but without curator review that they would be IEA, similarly to unreviewed annotations based on ISS methods. Clearly we'd forgotten about the newly instituted requirement for the with column to be filled for IEA.

Anyway, in light of that history, I think it would make most sense if the absolute requirement for the with column to be filled for IEA was dropped in the short term, so that we can use the IEA code for unreviewed annotations from RCA methods.

In the long term, I think Kara's proposal is a better way to go.

-Karen

On Sun, 30 Mar 2008, Suzanna Lewis wrote:

> This is very much along the lines that I've been trying to foster
> (remember the meeting in Cambridge at Jesus College). The bit-code (or
> bar-code) for evidence codes, with each bit indicating one of these
> flags for a different piece of information. Not only automated/manual,
> but also large-scale/small-scale, and other characteristics of the
> evidence.
>
> As Kara (and many others) have said, there is quite a bit of
> over-loading of multiple pieces of information in the current evidence
> codes. It would be nice one day to see these distinguished into
> different constituent bits of information.
>
> -S
>
> p.s. I thought that IEA did not -require- the with column.
> p.p.s Was the decision tree a step in this direction?
>
> On Mar 26, 2008, at 1:59 PM, Kara Dolinski wrote:
>
>> Hi,
>>
>> The root of the problem, as I see it, is that we are mixing apples
>> and oranges with evidence codes. All but one of the evidence codes
>> indicate the type of experimental evidence for a GO annotation, but
>> we have one oddball, IEA, that indicates not what the experiment is,
>> but rather how the annotation was done. We keep running into
>> variations of the same problem: we have some evidence (whether
>> experimental or computational) for a GO annotation, but also want to
>> indicate whether a curator looked at it or not.
>>
>> My proposed (albeit radical) solution:
>>
>> Remove IEA as an evidence code.
>>
>> Create a new property for GO annotations (or add a new type of
>> qualifier) that captures how the annotation was done: manual or
>> automated.
>>
>> Everything that is currently IEA would be given the 'automated'
>> property/qualifier, and then would be given a new evidence code as
>> appropriate (mostly a flavor of ISS I would assume).
>> There can be a rule that all 'automated' annotations that are a
>> flavor of ISS must have a 'with' value.
>>
>> This would allow us to use 'RCA' as appropriate, in some cases
>> they'd be 'manual', in others, they'd be 'automated'. In Rama's
>> case, the annotations would be 'RCA' with an 'automated' qualifier.
>>
>> I realize the issues involved in making such a drastic change, so I
>> understand if we don't go there, but I do think that some approach
>> such as the one above is the best representation of the information
>> that we are trying to capture.
>>
>> Cheers,
>> Kara
>>
>> On Mar 26, 2008, at 4:30 PM, Rama Balakrishnan wrote:
>>
>>> Hi All,
>>>
>>> SGD has come across couple of computationally predicted GO
>>> annotation data sets for S. cerevisiae that we would like to add to
>>> our database. The GO annotations from these data sets are
>>> predictions based on multiple high-throughput data sets. RCA
>>> evidence code came to our minds but according to the documentation,
>>> the annotations all have to be manually reviewed by a curator to
>>> use this evidence. There are several 100 annotations of this kind
>>> and it is not feasible for us to manually review these annotations.
>>>
>>> Hence, we thought these annotations can be bulk loaded with IEA
>>> evidence code. However, in the Jan 2007 (Cambridge) GO meeting, it
>>> was decided that the 'with' column information has to be filled in
>>> for all IEAs (else Mike's filtering script strips them out). But
>>> these GO annotations being predictions based on multiple
>>> high-throughput data sets, don't have any information for the with
>>> column. So, we are left with no choice.
>>>
>>> Which evidence code do people think should be used for these kinds
>>> of computational datasets when there is not an obvious "with"?
>>>
>>> Thanks for your input.
>>>
>>> Rama
>>>
>>> +-----o--o
>>> ---------------------------------------------------------------
>>> o-o Rama Balakrishnan Ph.D
>>> O Senior Scientific Curator
>>> o-o Saccharomyces Genome Database
>>> o---o Stanford University
>>> o----o Stanford, CA 94305-5120
>>> O-----O Ph: 650.725.8956 Fax: 650.723.7016
>>> 0--o email: rama at genome.stanford.edu
>>> O Website: http://www.yeastgenome.org
>>> o-o SGD Wiki- http://wiki.yeastgenome.org
>>> +- o---o
>>> -----------------------------------------------------------------
>>
>> _______________________________________________
>> Annotation mailing list
>> Annotation at geneontology.org
>> http://fafner.stanford.edu/mailman/listinfo/annotation

From rama at genome.stanford.edu Tue Apr 1 15:37:51 2008
From: rama at genome.stanford.edu (Rama Balakrishnan)
Date: Tue, 1 Apr 2008 15:37:51 -0700
Subject: [Annotation] evidence code advice
In-Reply-To:
References: <7EA8D90D-C57F-4F76-A060-3D28A470865D@genome.stanford.edu> <26C47A9C-74CD-4033-BE4E-086D6015713D@genomics.princeton.edu> <3CC10808-17BB-45BF-9963-B8075045E3B8@fruitfly.org>
Message-ID:

> Anyway, in light of that history, I think it would make most sense if the
> absolute requirement for the with column to be filled for IEA was
> dropped in the short term, so that we can use the IEA code for unreviewed
> annotations from RCA methods.

I think it is important to require the 'with' column for IEAs to prevent circular annotations.
The other option is to revert the RCA code to its original version which required only the computational method to be reviewed and not every annotation.

I also really like Kara's proposal and hopefully this will be discussed at the upcoming GO meeting.

Rama

From jblake at informatics.jax.org Tue Apr 1 17:55:19 2008
From: jblake at informatics.jax.org (Judith Blake)
Date: Tue, 01 Apr 2008 20:55:19 -0400
Subject: [Annotation] evidence code advice
In-Reply-To:
References: <7EA8D90D-C57F-4F76-A060-3D28A470865D@genome.stanford.edu> <26C47A9C-74CD-4033-BE4E-086D6015713D@genomics.princeton.edu> <3CC10808-17BB-45BF-9963-B8075045E3B8@fruitfly.org>
Message-ID: <47F2D977.5050904@informatics.jax.org>

Rama,

If this hasn't been done, would you please add to the wiki agenda list with a
pointer to a page with Kara's (and others?) emails...

Thanks very much
judy

From val at sanger.ac.uk Wed Apr 2 03:14:27 2008
From: val at sanger.ac.uk (Valerie Wood)
Date: Wed, 02 Apr 2008 11:14:27 +0100
Subject: [Annotation] evidence code advice
In-Reply-To:
References: <7EA8D90D-C57F-4F76-A060-3D28A470865D@genome.stanford.edu> <26C47A9C-74CD-4033-BE4E-086D6015713D@genomics.princeton.edu> <3CC10808-17BB-45BF-9963-B8075045E3B8@fruitfly.org>
Message-ID: <47F35C83.1000907@sanger.ac.uk>

Rama Balakrishnan wrote:
>> Anyway, in light of that history, I think it would make most sense if the
>> absolute requirement for the with column to be filled for IEA was dropped
>> in the short term, so that we can use the IEA code for unreviewed
>> annotations from RCA methods.
>
> I think it is important to require the 'with' column for IEAs to
> prevent circular annotations.
> The other option is to revert the RCA code to its original version
> which required only the computational method to be reviewed and not
> every annotation.

Hi Rama,

I wonder about the value of RCA annotations as part of the body of GO annotations if they are not reviewed? This code usually provides the most tentative annotation, because they are generally 'function predictions'

i.e.

* Predictions based on computational analyses of large-scale experimental data sets
* Predictions based on computational analyses that integrate datasets of several types, including experimental data (e.g. expression data, protein-protein interaction data, genetic interaction data, etc.), sequence data (e.g.
promoter sequence, sequence-based structural predictions, etc.), or mathematical models

they frequently seem to be

i) Obviously wrong, in a way which would easily be spotted by a curator
ii) Redundant with existing experimental, or other manually curated annotations, or even IEA annotations
iii) Obvious annotation omissions (i.e. when there is an ISS to transporter activity, but no ISS to transporter)

Several 100 doesn't seem so many to manually review (at least to make sure they satisfy the criteria above). It would probably save time in the long run... (I'm also amazed there are so many good 'predictions' for S. cerevisiae which are unannotated already?).

For these reasons, pending any long-term solution, I'd prefer RCA which were not reviewed by a curator to be classed as 'electronically inferred' because they are essentially "automated".

My 2p

Val

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From cherry at stanford.edu Wed Apr 2 14:15:03 2008
From: cherry at stanford.edu (Mike Cherry)
Date: Wed, 2 Apr 2008 14:15:03 -0700
Subject: [Annotation] evidence code advice
References: <0FB252E3-24AE-48BF-A2C7-F4BCCEA73427@stanford.edu>
Message-ID: <208AC9B5-008A-4034-9358-5E652AF36F5C@stanford.edu>

We need an evidence code for the data Rama mentioned.
As IEA annotations have cardinality of 1 for the WITH field (this was defined at the Jesus College GOC meeting) and RCA seems to require each association to be curated. We have a catch-22. I too agree that Kara's proposal would be useful, but gets us into some bigger changes. I also agree with Suzanna that a solution would be to remove the requirement that every association be curated for RCA. This is not perfect but could be a temporary solution. There likely needs to be a curated and non-curated form of a RCA-like evidence code. On curation of RCA: The RCA documentation lists two examples, the Samanta and the Troyanskaya papers. In those papers only a slice of their predictions were published to make their case for the methods used. They did not include all their significant predictions from their databases. We curated the slice published but because of the curated requirement did not pull out other significant results from their datasets. We now have other papers like those with many more potential annotations reported in the paper. Also we still have potential annotations that could be added from the Troyanskaya database (BioPixie) that are continually refined and updated. Ability to curate all these annotations: The two papers mentioned by Rama include several hundred, not just a 100, assertions being made from the combination of experimental results. We disagree that these annotations are often wrong. The combinations of all these data removes the questionable results. These methods are generally reviewed for publication to allow the specificity and recall to be determined. SGD has been involved in some of these analyses by reviewing a large number of their results -- but not all. These annotations are generally very useful in our view. For us there are too many of these annotations to curate. These are assertions that are made by an analysis of IGI, IPI, IEP, IDA and sometimes ISM evidence to make new interesting and statistically significant associations. 
There is no literature for many of the specific associations, so it would not be possible to curate them. These associations often identify errors in the literature and also add new associations that have not been reported but are supported by the combined data. These are not based on just HTP data; the methods are typically trained using all existing non-IEA data from SGD. We use the results from these papers to identify problems with the literature annotations, but we are not able to review each of the assertions from these new papers.

I am interested to learn how Gramene (60,938 - 75% of all associations), TAIR (23,486 - 22%), MGI (12,999 - 8%), RGD (5,089 - 2%) and PseudoCAP (2,572 - 35%) use RCA -- that's the number of RCA annotations and the percent of total associations provided by each project. If everyone has curated all those annotations then more power to them, and SGD just needs to figure out how to do more.

We don't believe any of the current evidence codes as defined are appropriate for the associations we would like to include. IEA requires the WITH field and RCA requires every annotation to be curated. So what should we do?

-Mike

On Apr 2, 2008, at 3:14 AM, Valerie Wood wrote:

> Rama Balakrishnan wrote:
>>> Anyway, in light of that history, I think it would make most sense
>>> if the
>>> absolute requirement for the with column to be filled for IEA was
>>> dropped
>>> in the short term, so that we can use the IEA code for unreviewed
>>> annotations from RCA methods.
>>>
>>
>> I think it is important to require the 'with' column for IEAs to
>> prevent circular annotations.
>> The other option is to revert the RCA code to its original version
>> which required only the computational method to be reviewed and not
>> every annotation.
>>
>
>
> Hi Rama,
>
> I wonder about the value of RCA annotations as part of the body of GO
> annotations if they are not reviewed?
> This code usually provides the most tentative annotation, because > they > are generally 'function predictions' > > i.e. > > * Predictions based on computational analyses of large-scale > experimental data sets > * Predictions based on computational analyses that integrate > datasets of several types, including experimental data (e.g. > expression data, protein-protein interaction data, genetic > interaction data, etc.), sequence data (e.g. promoter sequence, > sequence-based structural predictions, etc.), or mathematical > models > > they frequently seem to be > > i) Obviously wrong, in a way which would easily be spotted by a > curator > ii) Redundant with existing experimental, or other manually curated > annotations, or even IEA annotations > iii) Obvious annotation omissions (i.e when there is an ISS to > transporter activity, but no ISS to transporter) > > Several 100 doesn't seem so many to manually review (at least to make > sure they satisfy the criteria above). It would probably save time in > the long run....(I'm also amazed there are so many good 'predictions' > for S. cerevisiae which are unnannotated already?). > > For these reasons, pending any long term solution, I'd prefer RCA > which > were not reviewed by a curator to be classed as 'electronically > inferred' because they are essentially "automated". > > My 2p > > Val > > > On Sun, 30 Mar 2008, Suzanna Lewis wrote: > > >> This is very much along the lines that I've been trying to foster >> (remember the meeting in Cambridge at Jesus College). The bit-code >> (or >> bar-code) for evidence codes, with each bit indicating one of these >> flags for a different piece of information. Not only automated/ >> manual, >> but also large-scale/small-scale, and other characteristics of the >> evidence. >> >> As Kara (and many others) have said, there is quite a bit of over- >> loading of multiple pieces of information in the current evidence >> codes. 
It would be nice one day to see these distinguished into >> different constituent bits of information. >> >> -S >> >> p.s. I thought that IEA did not -require- the with column. >> p.p.s Was the decision tree a step in this direction? >> >> On Mar 26, 2008, at 1:59 PM, Kara Dolinski wrote: >> >> >>> Hi, >>> >>> The root of the problem, as I see it, is that we are mixing apples >>> and oranges with evidence codes. All but one of the evidence codes >>> indicate the type of experimental evidence for a GO annotation, but >>> we have one oddball, IEA, that indicates not what the experiment is, >>> but rather how the annotation was done. We keep running into >>> variations of the same problem: we have some evidence (whether >>> experimental or computational) for a GO annotation, but also want to >>> indicate whether a curator looked at it or not. >>> >>> My proposed (albeit radical) solution: >>> >>> Remove IEA as an evidence code. >>> >>> Create a new property for GO annotations (or add a new type of >>> qualifier) that captures how the annotation was done: manual or >>> automated. >>> >>> Everything that is currently IEA would be given the 'automated' >>> property/qualifier, and then would be given a new evidence code as >>> appropriate (mostly a flavor of ISS I would assume). >>> There can be a rule that all 'automated' annotations that are a >>> flavor of ISS must have a 'with' value. >>> >>> This would allow us to use 'RCA' as appropriate, in some cases >>> they'd be 'manual', in others, they'd be 'automated'. In Rama's >>> case, the annotations would be 'RCA' with an 'automated' qualifier. >>> >>> I realize the issues involved in making such a drastic change, so I >>> understand if we don't go there, but I do think that some approach >>> such as the one above is the best representation of the information >>> that we are trying to capture. 
>>> >>> Cheers, >>> Kara >>> >>> On Mar 26, 2008, at 4:30 PM, Rama Balakrishnan wrote: >>> >>> >>>> Hi All, >>>> >>>> SGD has come across couple of computationally predicted GO >>>> annotation data sets for S. cerevisiae that we would like to add to >>>> our database. The GO annotations from these data sets are >>>> predictions based on multiple high-throughput data sets. RCA >>>> evidence code came to our minds but according to the documentation, >>>> the annotations all have to be manually reviewed by a curator to >>>> use this evidence. There are several 100 annotations of this kind >>>> and it is not feasible for us to manually review these annotations. >>>> >>>> Hence, we thought these annotations can be bulk loaded with IEA >>>> evidence code. However, in the Jan 2007 (Cambridge) GO meeting, it >>>> was decided that the 'with' column information has to be filled in >>>> for all IEAs (else Mike's filtering script strips them out). But >>>> these GO annotations being predictions based on multiple high- >>>> throughput data sets, don't have any information for the with >>>> column. So, we are left with no choice. >>>> >>>> Which evidence code do people think should be used for these kinds >>>> of computational datasets when there is not an obvious "with"? >>>> >>>> Thanks for your input. >>>> >>>> Rama >>>> > _______________________________________________ > Annotation mailing list > Annotation at geneontology.org > http://fafner.stanford.edu/mailman/listinfo/annotation > From cherry at stanford.edu Wed Apr 2 14:35:43 2008 From: cherry at stanford.edu (Mike Cherry) Date: Wed, 2 Apr 2008 14:35:43 -0700 Subject: [Annotation] Scheduling 2nd Curator Discussion In-Reply-To: <4026A2A4-E83A-4A4A-BBBF-92800623F730@stanford.edu> References: <2D1A025E-D2A7-4DE2-86FD-20B07E258209@stanford.edu> <4026A2A4-E83A-4A4A-BBBF-92800623F730@stanford.edu> Message-ID: There have been 29 responses to the doodle and so I'd like to schedule the next meeting. 
The winning time is: April 14th from 12:30-1:30P EDT. In Pacific time that's 9:30-10:30A, and in the UK it's 5:30-6:30P. We'll schedule some of the future meetings earlier for those in the UK.

The agenda has not been defined yet. There is still time to send in your suggestions.

The dial-in numbers will be the same as last time.

US: 866-365-4406
UK: 08004960580
access code: 7237541

-Mike

On Apr 1, 2008, at 10:56 AM, Mike Cherry wrote:

> Sorry the correct doodle hyperlink is:
>
> http://www.doodle.ch/vibfa5m58hgb3gma
>
> -Mike
>
> On Apr 1, 2008, at 10:48 AM, Mike Cherry wrote:
>> Hello,
>>
>> This is to schedule a Curator Discussion for the week of April 14th.
>>
>> Potential topics are the discussion of a selected paper, I believe
>> WormBase or MGI were potentially thinking of proposing a paper. Two
>> other topics that I am interested in discussing have to do with the
>> communication of the various projects with their communities. Such
>> as
>> what information is put on home pages, is a newsletters or wiki, what
>> is announced and what is announced. The later topic would be the
>> beginning of a discussion how the biocuration group conduct business.
>> I think this is a good place to start before we get into annotation
>> procedures and requirements.
>>
>> I'm calling these calls "Curator Discussions", anyone have a better
>> name?
>> >> -Mike >> >> _______________________________________________ >> Annotation mailing list >> Annotation at geneontology.org >> http://fafner.stanford.edu/mailman/listinfo/annotation > From rama at genome.stanford.edu Wed Apr 2 14:40:11 2008 From: rama at genome.stanford.edu (Rama Balakrishnan) Date: Wed, 2 Apr 2008 14:40:11 -0700 Subject: [Annotation] evidence code advice In-Reply-To: <47F2D977.5050904@informatics.jax.org> References: <7EA8D90D-C57F-4F76-A060-3D28A470865D@genome.stanford.edu> <26C47A9C-74CD-4033-BE4E-086D6015713D@genomics.princeton.edu> <3CC10808-17BB-45BF-9963-B8075045E3B8@fruitfly.org> <47F2D977.5050904@informatics.jax.org> Message-ID: <6DBBBF0D-8570-4ED2-90F2-323AD906F71F@genome.stanford.edu> Added to the agenda. Rama On Apr 1, 2008, at 5:55 PM, Judith Blake wrote: > Rama, > If this hasn't been done, would you please add to the wiki agenda > list with a pointer to a page with Kara's (and others?) emails... > > Thanks very much > judy > > Rama Balakrishnan wrote: >>> Anyway, in light of that history, I think it would make most >>> sense if the >>> absolute requirement for the with column to be filled for IEA was >>> dropped >>> in the short term, so that we can use the IEA code for unreviewed >>> annotations from RCA methods. >>> >> >> I think it is important to require the 'with' column for IEAs to >> prevent circular annotations. >> The other option is to revert the RCA code to its original version >> which required only the computational method to be reviewed and >> not every annotation. >> >> I also really like Kara's proposal and hopefully this will be >> discussed at the upcoming GO meeting. >> >> Rama >> >> >> >>> In the long term, I think Kara's proposal is a better way to go. >>> >>> -Karen >>> >>> >>> On Sun, 30 Mar 2008, Suzanna Lewis wrote: >>> >>> >>>> This is very much along the lines that I've been trying to foster >>>> (remember the meeting in Cambridge at Jesus College). 
The bit- >>>> code (or >>>> bar-code) for evidence codes, with each bit indicating one of these >>>> flags for a different piece of information. Not only automated/ >>>> manual, >>>> but also large-scale/small-scale, and other characteristics of the >>>> evidence. >>>> >>>> As Kara (and many others) have said, there is quite a bit of over- >>>> loading of multiple pieces of information in the current evidence >>>> codes. It would be nice one day to see these distinguished into >>>> different constituent bits of information. >>>> >>>> -S >>>> >>>> p.s. I thought that IEA did not -require- the with column. >>>> p.p.s Was the decision tree a step in this direction? >>>> >>>> On Mar 26, 2008, at 1:59 PM, Kara Dolinski wrote: >>>> >>>> >>>>> Hi, >>>>> >>>>> The root of the problem, as I see it, is that we are mixing apples >>>>> and oranges with evidence codes. All but one of the evidence >>>>> codes >>>>> indicate the type of experimental evidence for a GO annotation, >>>>> but >>>>> we have one oddball, IEA, that indicates not what the experiment >>>>> is, >>>>> but rather how the annotation was done. We keep running into >>>>> variations of the same problem: we have some evidence (whether >>>>> experimental or computational) for a GO annotation, but also >>>>> want to >>>>> indicate whether a curator looked at it or not. >>>>> >>>>> My proposed (albeit radical) solution: >>>>> >>>>> Remove IEA as an evidence code. >>>>> >>>>> Create a new property for GO annotations (or add a new type of >>>>> qualifier) that captures how the annotation was done: manual or >>>>> automated. >>>>> >>>>> Everything that is currently IEA would be given the 'automated' >>>>> property/qualifier, and then would be given a new evidence code as >>>>> appropriate (mostly a flavor of ISS I would assume). >>>>> There can be a rule that all 'automated' annotations that are a >>>>> flavor of ISS must have a 'with' value. 
>>>>> >>>>> This would allow us to use 'RCA' as appropriate, in some cases >>>>> they'd be 'manual', in others, they'd be 'automated'. In Rama's >>>>> case, the annotations would be 'RCA' with an 'automated' >>>>> qualifier. >>>>> >>>>> I realize the issues involved in making such a drastic change, >>>>> so I >>>>> understand if we don't go there, but I do think that some approach >>>>> such as the one above is the best representation of the >>>>> information >>>>> that we are trying to capture. >>>>> >>>>> Cheers, >>>>> Kara >>>>> >>>>> On Mar 26, 2008, at 4:30 PM, Rama Balakrishnan wrote: >>>>> >>>>> >>>>>> Hi All, >>>>>> >>>>>> SGD has come across couple of computationally predicted GO >>>>>> annotation data sets for S. cerevisiae that we would like to >>>>>> add to >>>>>> our database. The GO annotations from these data sets are >>>>>> predictions based on multiple high-throughput data sets. RCA >>>>>> evidence code came to our minds but according to the >>>>>> documentation, >>>>>> the annotations all have to be manually reviewed by a curator to >>>>>> use this evidence. There are several 100 annotations of this kind >>>>>> and it is not feasible for us to manually review these >>>>>> annotations. >>>>>> >>>>>> Hence, we thought these annotations can be bulk loaded with IEA >>>>>> evidence code. However, in the Jan 2007 (Cambridge) GO meeting, >>>>>> it >>>>>> was decided that the 'with' column information has to be filled >>>>>> in >>>>>> for all IEAs (else Mike's filtering script strips them out). But >>>>>> these GO annotations being predictions based on multiple high- >>>>>> throughput data sets, don't have any information for the with >>>>>> column. So, we are left with no choice. >>>>>> >>>>>> Which evidence code do people think should be used for these >>>>>> kinds >>>>>> of computational datasets when there is not an obvious "with"? >>>>>> >>>>>> Thanks for your input. 
>>>>>>
>>>>>> Rama
>>>>>>
>>
>> _______________________________________________
>> Annotation mailing list
>> Annotation at geneontology.org
>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>
>

From dph at informatics.jax.org Wed Apr 2 18:00:58 2008
From: dph at informatics.jax.org (David Hill)
Date: Wed, 02 Apr 2008 21:00:58 -0400
Subject: [Annotation] evidence code advice
In-Reply-To: <208AC9B5-008A-4034-9358-5E652AF36F5C@stanford.edu>
References: <0FB252E3-24AE-48BF-A2C7-F4BCCEA73427@stanford.edu> <208AC9B5-008A-4034-9358-5E652AF36F5C@stanford.edu>
Message-ID: <47F42C4A.2070506@informatics.jax.org>

Mike,

The vast majority of our RCA annotations at MGI come from the FANTOM mouse cDNA annotation project.
For this project, a suite of analyses was run on cDNA clones, and curators were given a set of GO annotations that they could either accept or reject as they were annotating cDNAs. This was all done during 2 huge jamborees at the FANTOM meetings. So in this case, the original data was a suite of computational analyses, and then curators used their judgment to determine if they thought the GO predictions were acceptable. I think Harold has also curated RCA annotations from a limited number of papers that reported on large-scale experiments where there were a few hundred annotations.

In your case, RCA seems to be the most reasonable evidence code to use. I actually remember some discussion at a GOC meeting a while back as to whether every annotation had to be reviewed for RCA or whether the analysis needed to be carefully reviewed to put more confidence in the annotations than just an IEA. I remember that the issue was that even IEA methods are reviewed, so where is the cutoff? I don't think we ever came to a firm conclusion, but I remember it was discussed.

David

Mike Cherry wrote:
> We need an evidence code for the data Rama mentioned. As IEA
> annotations have cardinality of 1 for the WITH field (this was defined
> at the Jesus College GOC meeting) and RCA seems to require each
> association to be curated. We have a catch-22. I too agree that
> Kara's proposal would be useful, but gets us into some bigger
> changes. I also agree with Suzanna that a solution would be to remove
> the requirement that every association be curated for RCA. This is
> not perfect but could be a temporary solution. There likely needs to
> be a curated and non-curated form of a RCA-like evidence code.
>
> On curation of RCA:
>
> The RCA documentation lists two examples, the Samanta and the
> Troyanskaya papers. In those papers only a slice of their predictions
> were published to make their case for the methods used.
They did not > include all their significant predictions from their databases. We > curated the slice published but because of the curated requirement did > not pull out other significant results from their datasets. We now > have other papers like those with many more potential annotations > reported in the paper. Also we still have potential annotations that > could be added from the Troyanskaya database (BioPixie) that are > continually refined and updated. > > Ability to curate all these annotations: > > The two papers mentioned by Rama include several hundred, not just a > 100, assertions being made from the combination of experimental > results. We disagree that these annotations are often wrong. The > combinations of all these data removes the questionable results. > These methods are generally reviewed for publication to allow the > specificity and recall to be determined. SGD has been involved in > some of these analyses by reviewing a large number of their results -- > but not all. These annotations are generally very useful in our view. > > For us there are too many of these annotations to curate. These are > assertions that are made by an analysis of IGI, IPI, IEP, IDA and > sometimes ISM evidence to make new interesting and statistically > significant associations. There is no literature for many of the > specific associations and would thus not be possible to curate. These > associations often identify errors in the literature and plus add new > associations that have not been reported, but are supported by the > combined data. These are not based on just HTP data, the methods are > typically trained using all existing non-IEA data from SGD. We use > the results from these papers to identify problems with the literature > annotations, but we are not able to review each of the assertions from > these new papers. 
> > I am interested to learn how Gramene (60,938 - 75% of all > associations), TAIR (23,486 - 22%), MGI (12,999 - 8%), RGD (5,089 - > 2%) and PseudoCAP (2,572 - 35%) use RCA -- thats the number of RCA and > the percent of total associations provided by the project. If > everyone has curated all those annotations then more power to them and > SGD just needs to figure out how to do more. > > We don't believe any of the current evidence codes as defined are > appropriate for the associations we would like to include. IEA > requires the WITH field and RCA requires every annotation to be > curated. So what should we do? > > -Mike > > > On Apr 2, 2008, at 3:14 AM, Valerie Wood wrote: > >> Rama Balakrishnan wrote: >> >>>> Anyway, in light of that history, I think it would make most sense >>>> if the >>>> absolute requirement for the with column to be filled for IEA was >>>> dropped >>>> in the short term, so that we can use the IEA code for unreviewed >>>> annotations from RCA methods. >>>> >>>> >>> I think it is important to require the 'with' column for IEAs to >>> prevent circular annotations. >>> The other option is to revert the RCA code to its original version >>> which required only the computational method to be reviewed and not >>> every annotation. >>> >>> >> Hi Rama, >> >> I wonder about the value of RCA annotations as part of the body of GO >> annotations if they are not reviewed? >> This code usually provides the most tentative annotation, because >> they >> are generally 'function predictions' >> >> i.e. >> >> * Predictions based on computational analyses of large-scale >> experimental data sets >> * Predictions based on computational analyses that integrate >> datasets of several types, including experimental data (e.g. >> expression data, protein-protein interaction data, genetic >> interaction data, etc.), sequence data (e.g. 
promoter sequence, >> sequence-based structural predictions, etc.), or mathematical >> models >> >> they frequently seem to be >> >> i) Obviously wrong, in a way which would easily be spotted by a >> curator >> ii) Redundant with existing experimental, or other manually curated >> annotations, or even IEA annotations >> iii) Obvious annotation omissions (i.e when there is an ISS to >> transporter activity, but no ISS to transporter) >> >> Several 100 doesn't seem so many to manually review (at least to make >> sure they satisfy the criteria above). It would probably save time in >> the long run....(I'm also amazed there are so many good 'predictions' >> for S. cerevisiae which are unnannotated already?). >> >> For these reasons, pending any long term solution, I'd prefer RCA >> which >> were not reviewed by a curator to be classed as 'electronically >> inferred' because they are essentially "automated". >> >> My 2p >> >> Val >> >> >> On Sun, 30 Mar 2008, Suzanna Lewis wrote: >> >> >> >>> This is very much along the lines that I've been trying to foster >>> (remember the meeting in Cambridge at Jesus College). The bit-code >>> (or >>> bar-code) for evidence codes, with each bit indicating one of these >>> flags for a different piece of information. Not only automated/ >>> manual, >>> but also large-scale/small-scale, and other characteristics of the >>> evidence. >>> >>> As Kara (and many others) have said, there is quite a bit of over- >>> loading of multiple pieces of information in the current evidence >>> codes. It would be nice one day to see these distinguished into >>> different constituent bits of information. >>> >>> -S >>> >>> p.s. I thought that IEA did not -require- the with column. >>> p.p.s Was the decision tree a step in this direction? >>> >>> On Mar 26, 2008, at 1:59 PM, Kara Dolinski wrote: >>> >>> >>> >>>> Hi, >>>> >>>> The root of the problem, as I see it, is that we are mixing apples >>>> and oranges with evidence codes. 
All but one of the evidence codes >>>> indicate the type of experimental evidence for a GO annotation, but >>>> we have one oddball, IEA, that indicates not what the experiment is, >>>> but rather how the annotation was done. We keep running into >>>> variations of the same problem: we have some evidence (whether >>>> experimental or computational) for a GO annotation, but also want to >>>> indicate whether a curator looked at it or not. >>>> >>>> My proposed (albeit radical) solution: >>>> >>>> Remove IEA as an evidence code. >>>> >>>> Create a new property for GO annotations (or add a new type of >>>> qualifier) that captures how the annotation was done: manual or >>>> automated. >>>> >>>> Everything that is currently IEA would be given the 'automated' >>>> property/qualifier, and then would be given a new evidence code as >>>> appropriate (mostly a flavor of ISS I would assume). >>>> There can be a rule that all 'automated' annotations that are a >>>> flavor of ISS must have a 'with' value. >>>> >>>> This would allow us to use 'RCA' as appropriate, in some cases >>>> they'd be 'manual', in others, they'd be 'automated'. In Rama's >>>> case, the annotations would be 'RCA' with an 'automated' qualifier. >>>> >>>> I realize the issues involved in making such a drastic change, so I >>>> understand if we don't go there, but I do think that some approach >>>> such as the one above is the best representation of the information >>>> that we are trying to capture. >>>> >>>> Cheers, >>>> Kara >>>> >>>> On Mar 26, 2008, at 4:30 PM, Rama Balakrishnan wrote: >>>> >>>> >>>> >>>>> Hi All, >>>>> >>>>> SGD has come across couple of computationally predicted GO >>>>> annotation data sets for S. cerevisiae that we would like to add to >>>>> our database. The GO annotations from these data sets are >>>>> predictions based on multiple high-throughput data sets. 
RCA >>>>> evidence code came to our minds but according to the documentation, >>>>> the annotations all have to be manually reviewed by a curator to >>>>> use this evidence. There are several 100 annotations of this kind >>>>> and it is not feasible for us to manually review these annotations. >>>>> >>>>> Hence, we thought these annotations can be bulk loaded with IEA >>>>> evidence code. However, in the Jan 2007 (Cambridge) GO meeting, it >>>>> was decided that the 'with' column information has to be filled in >>>>> for all IEAs (else Mike's filtering script strips them out). But >>>>> these GO annotations being predictions based on multiple high- >>>>> throughput data sets, don't have any information for the with >>>>> column. So, we are left with no choice. >>>>> >>>>> Which evidence code do people think should be used for these kinds >>>>> of computational datasets when there is not an obvious "with"? >>>>> >>>>> Thanks for your input. >>>>> >>>>> Rama >>>>> >>>>> >> _______________________________________________ >> Annotation mailing list >> Annotation at geneontology.org >> http://fafner.stanford.edu/mailman/listinfo/annotation >> >> > > > _______________________________________________ > Annotation mailing list > Annotation at geneontology.org > http://fafner.stanford.edu/mailman/listinfo/annotation > From tberardi at acoma.stanford.edu Wed Apr 2 21:21:12 2008 From: tberardi at acoma.stanford.edu (Tanya Berardini) Date: Wed, 2 Apr 2008 21:21:12 -0700 Subject: [Annotation] evidence code advice In-Reply-To: <208AC9B5-008A-4034-9358-5E652AF36F5C@stanford.edu> References: <0FB252E3-24AE-48BF-A2C7-F4BCCEA73427@stanford.edu> <208AC9B5-008A-4034-9358-5E652AF36F5C@stanford.edu> Message-ID: <8e22ab960804022121s7c57d29am9b164ec1121f997@mail.gmail.com> Hi Mike, I am interested to learn how > TAIR (23,486 - 22%), > use RCA -- thats the number of RCA and > the percent of total associations provided by the project. 
>

Most of the TAIR RCA annotations come from the Arabidopsis thaliana annotations that we integrated from TIGR. They had a large fraction of annotations that used the evidence code ISS. However, after the RCA evidence code was adopted and we had discussed it with Linda Hannick, we thought that the RCA evidence code would be more appropriate than ISS, so we moved all annotations with this type of analysis from ISS to RCA. A description of their assignment method is here:

http://arabidopsis.org/servlets/TairObject?type=communication&id=501714663

There are probably a handful of TAIR-curated RCA annotations that derive from papers.

Tanya
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080402/38d524c2/attachment.html

From cherry at stanford.edu Mon Apr 7 10:01:26 2008
From: cherry at stanford.edu (Mike Cherry)
Date: Mon, 7 Apr 2008 10:01:26 -0700
Subject: [Annotation] Agenda for Curator Discussion April 14th
Message-ID: <0C24FE2A-B5C2-4823-898A-F1ACA9A7A308@stanford.edu>

The next Curator Discussion teleconference will be April 14th from 12:30-1:30P EDT. The dial-in information is the same as last time.

US: 866-365-4406
UK: 08004960580
access code: 7237541

For the April 14th call we will discuss the following. Please think about your procedures.

1. How do projects interact with one another, how much interaction is practical, and how can interactions be improved?

2. Standards for announcements and updates to the community. Methods of announcement: web sites, email, wiki. What do you announce, and what are your requirements for those announcements? For example, do you have standards for required announcements: regular updates, special changes to sites, downtime, ...?

3. Discuss a paper provided by Andrei Petcherski at WormBase, Caltech. This will allow a discussion of data types and the process of abstracting them into a database.
We don't necessarily want a discussion of any particular annotation system; rather, this is to allow a discussion of the general process of curation.

A bit on this paper from Andrei:

"I have marked up the data types we are interested in (highlights and sticky notes). I have also attached a snap-shot of what actually gets emailed to the curators extracting the data and a cumulative summary of ~4000 papers that went through our first-pass (the last two pages in the same pdf). I am not sure if this is the best paper to look at for the conference call, but it does have a decent number of data types."

You can retrieve the paper from the following URL.

http://geneontology.org/meeting/curators/Chuang-2007-WB-firstpass.pdf

Thanks to Cindy Krieger and Julie Park of SGD for providing minutes of the March 20th Curator Discussion. You can retrieve the minutes from this URL.

http://geneontology.org/meeting/curators/Chuang-2007-WB-firstpass.pdf

-Mike

P.S. I thought this might be of interest. From Genome Technology Online, April 7, 2008,

"OpenHelix reports that NCBI has cut their outreach staff and canceled training seminars, due to funding concerns. On their blog, OpenHelix posts a sample notification that people have received that says that all outreach programs, including NCBI's field guide, mini-courses, structures, and PubChem, have been terminated."

From pfey at northwestern.edu Wed Apr 9 10:56:19 2008
From: pfey at northwestern.edu (Petra Fey)
Date: Wed, 9 Apr 2008 12:56:19 -0500
Subject: [Annotation] Biocurator Society Survey reminder
Message-ID: <1B68E667-E5E1-44A1-BF00-93ECD7647BA8@northwestern.edu>

Dear Biocurators,

this is a reminder to please fill out the short survey regarding the formation of a Biocurator Society if you have not yet done so.

http://www.surveymonkey.com/s.aspx?sm=V3dhsWNvWMo4Zus9FGjSDg_3d_3d

Also, there is now a public website that contains a draft of the mission statement for the Biocurator Society.
http://biocurator.org/BiocuratorSociety.html

Please feel free to comment on the draft by writing to the biocurator email list. Thanks for your interest and participation.

Petra
on behalf of the planning committee
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080409/b090f23a/attachment.html

From cherry at stanford.edu Mon Apr 14 10:51:05 2008
From: cherry at stanford.edu (Mike Cherry)
Date: Mon, 14 Apr 2008 10:51:05 -0700
Subject: [Annotation] recording of April 14th Curator Discussion
Message-ID:

Thank you all for participating in today's call, and particular thanks to Andrei Petcherski at WormBase for his work on the annotated PDF and for getting our discussion started.

The recording for today's call can be downloaded at the following link. You can put it on your iPod or listen to it with any MP3 player.
http://www.geneontology.org/meeting/curators/Curators-20080414.mp3

I'll have more information in the near future about the idea of a wiki for our discussions.

-Mike

From cherry at stanford.edu Mon Apr 14 10:52:24 2008
From: cherry at stanford.edu (Mike Cherry)
Date: Mon, 14 Apr 2008 10:52:24 -0700
Subject: [Annotation] Fwd: Journal Submission Form
References: <4803976C.2080703@sanger.ac.uk>
Message-ID: <060F04B4-37E2-4E4D-914B-E5B87ADDE5A7@stanford.edu>

Begin forwarded message:

> From: Mary Ann Tuli
> Date: April 14, 2008 10:42:04 AM PDT
> To: biocurator at tairgroup.org
> Subject: [Biocurator] Journal Submission Form
>
> Hello,
> I needed to leave the meeting (it's quite late here in the UK!), but I have a couple of thoughts.
>
> The policy in which authors are required to have an ISND accession number before acceptance of their paper is called the mandatory submission policy. I am not sure how many journals have this policy but it might be a good place to tie in the form we are discussing with this requirement.
> However, based on my understanding of the reluctance of some journals to accept the mandatory submission policy, any form we have would I think need to be capturing only the very basic but essential information.
>
> Someone mentioned that it would be useful to mention that any paper including this essential information would be prioritised for curation.
> From my (WormBase) point of view, this is certainly not the case. Data which is submitted using our in-house submission forms (crude as some of them are) gets the highest priority.
>
> Thanks,
>
> Mary Ann
>
> ~ Mary Ann Tuli
>
> ~ WormBase Group: www.wormbase.org
> ~
> ~ The Morgan Building, Sanger Institute,
> ~ The Wellcome Trust Genome Campus,
> ~ Hinxton, Cambridge, CB10 1HH, UK.
>
> ~ Tel: +44 (0)1223 496885
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Biocurator mailing list
> Biocurator at tairgroup.org
> http://mailman.tairgroup.org/mailman/listinfo/biocurator

From shimoyama at mcw.edu Mon Apr 14 15:09:48 2008
From: shimoyama at mcw.edu (Shimoyama, Mary)
Date: Mon, 14 Apr 2008 17:09:48 -0500
Subject: [Annotation] [Biocurator] Journal Submission Form
In-Reply-To: <4803976C.2080703@sanger.ac.uk>
References: <4803976C.2080703@sanger.ac.uk>
Message-ID: <1448A38A42714048B9C53E473E13CCF00150C21A@davis.hmgc.mcw.edu>

It would also be true for RGD that directly submitted data is prioritized for curation - this tends to be QTLs and strains rather than genes, though.

Mary Shimoyama
Program Manager
Rat Genome Database
Human and Molecular Genetics Center
Medical College of Wisconsin
shimoyama at mcw.edu
Tel: 414-456-7505
Fax: 414-456-6595
http://rgd.mcw.edu

-----Original Message-----
From: biocurator-bounces+shimoyama=mcw.edu at tairgroup.org [mailto:biocurator-bounces+shimoyama=mcw.edu at tairgroup.org] On Behalf Of Mary Ann Tuli
Sent: Monday, April 14, 2008 12:42 PM
To: biocurator at tairgroup.org
Subject: [Biocurator] Journal Submission Form

Hello,
I needed to leave the meeting (it's quite late here in the UK!), but I have a couple of thoughts.

The policy in which authors are required to have an ISND accession number before acceptance of their paper is called the mandatory submission policy. I am not sure how many journals have this policy but it might be a good place to tie in the form we are discussing with this requirement. However, based on my understanding of the reluctance of some journals to accept the mandatory submission policy, any form we have would I think need to be capturing only the very basic but essential information.

Someone mentioned that it would be useful to mention that any paper including this essential information would be prioritised for curation.
From my (WormBase) point of view, this is certainly not the case. Data which is submitted using our in-house submission forms (crude as some of them are) gets the highest priority.

Thanks,

Mary Ann

~ Mary Ann Tuli
~ WormBase Group: www.wormbase.org
~
~ The Morgan Building, Sanger Institute,
~ The Wellcome Trust Genome Campus,
~ Hinxton, Cambridge, CB10 1HH, UK.
~ Tel: +44 (0)1223 496885

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
_______________________________________________
Biocurator mailing list
Biocurator at tairgroup.org
http://mailman.tairgroup.org/mailman/listinfo/biocurator

From midori at ebi.ac.uk Tue Apr 15 08:58:38 2008
From: midori at ebi.ac.uk (Midori Harris)
Date: Tue, 15 Apr 2008 16:58:38 +0100 (BST)
Subject: [Annotation] [annotation] gamma-aminobutyric acid metabolic process/derivatives question
In-Reply-To: <47A743AD.3070308@sanger.ac.uk>
References: <47A743AD.3070308@sanger.ac.uk>
Message-ID:

I've put this on the GOC meeting agenda ...

On Mon, 4 Feb 2008, Valerie Wood wrote:

> I have a gene uga1 annotated to
> 4-aminobutyrate transaminase activity
> and
> gamma-aminobutyric acid metabolic process
>
> at the moment in GO derivatives-of-x metabolism under x metabolism generally
> so
> gamma-aminobutyric acid metabolic process
> inherits process annotations to
> fatty acid metabolic process and
> amine metabolic process
> http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0009448
>
> even though GABA isn't a 'fatty acid'
>
> Do people have any feelings whether this is correct for GO ?
>
> i.e. would you expect to see genes annotated to "gamma-aminobutyric acid metabolic process"
> annotated to "fatty acid metabolic process" ?
>
> respond to SF
>
> https://sourceforge.net/tracker/?func=detail&atid=440764&aid=1885151&group_id=36855
>

From cherry at stanford.edu Thu Apr 17 16:41:55 2008
From: cherry at stanford.edu (Mike Cherry)
Date: Thu, 17 Apr 2008 16:41:55 -0700
Subject: [Annotation] May Curator Discussion scheduling
Message-ID:

For simplicity this time I have just listed two times, 1300 and 1400 EDT. Those were by far the most popular times in the previous doodles. I have also limited this poll to May 5-16.
http://www.doodle.ch/q6zuhxmmmsghreav

I've also started putting together the wiki. Here is what I have so far. In the wiki spirit, please edit as you think appropriate.
http://wiki.geneontology.org/index.php/BioCurator_Forum

-Mike

From pj37 at cornell.edu Fri Apr 18 08:54:33 2008
From: pj37 at cornell.edu (Pankaj Jaiswal)
Date: Fri, 18 Apr 2008 11:54:33 -0400
Subject: [Annotation] Usage of the With/From Column for IEA ?
Message-ID: <4808C439.3050505@cornell.edu>

http://www.geneontology.org/GO.evidence.shtml#iea

Ref: From the above site

Usage of the With/From Column for IEA

At the January 2007 GOC meeting, it was agreed that it will be required to make an entry in the with/from column for all annotations made after May 1, 2007 when using this evidence code to indicate what individual sequences, sequence objects, methods, keyword mapping files, etc. are the basis of the annotation. When multiple entries are placed in the with/from field, they are separated by pipes.

------------

Based on this rule, could you please suggest what the value for the 'WITH' column should be

- if we are making a TMHMM prediction for a 'putative transmembrane' annotation
- if we have used TargetP/SignalP/Predotar/Psort to predict a putative cellular localization

None of the above prediction programs gives an appropriate value to fill in the 'WITH' column.

If you agree, our suggestion is to relax this rule for IEAs.

Pankaj

From midori at ebi.ac.uk Wed Apr 23 16:00:08 2008
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Wed, 23 Apr 2008 23:00:08 UT
Subject: [Annotation] SourceForge Annotation Tracker Update
Message-ID: <200804232300.m3NN0811409388@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080423/c8c21867/attachment-0001.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20080423/c8c21867/attachment-0001.pl

From midori at ebi.ac.uk Fri Apr 25 16:00:07 2008
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Fri, 25 Apr 2008 23:00:07 UT
Subject: [Annotation] SourceForge Annotation Tracker Update
Message-ID: <200804252300.m3PN07x1416522@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080425/3c1a004c/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20080425/3c1a004c/attachment.pl

From midori at ebi.ac.uk Sat Apr 26 16:00:07 2008
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Sat, 26 Apr 2008 23:00:07 UT
Subject: [Annotation] SourceForge Annotation Tracker Update
Message-ID: <200804262300.m3QN07M1149670@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080426/9b8d8571/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20080426/9b8d8571/attachment.pl

From midori at ebi.ac.uk Mon Apr 28 16:00:07 2008
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Mon, 28 Apr 2008 23:00:07 UT
Subject: [Annotation] SourceForge Annotation Tracker Update
Message-ID: <200804282300.m3SN07c1144763@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080428/c6cef016/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20080428/c6cef016/attachment.pl

From midori at ebi.ac.uk Tue Apr 29 16:00:07 2008
From: midori at ebi.ac.uk (midori at ebi.ac.uk)
Date: Tue, 29 Apr 2008 23:00:07 UT
Subject: [Annotation] SourceForge Annotation Tracker Update
Message-ID: <200804292300.m3TN07i1434670@mozart.ebi.ac.uk>

An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/annotation/attachments/20080429/92edd808/attachment.html
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://fafner.stanford.edu/pipermail/annotation/attachments/20080429/92edd808/attachment.pl