From midori at ebi.ac.uk Mon Sep 3 11:31:13 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Mon, 3 Sep 2007 19:31:13 +0100 (BST) Subject: [go] Ontology development - August highlights Message-ID: Dear GO, The most recent monthly report on ontology content, for July 2007, is now available at: http://gocwiki.geneontology.org/index.php/Aug2007_ontology_report Some highlights from August: * Work to follow up on the content meeting on muscle development is in progress (http://wiki.geneontology.org/index.php/Muscle_Development). * Work following the meeting on cardiovascular physiology is done (http://gocwiki.geneontology.org/index.php/Cardiovascular_physiology/development). * Work on GO-Cell cross-products has resumed (http://wiki.geneontology.org/index.php/XP:Meetings#2007.2F07.2F26_:_GO-CL_XPs.2C_next_steps). * We've made a lot of progress on renaming 'sensu' terms and improving their definitions (http://gocwiki.geneontology.org/index.php/Sensu_Main_Page). In September, we will focus on regulation, with the aim of adding cross-products and using the long-awaited 'regulates' relationship; see http://gocwiki.geneontology.org/index.php/Regulation_Main_Page. Also, the GO managers are preparing an update for the January 2008 NAR Database Issue. A draft manuscript will go to the GO list shortly. As usual, details of small- and medium-scale changes are available in the SourceForge Curator Requests tracker. Please contact us if you want to help out with ontology work in a particular area, or if you have any comments or questions about what's going on. Midori & David on behalf of GO's ontology developers From midori at ebi.ac.uk Wed Sep 5 07:38:52 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Wed, 5 Sep 2007 15:38:52 +0100 (BST) Subject: [go] NAR database issue update Message-ID: Dear GO, As you may have heard, we are doing an update article for the NAR database issue that will be published in January 2008. 
The GO managers have contributed content, which I have assembled into a draft manuscript (attached). Please read it and send comments, suggestions, etc. Some specific points: - Please check the list of GO consortium members and their affiliations. I've made some updates from last time, but there may be other things to change. - I've put in a small placeholder blurb for software, but I don't think it's all that good. Any improvements from the software group would be most welcome indeed. - In its present format it looks longer than the allowed 4 pages, but NAR's format will shrink it somewhat. Nevertheless, I doubt that it can get much longer, so all but the smallest additions will have to be matched by trimming something. - Sections can be reorganized if anyone suggests an improved order/outline. - I've put in a portion of Mary's reference genome graph for MSH2 as a figure, based on a suggestion from Pascale. No one has complained so far, but if a different gene is preferred, let me know. I'm thinking of including the full graph as supplementary material. - Any other suggestions for supplements? Midori -------------- next part -------------- A non-text attachment was scrubbed... Name: nar2008.doc Type: application/octet-stream Size: 185856 bytes Desc: Url : http://fafner.stanford.edu/pipermail/go/attachments/20070905/c523568e/attachment.obj From pj37 at cornell.edu Wed Sep 5 07:52:22 2007 From: pj37 at cornell.edu (Pankaj Jaiswal) Date: Wed, 05 Sep 2007 10:52:22 -0400 Subject: [go] NAR database issue update In-Reply-To: References: Message-ID: <46DEC2A6.8010708@cornell.edu> Just a quick suggestion. Under Acknowledgments In addition to the NIH grant can we add ... The additional contributions in terms of Ontology development and ontology annotations were made by the XYZ grants funded to Source-databases-so-and-so by the abc-funding agency. This will help many of us to list our contributions and the funding. 
Pankaj Midori Harris wrote: > Dear GO, > > As you may have heard, we are doing an update article for the NAR > database issue that will be published in January 2008. The GO managers > have contributed content, which I have assembled into a draft manuscript > (attached). > > Please read it and send comments, suggestions, etc. Some specific points: > > - Please check the list of GO consortium members and their > affiliations. I've made some updates from last time, but there may be > other things to change. > > - I've put in a small placeholder blurb for software, but I don't > think it's all that good. Any improvements from the software group would > be most welcome indeed. > > - In its present format it looks longer than the allowed 4 pages, > but NAR's format will shrink it somewhat. Nevertheless, I doubt that it > can get much longer, so all but the smallest additions will have to be > matched by trimming something. > > - Sections can be reorganized if anyone suggests an improved order/outline. > > - I've put in a portion of Mary's reference genome graph for MSH2 as > a figure, based on a suggestion from Pascale. No one has complained so > far, but if a different gene is preferred, let me know. I'm thinking > of including the full graph as supplementary material. > > - Any other suggestions for supplements? > > Midori -- Pankaj Jaiswal G-15, Bradfield Hall Dept. of Plant Breeding and Genetics Cornell University Ithaca, NY-14853, USA Ph. +1-607-255-3103 / 4199 fax: +1-607-255-6683 From cjm at fruitfly.org Wed Sep 5 07:26:27 2007 From: cjm at fruitfly.org (Chris Mungall) Date: Wed, 5 Sep 2007 15:26:27 +0100 Subject: [go] synonym category In-Reply-To: References: Message-ID: On Aug 30, 2007, at 11:01 AM, Midori Harris wrote: > Hi, > > For SF 1195550, I have added *lots* of synonyms for enzyme activity > terms. Because these synonyms are all derived from EC (obtained via > Expasy), it's been suggested that we flag them by creating a > synonym category. 
For the moment I've done so in a file in the go/scratch/ > directory (go_EC_synonyms.obo). Before I put them in the > live gene_ontology_edit.obo file, I have some questions: > > - Will any scripts, e.g. the obo2obo script that generates the > gene_ontology.obo (1.0 format) file, need to be changed? in theory, no (although the mapping will be lossy), but we should of course do the usual check. > - Should we make an announcement on the GO or GO-friends list? do you mean pre-announce to warn people? Probably, yes. But you probably don't want to hear this as the longer we delay it the harder it will be to integrate this back in... Note that of course people shouldn't be downloading go_edit.obo, this was intended to be the editor's version, not the obof1.2 version. Before we do this change I would like to sort out this situation > At the last GOC meeting we agreed to use a synonym category to > denote Obol-friendly "structured synonyms," note that obol uses any exact synonym, it doesn't need anything explicitly designated. but it's always good to give additional information on why a particular synonym was chosen if the implementation cost is negligible. I'm not sure it is here - once we start designating structured synonyms we are compelled to maintain them are we not? > but that category has not yet gone live. The "EC synonym" category > would therefore be the first to be added to the > gene_ontology_edit.obo file. > > SF link: > https://sourceforge.net/tracker/?func=detail&atid=440764&aid=1195550&group_id=36855 > > Thanks, > Midori > From midori at ebi.ac.uk Wed Sep 5 13:00:48 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Wed, 5 Sep 2007 21:00:48 +0100 (BST) Subject: [go] NAR database issue update In-Reply-To: References: Message-ID: A few people have had problems with the previous attachment. Two more attachments here: a second attempt at a .doc, and a .rtf. Let me know if there are still problems; if all else fails I can try sending a PDF. 
midori On Wed, 5 Sep 2007, Midori Harris wrote: > Dear GO, > > As you may have heard, we are doing an update article for the NAR > database issue that will be published in January 2008. The GO managers > have contributed content, which I have assembled into a draft manuscript > (attached). > > Please read it and send comments, suggestions, etc. Some specific points: > > - Please check the list of GO consortium members and their affiliations. > I've made some updates from last time, but there may be other things to > change. > > - I've put in a small placeholder blurb for software, but I don't think > it's all that good. Any improvements from the software group would be > most welcome indeed. > > - In its present format it looks longer than the allowed 4 pages, but > NAR's format will shrink it somewhat. Nevertheless, I doubt that it can > get much longer, so all but the smallest additions will have to be > matched by trimming something. > > - Sections can be reorganized if anyone suggests an improved > order/outline. > > - I've put in a portion of Mary's reference genome graph for MSH2 as a > figure, based on a suggestion from Pascale. No one has complained so > far, but if a different gene is preferred, let me know. I'm thinking of > including the full graph as supplementary material. > > - Any other suggestions for supplements? > > Midori -------------- next part -------------- A non-text attachment was scrubbed... Name: nar2008.doc Type: application/msword Size: 167424 bytes Desc: Url : http://fafner.stanford.edu/pipermail/go/attachments/20070905/f6904c0d/attachment.doc -------------- next part -------------- A non-text attachment was scrubbed... 
Name: nar2008.rtf Type: application/rtf Size: 39565 bytes Desc: Url : http://fafner.stanford.edu/pipermail/go/attachments/20070905/f6904c0d/attachment.rtf From midori at ebi.ac.uk Thu Sep 6 08:20:42 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Thu, 6 Sep 2007 16:20:42 +0100 (BST) Subject: [go] synonym category In-Reply-To: References: Message-ID: On Wed, 5 Sep 2007, Chris Mungall wrote: > > On Aug 30, 2007, at 11:01 AM, Midori Harris wrote: > >> Hi, >> >> For SF 1195550, I have added *lots* of synonyms for enzyme activity terms. >> Because these synonyms are all derived from EC (obtained via Expasy), it's >> been suggested that we flag them by creating a synonym category. For the >> moment I've done so in a file in the go/scratch/ directory >> (go_EC_synonyms.obo). Before I put them in the live gene_ontology_edit.obo >> file, I have some questions: >> >> - Will any scripts, e.g. the obo2obo script that generates the >> gene_ontology.obo (1.0 format) file, need to be changed? > > in theory, no (although the mapping will be lossy), but we should of course > do the usual check. > >> - Should we make an announcement on the GO or GO-friends list? > > do you mean pre-announce to warn people? Probably, yes. But you probably > don't want to hear this as the longer we delay it the harder it will be to > integrate this back in... I'm hoping it won't be a big deal to feed it to obomerge, so I'm happy to err on the side of warning. > Note that of course people shouldn't be downloading go_edit.obo, this was > intended to be the editor's version, not the obof1.2 version. Before we do > this change I would like to sort out this situation Fine (and good luck ...) > >> At the last GOC meeting we agreed to use a synonym category to denote >> Obol-friendly "structured synonyms," > > note that obol uses any exact synonym, it doesn't need anything explicitly > designated. 
> > but it's always good to give additional information on why a particular > synonym was chosen if the implementation cost is negligible. I'm not sure it > is here - once we start designating structured synonyms we are compelled to > maintain them are we not? I'm actually agnostic on the "structured synonyms" ... the only reason I mentioned them here was to remind everyone that they've heard of synonym categories in that context. m > >> but that category has not yet gone live. The "EC synonym" category would >> therefore be the first to be added to the gene_ontology_edit.obo file. >> >> SF link: >> https://sourceforge.net/tracker/?func=detail&atid=440764&aid=1195550&group_id=36855 >> >> Thanks, >> Midori > From camon at ebi.ac.uk Fri Sep 7 04:41:28 2007 From: camon at ebi.ac.uk (camon at ebi.ac.uk) Date: Fri, 7 Sep 2007 12:41:28 +0100 (BST) Subject: [go] GOC Webpage:GO Annotation for the Immune System Message-ID: <33161.217.43.213.91.1189165288.squirrel@webmail.ebi.ac.uk> Hi, The WT grant to fund Immune gene GO curation at MGI, EBI, UCL and TCD has been submitted and we will not hear any more about that until Feb 2008. In the meantime we have created an Immunology specific GOC set of webpages http://www.geneontology.org/GO.immunology.shtml Thanks to Alex, Ruth, Michael, Judy, Jen and Cliona for comments on content and especially Amelia for formatting the pages and publishing. Some pages will be expanded further if the project gets funded, fingers crossed. kind regards Evelyn P.S BioMed Central said that we can use any of the figures in their articles with just author's permission as long as the articles are cited, good to know. 
From midori at ebi.ac.uk Fri Sep 7 06:51:24 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Fri, 7 Sep 2007 14:51:24 +0100 (BST) Subject: [go] NAR database issue update In-Reply-To: References: Message-ID: > On Wed, 5 Sep 2007, Midori Harris wrote: > >> Dear GO, >> >> As you may have heard, we are doing an update article for the NAR database >> issue that will be published in January 2008. The GO managers have >> contributed content, which I have assembled into a draft manuscript >> (attached). >> >> Please read it and send comments, suggestions, etc. Some specific points: >> >> - Please check the list of GO consortium members and their affiliations. >> I've made some updates from last time, but there may be other things to >> change. >> >> - I've put in a small placeholder blurb for software, but I don't think >> it's all that good. Any improvements from the software group would be most >> welcome indeed. >> >> - In its present format it looks longer than the allowed 4 pages, but NAR's >> format will shrink it somewhat. Nevertheless, I doubt that it can get much >> longer, so all but the smallest additions will have to be matched by >> trimming something. >> >> - Sections can be reorganized if anyone suggests an improved order/outline. >> >> - I've put in a portion of Mary's reference genome graph for MSH2 as a >> figure, based on a suggestion from Pascale. No one has complained so far, >> but if a different gene is preferred, let me know. I'm thinking of >> including the full graph as supplementary material. >> >> - Any other suggestions for supplements? >> >> Midori From midori at ebi.ac.uk Fri Sep 7 06:54:48 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Fri, 7 Sep 2007 14:54:48 +0100 (BST) Subject: [go] NAR database issue update In-Reply-To: References: Message-ID: Hi all, Apologies for the stray email that just came ... I've attached an updated draft with changes based on the comments I've received so far. 
Further suggestions are still welcome for another couple of (working) days; it has to be submitted by the end of next week. Midori > > Dear GO, > > As you may have heard, we are doing an update article for the NAR database > issue that will be published in January 2008. The GO managers have > contributed content, which I have assembled into a draft manuscript > (attached). > > Please read it and send comments, suggestions, etc. Some specific points: > > - Please check the list of GO consortium members and their affiliations. > I've made some updates from last time, but there may be other things to > change. > > - I've put in a small placeholder blurb for software, but I don't think > it's all that good. Any improvements from the software group would be most > welcome indeed. > > - In its present format it looks longer than the allowed 4 pages, but NAR's > format will shrink it somewhat. Nevertheless, I doubt that it can get much > longer, so all but the smallest additions will have to be matched by > trimming something. > > - Sections can be reorganized if anyone suggests an improved order/outline. > > - I've put in a portion of Mary's reference genome graph for MSH2 as a > figure, based on a suggestion from Pascale. No one has complained so far, > but if a different gene is preferred, let me know. I'm thinking of > including the full graph as supplementary material. > > - Any other suggestions for supplements? > > Midori -------------- next part -------------- A non-text attachment was scrubbed... 
Name: NAR2008-with-fig.doc Type: application/msword Size: 188928 bytes Desc: Url : http://fafner.stanford.edu/pipermail/go/attachments/20070907/61c3f05e/attachment.doc From val at sanger.ac.uk Mon Sep 10 03:03:36 2007 From: val at sanger.ac.uk (Valerie Wood) Date: Mon, 10 Sep 2007 11:03:36 +0100 Subject: [go] accommodation at Princeton Message-ID: <46E51678.5050901@sanger.ac.uk> I left my accommodation booking a little late and the Nassau were only able to offer me a suite for the 2 days of the ref genome meeting (nights 26/27). Has anybody had the same problem and managed to find cheaper alternative accommodation nearby for these 2 nights? Or anybody know the area and can suggest anywhere nearby? Thanks Val -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From aji at ebi.ac.uk Mon Sep 10 07:31:02 2007 From: aji at ebi.ac.uk (Amelia Ireland) Date: Mon, 10 Sep 2007 15:31:02 +0100 Subject: [go] accommodation at Princeton In-Reply-To: <6123BF526ABF5845ADF7B9DC2ADAE18403529475@EXCLUSTER.pu.win.princeton.edu> References: <46E5504B.1050202@informatics.jax.org> <6123BF526ABF5845ADF7B9DC2ADAE18403529475@EXCLUSTER.pu.win.princeton.edu> Message-ID: <3D4CC997-185F-4FD2-8ABA-A16CF21B66D7@ebi.ac.uk> > Subject: [go] accommodation at Princeton > Date: Mon, 10 Sep 2007 11:03:36 +0100 > From: Valerie Wood > To: GO Mailing List > > > I left my accommodation booking a little late and the Nassau were only > able to offer me a suite for the 2 days of the ref genome meeting > (nights 26/27). Has anybody had the same problem and managed to find > cheaper alternative accommodation nearby for these > 2 nights? Or anybody know the area and can suggest anywhere nearby? I am in the same position as Val for the nights of the 26th / 27th, so any advice would be gratefully received! 
Alternatively, if anyone is staying in one of those suites and wouldn't mind me camping out on their sofa... ;) Thanks, Amelia. -- Amelia Ireland GO Editorial Office, European Bioinformatics Institute, UK. Carbon neutral driving: http://www.targetneutral.com/TONIC/index.jsp From jblake at informatics.jax.org Mon Sep 10 08:16:06 2007 From: jblake at informatics.jax.org (Judith Blake) Date: Mon, 10 Sep 2007 11:16:06 -0400 Subject: [go] accommodation at Princeton In-Reply-To: <3D4CC997-185F-4FD2-8ABA-A16CF21B66D7@ebi.ac.uk> References: <46E5504B.1050202@informatics.jax.org> <6123BF526ABF5845ADF7B9DC2ADAE18403529475@EXCLUSTER.pu.win.princeton.edu> <3D4CC997-185F-4FD2-8ABA-A16CF21B66D7@ebi.ac.uk> Message-ID: <46E55FB6.7080204@informatics.jax.org> Details on meetings gradually being added to wiki under 'Consortium Meetings' Reference Genome meetings to be held in Frist Campus Center - Multipurpose Room C see Princeton Map http://www.princeton.edu/~pumap/index.html?id=26 Judy Amelia Ireland wrote: >> Subject: [go] accommodation at Princeton >> Date: Mon, 10 Sep 2007 11:03:36 +0100 >> From: Valerie Wood >> To: GO Mailing List >> >> >> I left my accommodation booking a little late and the Nassau were only >> able to offer me a suite for the 2 days of the ref genome meeting >> (nights 26/27). Has anybody had the same problem and managed to find >> cheaper alternative accommodation nearby for these >> 2 nights? Or anybody know the area and can suggest anywhere nearby? > > I am in the same position as Val for the nights of the 26th / 27th, so > any advice would be gratefully received! > > Alternatively, if anyone is staying in one of those suites and > wouldn't mind me camping out on their sofa... ;) > > Thanks, > Amelia. > > -- > Amelia Ireland > GO Editorial Office, > European Bioinformatics Institute, UK. 
> Carbon neutral driving: http://www.targetneutral.com/TONIC/index.jsp > > > From jimhu at tamu.edu Mon Sep 10 09:32:08 2007 From: jimhu at tamu.edu (Jim Hu) Date: Mon, 10 Sep 2007 12:32:08 -0400 Subject: [go] accommodation at Princeton In-Reply-To: <46E51678.5050901@sanger.ac.uk> References: <46E51678.5050901@sanger.ac.uk> Message-ID: Meant to send to the list, not just Val... I thought it was just me! I booked in something called the Clarion Hotel Palmer Inn based on trying to interpret the maps on Travelocity (I did select the search to look for free internet). They have a 15% discount for staying more than 2 nights, so I'm paying ~$800 for Sat night through Thurs night. I just called them and they said they have a complimentary shuttle to campus as long as it's reserved a day in advance. http://travel.travelocity.com/hotel/HotelDetailReview.do?Service=TRAVELOCITY&propertyId=52742 Jim On Sep 10, 2007, at 6:03 AM, Valerie Wood wrote: > > > I left my accommodation booking a little late and the Nassau were > only able to offer me a suite for the 2 days of the ref genome > meeting (nights 26/27). Has anybody had the same problem and > managed to find cheaper alternative accommodation nearby for these > 2 nights? Or anybody know the area and can suggest anywhere nearby? > > > Thanks > > Val > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://fafner.stanford.edu/pipermail/go/attachments/20070910/cefee110/attachment.html From jimhu at tamu.edu Mon Sep 10 09:40:15 2007 From: jimhu at tamu.edu (Jim Hu) Date: Mon, 10 Sep 2007 12:40:15 -0400 Subject: [go] accommodation at Princeton In-Reply-To: References: <46E51678.5050901@sanger.ac.uk> Message-ID: <0FEC48FE-6A23-450D-AFE9-9A275069B9A9@tamu.edu> p.s. Disclaimer: I've never stayed there and don't have any info other than what's on the web, so if others join me there and it turns out to be bad... On Sep 10, 2007, at 12:32 PM, Jim Hu wrote: > Meant to send to the list, not just Val... > > I thought it was just me! I booked in something called the Clarion > Hotel Palmer Inn based on trying to interpret the maps on > Travelocity (I did select the search to look for free internet). > They have a 15% discount for staying more than 2 nights, so I'm > paying ~$800 for Sat night through Thurs night. > > I just called them and they said they have a complimentary shuttle > to campus as long as it's reserved a day in advance. > > http://travel.travelocity.com/hotel/HotelDetailReview.do?Service=TRAVELOCITY&propertyId=52742 > > Jim > On Sep 10, 2007, at 6:03 AM, Valerie Wood wrote: > >> >> >> I left my accommodation booking a little late and the Nassau were >> only able to offer me a suite for the 2 days of the ref genome >> meeting (nights 26/27). Has anybody had the same problem and >> managed to find cheaper alternative accommodation nearby for these >> 2 nights? Or anybody know the area and can suggest anywhere nearby? >> >> >> Thanks >> >> Val >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a >> company registered in England with number 2742969, whose >> registered office is 215 Euston Road, London, NW1 2BE. > > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. 
> College Station, TX 77843-2128 > 979-862-4054 > > ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://fafner.stanford.edu/pipermail/go/attachments/20070910/160cde18/attachment.html From hitz at genome.Stanford.EDU Mon Sep 10 10:03:15 2007 From: hitz at genome.Stanford.EDU (Benjamin Hitz) Date: Mon, 10 Sep 2007 10:03:15 -0700 Subject: [go] accommodation at Princeton In-Reply-To: <46E51678.5050901@sanger.ac.uk> References: <46E51678.5050901@sanger.ac.uk> Message-ID: <8A58A71A-2390-447A-BEA1-A14E6B4254C5@genome.stanford.edu> Rama and I just made reservations for 22-26 at Nassau Inn... so maybe there were some cancellations? Ben On Sep 10, 2007, at 3:03 AM, Valerie Wood wrote: > > > I left my accommodation booking a little late and the Nassau were > only able to offer me a suite for the 2 days of the ref genome > meeting (nights 26/27). Has anybody had the same problem and > managed to find cheaper alternative accommodation nearby for these > 2 nights? Or anybody know the area and can suggest anywhere nearby? > > > Thanks > > Val > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. 
-- Ben Hitz Senior Scientific Programmer ** Saccharomyces Genome Database ** GO Consortium Stanford University ** hitz at genome.stanford.edu From kara at genomics.princeton.edu Mon Sep 10 10:08:39 2007 From: kara at genomics.princeton.edu (Kara Dolinski) Date: Mon, 10 Sep 2007 13:08:39 -0400 Subject: [go] housing for GO meeting Message-ID: <5ED91FD4-141B-4FCE-9FE2-A4474CF64FED@genomics.princeton.edu> Hello, I have received a few messages this morning from people who were told that Nassau Inn only had suites available, and so alternative hotels were needed for the GO meeting. Options are: - those who still need a room to double up and stay in a suite together in the Nassau Inn (if you want to contact each other, I've heard from Val, Amelia, and Rama about needing a place, though Val might be set?). - Hyatt Regency: http://www.princeton.hyatt.com/hyatt/hotels/index.jsp They have a free shuttle to campus; it's a 5-10 minute car ride to campus. There isn't a schedule available; you need to make a reservation in advance and the time of the campus run is based on availability. - Hyatt Place suites: http://www.amerisuites.com/hotels/listhotel.php?code=AJPR Free shuttle to campus that runs every 30 minutes from 7 am to 10 pm, with a break between 2-4 pm; it's a 5-10 minute car ride to campus. - B&B: A budget option, though a bit of a hike (~ 1 - 1.5 miles) to the meeting location, and we've never used the place ourselves, so we cannot vouch for it (though it's in a nice neighborhood): RIVERSIDE HOUSE Bed and Breakfast 45 Knoll Drive Princeton, NJ 08540 For reservations, call Bonnie Hunter 609-924-7868 or email: riverside.house at gmail.com Located on a quiet street near Lake Carnegie, Riverside House is a private home in a residential neighborhood, walking distance from the University campus and the New York bus. The house is fully air-conditioned; all three guest bedrooms have color, cable TVs and hair dryers. 
Two large bedrooms share a bathroom; one smaller bedroom has a private bathroom. Guests also have use of the front room (which is a living-room/library) and the dining room for an expanded continental breakfast. A large outdoor deck for guests overlooks a private yard. Laundry service is available at an extra charge; there is also an iron and ironing board for guest use. There is ample room for parking and one guest telephone. Breakfasts consist of pastries or toast, a choice of cereals, three kinds of juices, a bowl of fresh fruit for each guest, and regular or decaf coffee and teas. Cost is $50 a night for a single; $60 a night for a double. These rates include breakfast and all the taxes. There is a $5 a night discount after the tenth night. Payment is by cash, travelers checks, or University checks. There is a two-night minimum stay requirement. Bonnie Hunter Riverside House B&B 45 Knoll Drive Princeton, NJ 08540 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://fafner.stanford.edu/pipermail/go/attachments/20070910/f138cbdf/attachment.html From rama at genome.Stanford.EDU Mon Sep 10 10:17:58 2007 From: rama at genome.Stanford.EDU (Rama Balakrishnan) Date: Mon, 10 Sep 2007 10:17:58 -0700 Subject: [go] housing for GO meeting In-Reply-To: <5ED91FD4-141B-4FCE-9FE2-A4474CF64FED@genomics.princeton.edu> References: <5ED91FD4-141B-4FCE-9FE2-A4474CF64FED@genomics.princeton.edu> Message-ID: <226D501F-4EF1-41C6-A196-6562735B2332@genome.stanford.edu> I hope you saw Ben's email. I did manage to get a room in the Nassau Inn. Thanks, Rama On Sep 10, 2007, at 10:08 AM, Kara Dolinski wrote: > Hello, > > I have received a few messages this morning from people who were > told that Nassau Inn only had suites available, and so alternative > hotels were needed for the GO meeting. 
> Options are: > > - those who still need a room to double up and stay in a suite > together in the Nassau Inn (if you want to contact each other, I've > heard from Val, Amelia, and Rama about needing a place, though Val > might be set?). > > - Hyatt Regency: > http://www.princeton.hyatt.com/hyatt/hotels/index.jsp > They have a free shuttle to campus; it's a 5-10 minute car ride to > campus. There isn't a schedule available; you need to make a > reservation in advance and the time of the campus run is based on > availability. > > - Hyatt Place suites: > http://www.amerisuites.com/hotels/listhotel.php?code=AJPR > Free shuttle to campus that runs every 30 minutes from 7 am to 10 > pm, with a break between 2-4 pm; it's a 5-10 minute car ride to > campus. > > - B&B: > A budget option, though a bit of a hike (~ 1 - 1.5 miles) to the > meeting location, and we've never used the place ourselves, so we > cannot vouch for it (though it's in a nice neighborhood): > > RIVERSIDE HOUSE > > Bed and Breakfast > > 45 Knoll Drive > > Princeton, NJ 08540 > > For reservations, > > call Bonnie Hunter 609-924-7868 > > or email: riverside.house at gmail.com > > Located on a quiet street near Lake Carnegie, Riverside House is a > private home in a residential neighborhood, walking distance from > the University campus and the New York bus. > > The house is fully air-conditioned; all three guest bedrooms have > color, cable TVs and hair dryers. Two large bedrooms share a > bathroom; one smaller bedroom has a private bathroom. Guests also > have use of the front room (which is a living-room/library) and the > dining room for an expanded continental breakfast. A large outdoor > deck for guests overlooks a private yard. Laundry service is > available at an extra charge; there is also an iron and ironing > board for guest use. > > There is ample room for parking and one guest telephone. 
> > Breakfasts consist of pastries or toast, a choice of cereals, three > kinds of juices, a bowl of fresh fruit for each guest, and regular > or decaf coffee and teas. > > Cost is $50 a night for a single; $60 a night for a double. These > rates include breakfast and all the taxes. There is a $5 a night > discount after the tenth night. Payment is by cash, travelers > checks, or University checks. There is a two-night minimum stay > requirement. > > Bonnie Hunter > > Riverside House B&B > > 45 Knoll Drive > > Princeton, NJ 08540 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://fafner.stanford.edu/pipermail/go/attachments/20070910/6f407d0f/attachment.html From kchris at genome.Stanford.EDU Mon Sep 10 16:15:37 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Mon, 10 Sep 2007 16:15:37 -0700 (PDT) Subject: [go] finishing up Evidence Code Issues Message-ID: Hi, Since I'm due on September 20th and will be going on maternity leave shortly before the GO meeting, Mike asked me to send these remaining items to finish up the Evidence Code documentation directly to the list to at least get the discussion started. Some issues may need to be discussed at the GO meeting as well. This email will contain some responses to Midori's last email and a few other email comments to resolve some minor comments on the current draft of the new Evidence Code documentation. I will send separate emails to deal with each of these specific issues: 1. Restriction that all unknowns MUST use ND 2. IMP vs IGI for single gene mutations, regardless of gene being annotated 3. How to put program or method names in the with column for ISS 4. Scope of the RCA evidence code For both issues 2 and 4, I think that the recommendations I've made will help make it possible to create a decision tree/flowchart that is fairly simple and clear. I'll send a very rough draft of a flowchart separately as well. 
Note that for both #s 3 and 4, I have put some supplemental info into html docs in my personal space. I did not spend much time doing html formatting for these docs, on the thought that people might prefer to move them to the GOC wiki. However, as the Evidence Code Committee was not designated as a Working Group, I have no idea where to put them within the wiki structure. If a spot is designated for them, they can be moved to the wiki. -Karen Responses and comments on things in red on this page: http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml 1. GO_REF documentation > We should have documentation that explains GO_REF's and links to it > when we refer to them. Midori (15 Jun 2007): Links can go to the existing GO References page: http://www.geneontology.org/cgi-bin/references.cgi I can write up a description (which will be brief; there's not an enormous amount to say) and give it to Amelia to be added to the blurb at the top of this page. The plain text file from which the web page is generated contains a brief description of the format, which could be HTMLified and also added to the blurb if it would be useful. Karen (9 Sept 2007): Please do. It would also be good if the page for the GO_REFs is made easier to find in general in our documentation. 2. ChEBI IDs in with field? > Do we allow things like ChEBI IDs in the with field? Midori (15 Jun 2007): I would say yes. Karen (9 Sept 2007): Perhaps we should make this a quick agenda item for the next GO meeting, so that people can ratify this face to face, unless we get an overwhelming response via email to proceed with allowing this new ID for the with field. 3. IMP examples > any more positive examples for IMP?, e.g. phenotypic similarity Midori (15 Jun 2007): Dredged up from email from January 2002 ... Erich Schwarz needed to know which code to use for "other mutations sharing a complex mutant phenotype syndrome with [a well-characterized mutant]." 
My comment at the time was: "The situation you've described is IMP, not IGI, because (if I understand correctly) you're looking at one mutation at a time. Comparing the phenotype of one mutation to that of another helps you interpret the meaning, but is not a kind of genetic interaction." I think this still holds. Erich provided some details of an example, which I can forward if you want. Karen (9 Sept 2007): We can certainly include it; the more examples the better in my opinion, but don't send it to me. I'll be going on maternity leave soon and don't want to be responsible for this getting added. 4. use of with field for NAS > The Evidence Code Committee discussed the idea of making GO > annotations from Reactome entries. ... What does the full group feel > about the idea of allowing the ID for a database record, when such > exist, in the with field? Midori (15 Jun 2007): I'm all for including annotations based on Reactome entries -- they have a well-developed curation system that deeply involves expert biologists, so the statements in their records are very reliable. I am not in favor of putting the Reactome ID in the with field for these annotations, however, because the Reactome entry does not modify or supplement the evidence; rather, the entry provides the evidence. GO would effectively be using a Reactome record as a source of information about a gene product, so it would make much more sense to put the Reactome ID in the reference field. For the more general database record case, it may be that I don't sufficiently understand what might go in a GO_REF (or equivalent), so I don't understand the rationale for allowing 'with' for NAS. For the case where the author infers one thing from another, using a GO ID in 'with' makes more sense, but I think it's not really necessary because the author (presumably) hasn't actually made any GO annotations, and hasn't stated observations or conclusions in terms of, well, GO terms. (Perhaps this will change some day!)
Also, note that we have expressly disallowed the use of 'with' for NAS, so the script would have to be changed if the use of with-for-NAS is agreed. Karen (9 Sept 2007): Regarding the idea of allowing Reactome IDs in the with field, the thought was that it provided the specific information about which record in Reactome made the statement, but the idea was controversial even just with the Evidence Code Committee. Regarding the idea of allowing GOids for NAS, I think you bring up a good point that this may not make sense since the author has typically not stated their statement in terms of a GOid from which an inference was made. Allowing this may just be more confusing than helpful, especially since deciding which GOid to put in the with field will almost always be a curator judgement. However, I wasn't one of the proponents of this idea, so those who are may wish to defend it. In any case, rather than adding yet another usage of the with column that is potentially confusing to users, I could personally just go with not allowing use of the with column at all for NAS. 5. Representation of examples for with/from: Susan (14 Jun 2007): IPI examples Looks good but there is something odd about the IPI example, assuming I am looking at the latest version ok. Firstly, the paper is about mouse proteins not Drosophila so could we change FB to MGI please. Also, I am confused as to why there are three lines shown - MGI just list the middle one: FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI UniProt:protein_2_ID ... FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI UniProt:protein_2_ID|UniProt:protein_3_ID ...
FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI FB:gene_2_ID So unless I'm missing something I suggest we lose the extra lines and have either: MGI:1349216 Abcd3 GO:0005515 PMID:10551832 IPI UniProt:P33897|UniProt:Q61285 OR MGI:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI UniProt:protein_2_ID|UniProt:protein_3_ID I'd prefer to include the real identifiers so it isn't a mix of 'real' and 'example'. Similarly there seems to be a mix of FB and SGD db identifiers in the IGI examples. A possible alternative for IGI is: In PMID:9043060, flies simultaneously mutant for three genes: klingon (klg), sevenless (sev) and Son of sevenless (Sos) are used to show that klingon participates in R7 photoreceptor fate commitment. This leads to the annotation: FB:FBgn0017590 klg GO:0045466 PMID:9043060 IGI FB:FBgn0003366|FB:FBgn0001965 Karen (9 Sept 2007): I'm all for real examples, but I don't have time to dig them up for every evidence code. Perhaps we could distribute this task around, so that we have multiple real examples for each evidence code. It would be good to have at least one example with one entry in the with column, as well as the one with multiple. It would also be good if they showed various IDs in the with field. This would be a reasonable task if there was one person for each evidence code to find some real examples, and then hopefully it would be easy for Amelia to put them in the right format if she was given all the specific info that should be in the table. 6. ISS & with col: > Note that there should be good evidence that the gene product(s) > placed in the with/from column actually has the activity, process, > etc. being annotated. Midori (15 Jun 2007): Do we want to specifically say the "good evidence" should be *experimental* evidence? Would be consistent with the Ref Genome requirement, and good practice generally ... Karen (9 Sept 2007): We do have to remember that this Evidence Code document is not just for the use of the Reference Genomes. 
While we did agree that ISS should not be made from pairwise BLAST unless the gene to be placed in the with column has been experimentally characterized, the ISS code covers more situations than just that. The with field may also contain Pfams, Prosite, TIGRFAMS, CBS, COG, PANTHER, and we also have to determine how to include method names here for stuff like tRNAscan and my specific question about snoRNAs. Michelle Gwinn may wish to comment on this too. Typos, other trivial fixes: ------------------------------------------------------- 1. IGI > Should we add a statement in the paragraph above to IGI, similar to > the one in IMP, about care in making annotations from gain of > function mutations ...? Midori (15 Jun 2007): Sounds reasonable to me. Karen (9 Sept 2007): OK, added to first paragraph of IGI. 2. Last paragraph of Introduction: Midori (15 Jun 2007): Change "effect" to "affect" in "... will also effect the quality of the resulting annotation." Karen (9 Sept 2007): done 3. IDA & IMP: Midori (15 Jun 2007): Does "over-expression" really need to be hyphenated? I've seen it unhyphenated more frequently; also, there's one unhyphenated occurrence in the document. Karen (9 Sept 2007): changed to unhyphenated 4. IGI examples: Midori (15 Jun 2007): The statements "For this type of experiment, use the IGI Code" could be deleted -- they're redundant with the fact that the description appears in a list headed "where the IGI code should be used."
Karen (9 Sept 2007): done From kchris at genome.Stanford.EDU Mon Sep 10 16:17:03 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Mon, 10 Sep 2007 16:17:03 -0700 (PDT) Subject: [go] Requirement for all 'unknown' annotations to use ND code Message-ID: Requirement for all 'unknown' annotations to use ND code ----------------------------------------------------------- Hi all, A question was brought up within the Evidence Code Committee about the requirement that ND be the only evidence code allowed for (unknown) annotations to the root nodes, and was not resolved there. Discussion so far on the list is also mixed. To me, the issue is that at the Jan GO meeting we agreed that evidence codes are ONLY about the type of evidence used to make the annotation, and not about anything else. However, by saying that people can use the ND evidence code as a way to find all the unknown annotations, we are encoding an extra meaning into it. The email discussion of this issue is below. -Karen Requirement that ND be the only allowable evidence code for unknown annotations proposed new rule for ND: Even if an author states in a paper that there is no data available or nothing is known about the gene product in a particular GO aspect, annotation to the corresponding root node should be made with ND evidence code citing either the annotating group's internal reference or the GOC's reference on use of the ND evidence code, not a specific paper. comment in red in draft document: I realize that we agreed to the above statement at the last GOC meeting, but... The more I think about it, the more I'm uncomfortable with the decision that we made that unknown annotations can only be made with ND, especially since the reason stated to do so has nothing to do with evidence, but is to help people better identify the unknown annotations. I think this is encoding information into the evidence code that is about something other than the evidence itself.
I think this is poor practice, especially when we spent so much time at the Jan GO meeting discussing that evidence codes would be JUST a statement of the method by which the annotation was made. Jane Lomax (15 Jun 2007) I was under the impression that we'd agreed 2. at the Jan meeting i.e. ND is now the only allowable evidence code for unknown annotations? Midori Harris (15 Jun 2007) I understand, and would add that it also loses the information that at the time of writing, the authors -- who are presumably pretty well informed about the genes/gene products they study -- are aware of no relevant data. (Tho this concern is not as grave as that of overloading an evidence code.) Valerie Wood (22 Jun 2007) I'm not so sure because: 1. If authors have specifically asserted that there is no information, this is usually a statement which is made based on looking at the database (for example if the author is dealing with a gene set). 2. Papers are frequently published concurrently and it is clear that the authors have no knowledge of the parallel papers, so an author statement is not always necessarily a good indication that there is no functional data without a curator check. 3. I'm pretty sure that when the unknowns disappeared, we advised software developers that they could retrieve the unknown annotations using the ND evidence code..... Although I agree it seems bad practice to put info in the evidence code other than the evidence itself, I think it's more important that there is a very clear way to identify 'unknown' annotations. It seems like not many software tools have caught up with the previous change to unknowns (for example I haven't yet managed to find a way to look at GO term enrichment which recognises the unknown annotations.... does anybody know of one?)
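Val's third point, that software can retrieve the 'unknown' annotations by filtering on the ND evidence code, can be sketched roughly as follows. This is only an illustration in Python, assuming a simplified tab-separated record with five columns (object ID, symbol, GO ID, reference, evidence code). The column layout, the third sample row, and the function name `unknown_annotations` are hypothetical and are not taken from any GO tool or from the real annotation file format.

```python
# Simplified, hypothetical rows: object ID, symbol, GO ID, reference, evidence code.
# The first two rows reuse identifiers quoted earlier on this list; the third row
# is invented to show an ND annotation to a root node.
SAMPLE = (
    "MGI:1349216\tAbcd3\tGO:0005515\tPMID:10551832\tIPI\n"
    "FB:FBgn0017590\tklg\tGO:0045466\tPMID:9043060\tIGI\n"
    "EXAMPLE:0000001\tYFG1\tGO:0003674\tEXAMPLE_REF:0000001\tND\n"
)


def unknown_annotations(text):
    """Return the rows whose evidence code (fifth column) is ND."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    return [row for row in rows if row[4] == "ND"]


for row in unknown_annotations(SAMPLE):
    # Print the annotated object and the root term it was annotated to.
    print(row[0], row[2])
```

A filter like this finds every unknown only as long as ND is the sole code used for root-node annotations, which is exactly the extra meaning being debated here.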
From kchris at genome.Stanford.EDU Mon Sep 10 16:18:27 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Mon, 10 Sep 2007 16:18:27 -0700 (PDT) Subject: [go] Boundary between IMP and IGI Message-ID: Boundary between IMP and IGI ------------------------------------------------------- In response to the new draft of the evidence code documentation, some discussion came up between Midori and Val about the usage of the IGI versus the IMP evidence codes. As this issue was not a specific gripe of anyone on the Evidence Code Committee, it was not discussed. However, one of the goals of this revision was to have guidelines that make sense and I completely see the point that it doesn't really make sense to say that making an inference from a strain with one mutation is a genetic interaction, even when you are annotating a gene other than the one that is mutant. We were also asked to make a decision tree/flow chart for evidence code decisions (I have a draft I'll send out later), and I think it would be a much simpler decision if there was a clear line between 1 mutant gene and multiple mutant genes. I think it would make a lot more sense if all annotations made on the basis of mutation, or comparison between alleles, of a single gene used IMP. Since we already allow use of the with field for IMP to record the mutant allele, it might make more sense to use IMP for any annotation based on a phenotype of a single gene and just record the mutant allele in the with field. Since not all groups track alleles, perhaps we should also allow with for IMP to contain the name of the gene without specifically designating an allele. Below is a transcript of the discussion that occurred on this issue. -Karen **IMP: > mutation in gene B provides information about gene A being > annotated. For this type of experiment, use the IGI code.
and IGI: > Inference about one gene drawn from the phenotype of a mutation in a > different gene Midori (15 Jun 2007): I have always disagreed with this usage: I've argued that IMP would be more appropriate, because in the examples given, only one gene is mutated, so the "combination of alterations" criterion for IGI is not met. But it's an argument that I lost years ago. Oh well. Val (22 Jun 2007): This is still a bit unclear to me: "We also use this code for situations where a mutation in gene A provides information about the function, process, or component of gene B. If a mutation in gene A causes a mislocalization of gene B, gene A is annotated to protein localization with gene B in the with/from column using IGI." In the protein localization example above a mutation in gene A is providing information about gene A (protein localization) not about gene B (the protein localized). I have made a number of these type of annotations to 'protein localization' (the fission yeast community are very keen on localization dependency experiments for functionally connected gene products). However, I thought I had used the wrong evidence code (using the existing documentation) and that they should be IMP (I wanted to capture the protein localized and at the time I had no other way to do it). These were on my todo list to fix. It now seems they are OK as IGI, so I just wanted to double check......... The original documentation says: # Inference about one gene drawn from the phenotype of a mutation in a different gene I don't have an example of this though. I forgot what it is used for, although I used to know...... Midori (22 Jun 2007, in response to Val): > I have made a number of these type of annotations to 'protein > localization, (the fission yeast community are very keen on > localization dependency experiments for functionally connected gene > products).
However, I thought I had used the wrong evidence code > (using the existing documentation) and that they should be IMP (I > wanted to capture the protein localized and at the time I had no > other way to do it). These were on my todo list to fix. It now > seems they are OK as IGI, so I just wanted to double check......... Your annotations are consistent with the existing documentation. What I'm saying is that I think the documentation should recommend IMP for these. I think I still wouldn't put B in 'with' with IMP, because a few groups would put the allele of A used in the experiment, and others would leave 'with' blank. > The original documentation says: # Inference about one gene drawn > from the phenotype of a mutation in a different gene I don't have an > example of this though. I forgot what it is used for, although I > used to know...... I would also prefer to recommend IMP for these. From kchris at genome.Stanford.EDU Mon Sep 10 16:20:08 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Mon, 10 Sep 2007 16:20:08 -0700 (PDT) Subject: [go] Putting method/program names into the with field for ISS Message-ID: Putting method/program names into the with field for ISS -------------------------------------------------------- I've reviewed several papers where ISS is the appropriate code, but for which only a method could be placed into the with field. Thus, I have some comments on how we might want to do this. I'll start with a little background. At the last GO meeting, we agreed to "Always use a WITH column for IEA and ISS, containing a program name if necessary. For example, make a ref to tRNAscan." However, we did not work out how to implement doing this. As phrased in the minutes, it sounds like the idea is just to put the name of the method in the with column. If that's all that is required then it's fairly simple to find an appropriate text string from a paper to put in the with column.
However, I'm kind of assuming that we don't want to allow uncontrolled text strings in the with column mixed in with things of the format namespace:ID. Currently, to put something in the with column, it must have a namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names or methods, there are a couple problems with trying to put them into this type of format. One is that some of the methods to which researchers refer are not given an official name. The second, which applies to all the papers I've read so far, is that none of them have a namespace. If we need to format these in a way that is compatible with the namespace:ID format, then GO could generate a 'database' of collected methods. An entry in the GO.xrf_abbs file like the one below could define a namespace for such a collection. abbreviation: GO_CM database: Gene Ontology Database collected methods object: Accession (for collected method) example_id: GO_CM:0000001 Then for the second part, we'd have to start a collection of these various methods, probably just a file somewhat like the GO.xrf_abbs file. For this, there are a couple issues to deal with: 1) The authors of methods don't always give them a clear name. 2) There isn't always a single source reference. For programmatic methods, there is often a single source reference. However, for the consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't be comfortable designating a single reference as the source. In these cases, I'd be happier if we could associate a number of relevant refs to the 'method'. In other cases, an algorithm is mentioned by name, but no reference is cited. However, with those issues in mind, perhaps collecting this information would work. - accession: accession ID given by GO - method name: the name given to a program by the authors, when available, or a descriptive name based on the paper - developed in reference: the ID, e.g.
PMID:xxxxx, for the reference describing the development of a method, when applicable, but would not be required. Can be filled with Not Applicable for cases like 'box C/D snoRNA consensus' where there isn't a specific program that was developed. I don't know how we want to deal with cases like 'TMpredict' where they cited a reference that appears irrelevant or 'Kyte-Doolittle algorithm' where I didn't see a citation for the algorithm. - other references: Useful for cases like 'box C/D snoRNA consensus' where there isn't a specific program that was developed, but where you can cite 1 or more references which describe what the consensus is. - method classification: maybe this tag isn't necessary, but I thought it might be useful, particularly if we ever get to a situation where we have this in a database where you can search on this field. Below is what I would fill in for each field for the references listed at: http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html The comments in parentheses are just comments to correlate the info below with the Example papers, and would not be included in the proposed file.
accession: GO_CM:0000001
method name: box C/D snoRNA probabilistic model
developed in reference: PMID:10024243
method classification: box C/D snoRNA gene prediction
(would be used for example #1)

accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372
method classification: box C/D snoRNA gene prediction
(would be used for example #s 2 & 3)

accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction
(would be used for example #4)

accession: GO_CM:0000004
method name: box H/ACA snoRNA consensus
developed in reference: Not Applicable
other references: PMID:12007400
method classification: box H/ACA snoRNA gene prediction
(would be used for example #5)

accession: GO_CM:0000005
method name: TMpredict
developed in reference: ? (paper #6 cites a reference, but it seems incorrect; did not find an appropriate citation via PubMed)
method classification: protein hydrophobicity
(would be used for example #6)

accession: GO_CM:0000006
method name: Kyte-Doolittle algorithm
developed in reference: ? (paper #7 does not cite a reference)
method classification: protein hydrophobicity
(would be used for example #7)

accession: GO_CM:0000007
method name: tRNAscan
developed in reference: PMID:1870126
other references: PMID:
method classification: tRNA gene prediction
(The Lowe & Eddy tRNAscan-SE ref referred to this program as "tRNAscan 1.3 by Fichant and Burks (12)" and cited this paper. However, this paper doesn't appear to name the algorithm at all.)

accession: GO_CM:0000008
method name: Pavesi et al. tRNA prediction algorithm
developed in reference: PMID:8165140
method classification: tRNA gene prediction
(they don't name their algorithm, so this name is derived from what they say, in conjunction with how it was referred to in the Lowe & Eddy paper on tRNAscan-SE.)
accession: GO_CM:0000009 method name: tRNAscan-SE developed in reference: PMID:9023104 method classification: tRNA gene prediction From kchris at genome.Stanford.EDU Mon Sep 10 16:22:36 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Mon, 10 Sep 2007 16:22:36 -0700 (PDT) Subject: [go] Scope of the RCA evidence code Message-ID: Scope of the RCA evidence code ------------------------------------------------------- Here is my analysis of and recommendations for the future of the RCA evidence code: Having reviewed six papers of the type that originally prompted SGD to request the RCA evidence code, it is clear that all of these methods described within these papers include analysis of experimental data, e.g. expression data, two hybrid data, mass spec proteomic data, etc. Some also include sequence based data, but it is never the entire basis of the analysis. Two of the analyses (Troyanskaya et al, and Wade et al.) combined expression data with promoter sequence data, a type of sequence data not typically considered in analyses appropriate for the ISS code. Two other analyses (Baxter et al. and Alves et al.) combined structural analysis with either experimental results or with a mathematical model designed to test which mechanisms could reproduce existing published experimental results. Some RCA analyses also utilize existing functional annotations for characterized genes (Gat-Viks et al.). To summarize, all of these analyses combined multiple types of data, generally including experimental data, such as expression data or protein-protein interaction data. Some include sequence data, in this set either promoter sequence info or structural information, but none are based solely on sequence based information. 
Analyses based purely on sequence similarity based data, including sequence similarity with experimentally characterized gene products, as determined by pairwise or multiple alignment; prediction methods for non-coding RNA genes; recognized functional domains, as determined by tools such as InterPro, Pfam, SMART, etc.; predicted protein features, e.g., transmembrane regions, signal sequence, etc.; structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction; should use the ISS evidence code (or the IEA code if it is not reviewed by a curator). The documentation does not currently list mapping files such as InterPro2GO, but I would include this as sequence-only based data since the basic analysis is all based on the sequence of the gene product and the hits by various sequence analysis methods. As a curator-reviewed code, annotations made with the RCA code must be reviewed/assigned by a curator. The documentation currently lists 'Text-based computation (e.g. text mining)' as acceptable for this evidence code. In the absence of specific examples of how this might be applied, I would suggest removing mention of 'Text-based computation' until we have an actual example or two to look at to see whether it fits into this evidence code or not. Accepting these recommendations would basically return the RCA code to its original intent. It would also be consistent with the recommendation of the Evidence Code Committee (ECC) to overturn the 2006 Annotation Camp's recommendation to use RCA for sequence similarity comparisons where you could not put an experimentally characterized ortholog into the with column and also with the January 2007 GOC meeting decision that all methods based on only sequence-based info should use the ISS code.
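The division recommended above can be sketched as a small decision function. This is only an illustration of the recommendation in this message, not agreed GO policy or an existing tool. The data-type labels and the name `suggest_evidence_code` are invented for the example, and the case of an integrated analysis that no curator has reviewed is left undecided because the message does not address it.

```python
# Hypothetical labels for the sequence-only categories listed in this message.
SEQUENCE_BASED = {
    "sequence_similarity",
    "ncRNA_prediction",
    "functional_domain",
    "predicted_protein_feature",
    "structural_similarity",
}


def suggest_evidence_code(data_types, curator_reviewed):
    """Suggest ISS, IEA, or RCA for a computational analysis, or None if undecided."""
    data_types = set(data_types)
    if not data_types:
        return None  # nothing to base a decision on
    if data_types <= SEQUENCE_BASED:
        # Purely sequence-based: ISS when curator-reviewed, otherwise IEA.
        return "ISS" if curator_reviewed else "IEA"
    # At least one other data type is integrated (e.g. expression data);
    # RCA is a curator-reviewed code, so the unreviewed case stays undecided.
    return "RCA" if curator_reviewed else None


print(suggest_evidence_code({"functional_domain"}, True))                     # ISS
print(suggest_evidence_code({"expression_data", "promoter_sequence"}, True))  # RCA
```

Promoter sequence is deliberately kept out of the sequence-only set here, following the remark that it is not a type of sequence data typically considered for ISS.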
The GOC may not wish to consider renaming the evidence code, but having reviewed this set of papers, I think the phrase "Integrated Computational Analysis" would be a more descriptive name and more consistent with how authors of these types of methods describe them (the red highlighting in the sample papers page, url below, shows where the authors used that word). I'm not sure this is sufficient to make clear the distinction between these methods and sequence-only based methods, but it is better than "Reviewed Computational Analysis". In addition, right now the RCA documentation would exclude an analysis of this type if it was performed internally by a database group and not published. Thus, if the GOC is amenable to the idea of changing the name of the evidence code, I would suggest that we call it "Integrated Computational Analysis" with the abbreviation ICA. Here are links to supplemental information regarding this evidence code: Examples of the types of analyses the RCA code was intended to cover: http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html History of the RCA code: http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html Summary of controversy over RCA vs ISS in Evidence Code Committee: http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html Proposed draft of new documentation for this code: http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica (note that original RCA doc is still present for comparison) From hitz at genome.Stanford.EDU Mon Sep 10 16:31:14 2007 From: hitz at genome.Stanford.EDU (Benjamin Hitz) Date: Mon, 10 Sep 2007 16:31:14 -0700 Subject: [go] Boundary between IMP and IGI In-Reply-To: References: Message-ID: <193E607C-FCF1-43CE-8194-8A287164AC90@genome.stanford.edu> Do we really need IGI and IMP? Is the only difference technically that IGI = double (or 2+?) mutant, IMP = single mutant?
Ben

On Sep 10, 2007, at 4:18 PM, Karen Christie wrote:

> Boundary between IMP and IGI
> -------------------------------------------------------
>
> In response to the new draft of the evidence code documentation, some
> discussion came up between Midori and Val about the usage of the IGI
> versus the IMP evidence codes. As this issue was not a specific gripe
> of anyone on the Evidence Code Committee, it was not discussed.
>
> However, one of the goals of this revision was to have guidelines that
> make sense and I completely see the point that it doesn't really make
> sense to say that making an inference from a strain with one mutation
> is a genetic interaction, even when you are annotating a gene other
> than the one that is mutant.
>
> We were also asked to make a decision tree/flow chart for evidence
> code decisions (I have a draft I'll send out later), and I think it
> would be a much simpler decision if there was a clear line between 1
> mutant gene and multiple mutant genes.
>
> I think it would make a lot more sense if any annotations made on the
> basis of mutation, or comparison between alleles, of a single gene
> used IMP. Since we already allow use of the with field for IMP
> to record the mutant allele, it might make more sense to use IMP for
> any annotation based on a phenotype of a single gene and just record
> the mutant allele in the with field. Since not all groups track
> alleles, perhaps we should also allow with for IMP to contain the name
> of the gene without specifically designating an allele.
>
> Below is a transcript of the discussion that occurred on this issue.
>
> -Karen
>
>
> **IMP:
>
>> mutation in gene B provides information about gene A being
>> annotated. For this type of experiment, use the IGI code.
>
> and IGI:
>
>> Inference about one gene drawn from the phenotype of a mutation in a
>> different gene
>
> Midori (15 Jun 2007):
> I have always disagreed with this usage: I've argued that IMP would be
> more appropriate, because in the examples given, only one gene is
> mutated, so the "combination of alterations" criterion for IGI is not
> met. But it's an argument that I lost years ago. Oh well.
>
> Val (22 Jun 2007):
> This is still a bit unclear to me
>
> "We also use this code for situations where a mutation in gene A
> provides information about the function, process, or component of gene
> B. If a mutation in gene A causes a mislocalization of gene B, gene A
> is annotated to protein localization with gene B in the with/from
> column using IGI."
>
> In the protein localization example above a mutation in gene A is
> providing information about gene A (protein localization) not about
> gene B (the protein localized).
>
> I have made a number of these type of annotations to 'protein
> localization' (the fission yeast community are very keen on
> localization dependency experiments for functionally connected gene
> products). However, I thought I had used the wrong evidence code
> (using the existing documentation) and that they should be IMP (I
> wanted to capture the protein localized and at the time I had no other
> way to do it). These were on my todo list to fix. It now seems they
> are OK as IGI, so I just wanted to double check.........
>
> The original documentation says:
> # Inference about one gene drawn from the phenotype of a mutation in a
> different gene
> I don't have an example of this though. I forgot what it is used for,
> although I used to know......
>
> Midori (22 Jun 2007, in response to Val):
> > I have made a number of these type of annotations to 'protein
> > localization' (the fission yeast community are very keen on
> > localization dependency experiments for functionally connected gene
> > products). However, I thought I had used the wrong evidence code
> > (using the existing documentation) and that they should be IMP (I
> > wanted to capture the protein localized and at the time I had no
> > other way to do it). These were on my todo list to fix. It now
> > seems they are OK as IGI, so I just wanted to double check.........
>
> Your annotations are consistent with the existing documentation. What
> I'm saying is that I think the documentation should recommend IMP for
> these.
>
> I think I still wouldn't put B in 'with' with IMP, because a few
> groups would put the allele of A used in the experiment, and others
> would leave 'with' blank.
>
> > The original documentation says: # Inference about one gene drawn
> > from the phenotype of a mutation in a different gene I don't have an
> > example of this though. I forgot what it is used for, although I
> > used to know......
>
> I would also prefer to recommend IMP for these.

--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO Consortium
Stanford University ** hitz at genome.stanford.edu

From hitz at genome.Stanford.EDU Mon Sep 10 16:34:28 2007
From: hitz at genome.Stanford.EDU (Benjamin Hitz)
Date: Mon, 10 Sep 2007 16:34:28 -0700
Subject: [go] Putting method/program names into the with field for ISS
In-Reply-To:
References:
Message-ID: <41A9BFCC-C2CD-439C-A494-EC94B0344BF5@genome.stanford.edu>

Maybe this could exactly be the distinction between ISS and RCA.

If you can specify a WITH value which corresponds to a single sequence,
structure, or "family" (read as HMM or other statistical model) then
it's ISS. Otherwise it's RCA (if curated, obv.)

Ben

On Sep 10, 2007, at 4:20 PM, Karen Christie wrote:

> Putting method/program names into the with field for ISS
> --------------------------------------------------------
>
> I've reviewed several papers where ISS is the appropriate code, but
> for which only a method could be placed into the with field.
> Thus, I have some comments on how we might want to do this. I'll start
> with a little background.
>
> At the last GO meeting, we agreed to "Always use a WITH column for IEA
> and ISS, containing a program name if necessary. For example, make a
> ref to tRNAscan." However, we did not work out how to implement this.
>
> As phrased in the minutes, it sounds like the idea is just to put the
> name of the method in the with column. If that's all that is required
> then it's fairly simple to find an appropriate text string from a
> paper to put in the with column. However, I'm kind of assuming that we
> don't want to allow uncontrolled text strings in the with column mixed
> in with things of the format namespace:ID.
>
> Currently, to put something in the with column, it must have a
> namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names
> or methods, there are a couple of problems with trying to put them
> into this type of format. One is that some of the methods to which
> researchers refer are not given an official name. The second, which
> applies to all the papers I've read so far, is that none of them have
> a namespace.
>
> If we need to format these in a way that is compatible with the
> namespace:ID format, then GO could generate a 'database' of collected
> methods. An entry in the GO.xrf_abbs file like the one below could
> define a namespace for such a collection.
>
> abbreviation: GO_CM
> database: Gene Ontology Database collected methods
> object: Accession (for collected method)
> example_id: GO_CM:0000001
>
> Then for the second part, we'd have to start a collection of these
> various methods, probably just a file somewhat like the GO.xrf_abbs
> file. For this, there are a couple of issues to deal with:
>
> 1) The authors of methods don't always give them a clear name.
>
> 2) There isn't always a single source reference. For programmatic
> methods, there is often a single source reference. However, for the
> consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't
> be comfortable designating a single reference as the source. In these
> cases, I'd be happier if we could associate a number of relevant refs
> to the 'method'. In other cases, an algorithm is mentioned by name,
> but no reference is cited.
>
> However, with those issues in mind, perhaps collecting this
> information would work.
>
> - accession: accession ID given by GO
>
> - method name: the name given to a program by the authors, when
> available, or a descriptive name based on the paper
>
> - developed in reference: the ID, e.g. PMID:xxxxx, for the reference
> describing the development of a method, when applicable, but would not
> be required. Can be filled with Not Applicable for cases like 'box
> C/D snoRNA consensus' where there isn't a specific program that was
> developed. I don't know how we want to deal with cases like
> 'TMpredict' where they cited a reference that appears irrelevant or
> 'Kyte-Doolittle algorithm' where I didn't see a citation for the
> algorithm.
>
> - other references: Useful for cases like 'box C/D snoRNA consensus'
> where there isn't a specific program that was developed, but where you
> can cite 1 or more references which describe what the consensus is.
>
> - method classification: maybe this tag isn't necessary, but I thought
> it might be useful, particularly if we ever get to a situation where
> we have this in a database where you can search on this field.
>
> Below is what I would fill in for each field for the references listed
> at:
> http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html
>
> The comments in parentheses are just comments to correlate the info
> below with the Example papers, and would not be included in the
> proposed file.
>
> accession: GO_CM:0000001
> method name: box C/D snoRNA probabilistic model
> developed in reference: PMID:10024243
> method classification: box C/D snoRNA gene prediction
>    (would be used for example #1)
>
> accession: GO_CM:0000002
> method name: box C/D snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:8674114; PMID:16484372
> method classification: box C/D snoRNA gene prediction
>    (would be used for example #s 2 & 3)
>
> accession: GO_CM:0000003
> method name: snoGPS
> developed in reference: PMID:15306656
> method classification: box H/ACA snoRNA gene prediction
>    (would be used for example #4)
>
> accession: GO_CM:0000004
> method name: box H/ACA snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:12007400
> method classification: box H/ACA snoRNA gene prediction
>    (would be used for example #5)
>
> accession: GO_CM:0000005
> method name: TMpredict
> developed in reference: ?
>    (paper #6 cites a reference, but it seems incorrect;
>    did not find an appropriate citation via PubMed)
> method classification: protein hydrophobicity
>    (would be used for example #6)
>
> accession: GO_CM:0000006
> method name: Kyte-Doolittle algorithm
> developed in reference: ? (paper #7 does not cite a reference)
> method classification: protein hydrophobicity
>    (would be used for example #7)
>
> accession: GO_CM:0000007
> method name: tRNAscan
> developed in reference: PMID:1870126
> other references: PMID:
> method classification: tRNA gene prediction
>    (The Lowe & Eddy tRNAscan-SE ref referred to this program as
>    "tRNAscan 1.3 by Fichant and Burks (12)" and cited this
>    paper. However, this paper doesn't appear to name the
>    algorithm at all.)
>
> accession: GO_CM:0000008
> method name: Pavesi et al. tRNA prediction algorithm
> developed in reference: PMID:8165140
> method classification: tRNA gene prediction
>    (they don't name their algorithm, so this name is
>    derived from what they say, in conjunction with how
>    it was referred to in the Lowe & Eddy paper on
>    tRNAscan-SE.)
>
> accession: GO_CM:0000009
> method name: tRNAscan-SE
> developed in reference: PMID:9023104
> method classification: tRNA gene prediction
>

--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO Consortium
Stanford University ** hitz at genome.stanford.edu

From kchris at genome.Stanford.EDU Mon Sep 10 16:41:34 2007
From: kchris at genome.Stanford.EDU (Karen Christie)
Date: Mon, 10 Sep 2007 16:41:34 -0700 (PDT)
Subject: [go] Putting method/program names into the with field for ISS
In-Reply-To: <41A9BFCC-C2CD-439C-A494-EC94B0344BF5@genome.stanford.edu>
References: <41A9BFCC-C2CD-439C-A494-EC94B0344BF5@genome.stanford.edu>
Message-ID:

Read the proposed scope of RCA. This code was requested to cover an
entirely different type of analysis than sequence similarity
comparisons.

In addition, if you read the last example for ISS, provided by Michelle
Gwinn on the basis of what TIGR does in their sequence analysis methods,
in the proposed new documentation (url below), it states that ISS
analyses may include more than one type of evidence.

http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#iss

-Karen

On Mon, 10 Sep 2007, Benjamin Hitz wrote:

>
> Maybe this could exactly be the distinction between ISS and RCA.
>
> If you can specify a WITH value which corresponds to a single sequence,
> structure, or "family" (read as HMM or other statistical model) then it's
> ISS. Otherwise it's RCA (if curated, obv.)
>
> Ben

From val at sanger.ac.uk Tue Sep 11 01:24:25 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Tue, 11 Sep 2007 09:24:25 +0100
Subject: [go] Putting method/program names into the with field for ISS
In-Reply-To:
References: <41A9BFCC-C2CD-439C-A494-EC94B0344BF5@genome.stanford.edu>
Message-ID: <46E650B9.40901@sanger.ac.uk>

I agree with Ben that this would be a useful general distinction. It
would prevent people from considering RCA as being similar to 'IEA' and
reinforce the fact that curator approval is a requirement.

It would also resolve the issue 'what to put in the with column' which
keeps recurring for the TMM/GPI and annotations based on signal
peptides, tRNA scan and other predictors (where the algorithms model
additional constraints on the feature;
i.e. TMMs/GPI include hydrophobicity and (I think) spatial information,
tRNA scan uses complementary bp info etc. etc.). These could be RCA if
approved by a curator, otherwise they would be IEA.

ISS would then be 'curator approved' based on alignment only (whether
pairwise RBH, multiple alignment, HMM or threading), RCA would be
everything else (i.e. any functional prediction which was not purely
*alignment* based).

I don't think this conflicts with the new proposal but it would make
the distinction between RCA and ISS and IEA clearer.

Val

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From val at sanger.ac.uk Tue Sep 11 01:37:26 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Tue, 11 Sep 2007 09:37:26 +0100
Subject: [go] Putting method/program names into the with field for ISS
In-Reply-To:
References:
Message-ID: <46E653C6.5090207@sanger.ac.uk>

I responded to the RCA issue before I saw this. I still think the
previous distinction makes sense.

I also didn't realise "with" was mandatory for IEA, and propose that it
would be simpler if it was only mandatory *if* there were no supporting
PubMed ID. We don't want to make life any more complicated than it
already is.

Moving some of these 'combination algorithms' to the scope of RCA if
curator approved (which they fit perfectly) would simplify things,
without any loss of information.

Val

From val at sanger.ac.uk Tue Sep 11 01:50:39 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Tue, 11 Sep 2007 09:50:39 +0100
Subject: [go] Boundary between IMP and IGI
In-Reply-To: <193E607C-FCF1-43CE-8194-8A287164AC90@genome.stanford.edu>
References: <193E607C-FCF1-43CE-8194-8A287164AC90@genome.stanford.edu>
Message-ID: <46E656DF.10704@sanger.ac.uk>

Benjamin Hitz wrote:
>
> Do we really need IGI and IMP? Is the only difference technically
> that IGI = double (or 2+?) mutant, IPI = single mutant?

We do. These are *very* different types of biological data. With IMP
the annotation is derived in some way directly from the *observable
phenotype*, but with IGI it is an inference from the actual
*interaction* (the phenotype may be suppressed or the cell may be dead).

Our current use of IGI includes things which aren't 'truly' genetic
interactions but on the whole, it is likely that most databases have
recorded non-canonical use and these can be filtered (e.g. functional
complementation by a heterologous system can be filtered because the
'with' column will contain an entry from another taxon).
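
For readers following the thread, the "clear line between 1 mutant gene and multiple mutant genes" that Karen proposes can be pictured as a very small rule. The sketch below is only an illustration of that proposal, not agreed GO policy (the message above argues that IMP and IGI differ in more than the number of mutations). The function names `choose_evidence_code` and `imp_with_value` and the gene and allele labels are hypothetical.

```python
from typing import Iterable, Optional


def choose_evidence_code(mutated_genes: Iterable[str]) -> str:
    """Pick IMP or IGI from the number of distinct mutated genes.

    Assumption: this encodes only the proposed "one mutant gene vs.
    several mutant genes" line from the thread and nothing else.
    """
    genes = {g for g in mutated_genes if g}  # ignore empty names, drop repeats
    if not genes:
        raise ValueError("need at least one mutated gene")
    return "IMP" if len(genes) == 1 else "IGI"


def imp_with_value(gene: str, allele: Optional[str] = None) -> str:
    """Value for the with field of an IMP annotation.

    Uses the mutant allele when the group tracks alleles, otherwise
    falls back to the gene name, as suggested in the proposal.
    """
    return allele if allele else gene
```

Under this sketch a strain carrying a single mutated gene maps to IMP, with the allele (or just the gene name) in the with field, and a strain with two or more mutated genes maps to IGI.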
Val > > Ben > > On Sep 10, 2007, at 4:18 PM, Karen Christie wrote: > >> Boundary between IMP and IGI >> ------------------------------------------------------- >> >> In response to the new draft of the evidence code documentation, some >> discussion came up between Midori and Val about the usage of the IGI >> versus the IMP evidence codes. As this issue was not a specific gripe >> of anyone on the Evidence Code Committee, it was not discussed. >> >> However, one of the goals of this revision was to have guidelines that >> make sense and I completely see the point that it doesn't really make >> sense to say that making an inference from a strain with one mutation >> is a genetic interaction, even when you are annotating a gene other >> than the one that is mutant. >> >> We were also asked to make a decision tree/flow chart for evidence >> code decisions (I have a draft I'll send out later), and I think it >> would be a much simpler decision if there was a clear line between 1 >> mutant gene and multiple mutant genes. >> >> I think it would make a lot more sense if any annotations made on the >> basis of mutation, or comparison between alleles, of a single gene >> should use IMP. Since we already allow use of the with field for IMP >> to record the mutant allele, it might make more sense to use IMP for >> any annotation based on a phenotype of a single gene and just record >> the mutant allele in the with field. Since not all groups track >> alleles, perhaps we should also allow with for IMP to contain the name >> of the gene without specifically designating an allele. >> >> Below is transcript of the discussion that occurred on this issue. >> >> -Karen >> >> >> **IMP: >> >>> mutation in gene B provides information about gene A being >>> annotated. For this type of experiment, use the IGI code. 
and IGI: >>> Inference about one gene drawn from the phenotype of a mutation in a >>> different gene >> >> >> Midori (15 Jun 2007): >> I have always disagreed with this usage: I've argued that IMP >> would be >> more appropriate, because in the examples given, only one gene is >> mutated, so the "combination of alterations" criterion for IGI is not >> met. But it's an argument that I lost years ago. Oh well. >> >> Val (22 Jun 2007): >> This is still a bit is unclear to me >> >> "We also use this code for situations where a mutation in gene A >> provides information about the function, process, or component of >> gene >> B. If a mutation in gene A causes a mislocalization of gene B, gene A >> is annotated to protein localization with gene B in the with/from >> column using IGI." >> >> In the protein localization example above a mutation in gene A is >> providing information about gene A (protein localization) not about >> gene B (the protein localized). >> >> I have made a number of these type of annotations to 'protein >> localization, (the fission yeast community are very keen on >> localization dependency experiments for functionally connected gene >> products). However, I thought I had used the wrong evidence code >> (using the existing documentation) and that they should be IMP (I >> wanted to capture the protein localized and at the time I had no >> other >> way to do it). These were on my todo list to fix. It now seems they >> are OK as IGI, so I just wanted to double check......... >> >> The original documentation says: >> # Inference about one gene drawn from the phenotype of a mutation >> in a >> different gene I don't have an example of this though. I forgot what >> it is used for, although I used to know...... 
>> >> Midori (22 Jun 2007, in response to Val): >> > I have made a number of these type of annotations to 'protein >> > localization, (the fission yeast community are very keen on >> > localization dependency experiments for functionally connected gene >> > products). However, I thought I had used the wrong evidence code >> > (using the existing documentation) and that they should be IMP (I >> > wanted to capture the protein localized and at the time I had no >> > other way to do it). These were on my todo list to fix. It now >> > seems they are OK as IGI, so I just wanted to double check......... >> >> Your annotations are consistent with the existing documentation. What >> I'm saying is that I think the documentation should recommend IMP for >> these. >> >> I think I still wouldn't put B is 'with' with IMP, because a few >> groups would put the allele of A used in the experiment, and others >> would leave 'with' blank. >> >> > The original documentation says: # Inference about one gene drawn >> > from the phenotype of a mutation in a different gene I don't >> have an >> > example of this though. I forgot what it is used for, although I >> > used to know...... >> >> I would also prefer to recommend IMP for these. > > > -- > Ben Hitz > Senior Scientific Programmer ** Saccharomyces Genome Database ** GO > Consortium > Stanford University ** hitz at genome.stanford.edu > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
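A minimal sketch of the rule Karen Christie proposes in the message quoted above: an inference drawn from a strain with a single mutated gene uses IMP, even when the annotated gene is not the mutated one, and a strain with two or more mutated genes uses IGI. The `Experiment` record and the `proposed_evidence_code` function are hypothetical names introduced for illustration; they are not part of any GO tool, and the rule itself was still under discussion on the list.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical record of a mutant-phenotype experiment (not a GO data structure)."""
    annotated_gene: str                                      # gene receiving the annotation
    mutated_genes: list[str] = field(default_factory=list)   # genes mutated in the strain


def proposed_evidence_code(exp: Experiment) -> str:
    """Apply the proposed boundary: one mutated gene -> IMP, two or more -> IGI.

    The annotated gene deliberately plays no part in the decision, which is
    the point of the proposal.
    """
    n_mutated = len(set(exp.mutated_genes))
    if n_mutated == 0:
        raise ValueError("the rule only covers experiments with at least one mutated gene")
    return "IMP" if n_mutated == 1 else "IGI"


# Localization dependency: only geneA is mutated, geneB is mislocalized.
single = Experiment(annotated_gene="geneA", mutated_genes=["geneA"])
# Inference about geneB from a strain mutant only in geneA.
other = Experiment(annotated_gene="geneB", mutated_genes=["geneA"])
# Double mutant used to infer something about geneA.
double = Experiment(annotated_gene="geneA", mutated_genes=["geneA", "geneB"])

for exp in (single, other, double):
    print(exp.annotated_gene, exp.mutated_genes, proposed_evidence_code(exp))
```

Under this sketch, a localization dependency experiment where only gene A is mutated maps to IMP regardless of which gene is annotated. What belongs in the with column in that case is left open, as it is in the thread.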
From val at sanger.ac.uk Tue Sep 11 02:23:07 2007 From: val at sanger.ac.uk (Valerie Wood) Date: Tue, 11 Sep 2007 10:23:07 +0100 Subject: [go] Boundary between IMP and IGI In-Reply-To: References: Message-ID: <46E65E7B.3060800@sanger.ac.uk> The pombe community are very fond of 'localization dependency' experiments to dissect the pathways involved in the function and formation of large proteinaceous complexes such as the spindle pole body, or the polarisome, or signaling networks like the SIN. For example: PMID:12034771 PMID:11676915 PMID: 16775007 PMID: 10864871 I've curated these as IGI (in accordance with current documentation), but I'd be happy to move them to IMP as only a single gene is mutated. It has never bothered me too much that they are IGI because one could predict that if there is a 'localization dependency', and because the experiments are targeted at gene products known to exist in the same complex or pathway, and have the same phenotype when mutated, then the pair of genes would also display a genetic interaction (although these experiments don't show one). If I did move these to IMP I would like to be able to continue to capture the 'gene product' which didn't localize properly in the 'with' column. But would be confusing because this is usually the allele for IMP. Alternatively I could just filter out this from the GO submission file (although it is biologically useful information). Val Karen Christie wrote: > Boundary between IMP and IGI > ------------------------------------------------------- > > In response to the new draft of the evidence code documentation, some > discussion came up between Midori and Val about the usage of the IGI > versus the IMP evidence codes. As this issue was not a specific gripe > of anyone on the Evidence Code Committee, it was not discussed. 
> > However, one of the goals of this revision was to have guidelines that > make sense and I completely see the point that it doesn't really make > sense to say that making an inference from a strain with one mutation > is a genetic interaction, even when you are annotating a gene other > than the one that is mutant. > > We were also asked to make a decision tree/flow chart for evidence > code decisions (I have a draft I'll send out later), and I think it > would be a much simpler decision if there was a clear line between 1 > mutant gene and multiple mutant genes. > > I think it would make a lot more sense if any annotations made on the > basis of mutation, or comparison between alleles, of a single gene > should use IMP. Since we already allow use of the with field for IMP > to record the mutant allele, it might make more sense to use IMP for > any annotation based on a phenotype of a single gene and just record > the mutant allele in the with field. Since not all groups track > alleles, perhaps we should also allow with for IMP to contain the name > of the gene without specifically designating an allele. > > Below is transcript of the discussion that occurred on this issue. > > -Karen > > > **IMP: > >> mutation in gene B provides information about gene A being >> annotated. For this type of experiment, use the IGI code. and IGI: >> Inference about one gene drawn from the phenotype of a mutation in a >> different gene > > > Midori (15 Jun 2007): > I have always disagreed with this usage: I've argued that IMP would be > more appropriate, because in the examples given, only one gene is > mutated, so the "combination of alterations" criterion for IGI is not > met. But it's an argument that I lost years ago. Oh well. > > Val (22 Jun 2007): > This is still a bit is unclear to me > > "We also use this code for situations where a mutation in gene A > provides information about the function, process, or component of gene > B. 
If a mutation in gene A causes a mislocalization of gene B, gene A > is annotated to protein localization with gene B in the with/from > column using IGI." > > In the protein localization example above a mutation in gene A is > providing information about gene A (protein localization) not about > gene B (the protein localized). > > I have made a number of these type of annotations to 'protein > localization, (the fission yeast community are very keen on > localization dependency experiments for functionally connected gene > products). However, I thought I had used the wrong evidence code > (using the existing documentation) and that they should be IMP (I > wanted to capture the protein localized and at the time I had no other > way to do it). These were on my todo list to fix. It now seems they > are OK as IGI, so I just wanted to double check......... > > The original documentation says: > # Inference about one gene drawn from the phenotype of a mutation in a > different gene I don't have an example of this though. I forgot what > it is used for, although I used to know...... > > Midori (22 Jun 2007, in response to Val): > > I have made a number of these type of annotations to 'protein > > localization, (the fission yeast community are very keen on > > localization dependency experiments for functionally connected gene > > products). However, I thought I had used the wrong evidence code > > (using the existing documentation) and that they should be IMP (I > > wanted to capture the protein localized and at the time I had no > > other way to do it). These were on my todo list to fix. It now > > seems they are OK as IGI, so I just wanted to double check......... > > Your annotations are consistent with the existing documentation. What > I'm saying is that I think the documentation should recommend IMP for > these. 
> > I think I still wouldn't put B is 'with' with IMP, because a few > groups would put the allele of A used in the experiment, and others > would leave 'with' blank. > > > The original documentation says: # Inference about one gene drawn > > from the phenotype of a mutation in a different gene I don't have an > > example of this though. I forgot what it is used for, although I > > used to know...... > > I would also prefer to recommend IMP for these. > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From hjd at informatics.jax.org Tue Sep 11 06:25:30 2007 From: hjd at informatics.jax.org (Harold Drabkin) Date: Tue, 11 Sep 2007 09:25:30 -0400 Subject: [go] finishing up Evidence Code Issues In-Reply-To: References: Message-ID: <46E6974A.4050704@informatics.jax.org> We are still not happy at all with the ISS, so I'd like it added to the potential discussion if it hasn't been done already. hjd; > Hi, > > Since I'm due on September 20th and will be going on maternity leave > shortly before the GO meeting, Mike asked me to send these remaining > items to finish up the Evidence Code documentation directly to the > list to at least get the discussion started. Some issues may need to > be discussed at the GO meeting as well. > > This email will contain some responses to Midori's last email and a > few other email comments to resolve some minor comments on the current > draft of the new Evidence Code documentation. I will send separate > emails to deal with each of these specific issues: > > 1. Restriction that all unknowns MUST use ND > > 2. IMP vs IGI for single gene mutations, regardless of gene being > annotated > > 3. How to put program or method names in the with column for ISS > > 4. 
Scope of the RCA evidence code > > For both issues 2 and 4, I think that the recommendations I've made > will help make it possible to create a decision tree/flowchart that is > fairly simple and clear. I'll send a very rough draft of a flowchart > separately as well. > > Note that for both #s 3 and 4, I have put some supplemental info into > html docs in my personal space. I did not spend much time doing html > formatting for these docs, on the thought that people might prefer to > move them to the GOC wiki. However, as the Evidence Code Committee was > not designated as a Working Group, I have no idea where to put them > within the wiki structure. If a spot is designated for them, they can > be moved to the wiki. > > -Karen > > > > Responses and comments on things in red on this page: > > http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml > > > 1. GO_REF documentation > >> We should have documentation that explains GO_REF's and links to it >> when we refer to them. > > Midori (15 Jun 2007): > Links can go to the existing GO References page: > > http://www.geneontology.org/cgi-bin/references.cgi > > I can write up a description (which will be brief; there's not an > enormous amount to say) and give it to Amelia to be added to the blurb > at the top of this page. The plain text file from which the web page > is generated contains a brief description of the format, which could > be HTMLified and also added to the blurb if it would be useful. > > Karen (9 Sept 2007): > > Please do. It would also be good if the page for the GO_REFs is made > easier to find in general in our documentation. > > > 2. ChEBI IDs in with field? > >> Do we allow things like ChEBI IDs in the with field? > > Midori (15 Jun 2007): > I would say yes. 
> > Karen (9 Sept 2007): > > Perhaps we should make this a quick agenda item for the next GO > meeting, so that people can ratify this face to face, unless we get an > overwhelming response via email to proceed with allowing this new ID > for the with field. > > 3. IMP examples > >> any more positive examples for IMP?, e.g. phenotypic similarity > > Midori (15 Jun 2007): > Dredged up from email from January 2002 ... > > Erich Schwarz needed to know which code to use for "other mutations > sharing a complex mutant phenotype syndrome with [a well-characterized > mutant]." My comment at the time was: "The situation you've described > is IMP, not IGI, because (if I understand correctly) you're looking at > one mutation at a time. Comparing the phenotype of one mutation to > that of another helps you interpret the meaning, but is not a kind of > genetic interaction." > > I think this still holds. Erich provided some details of an example, > which I can forward if you want. > > Karen (9 Sept 2007): > > We can certainly include it, the more examples the better in my > opinion, but don't send it to me. I'll be going on maternity leave > soon and don't want to be responsible for this getting added. > > > 4. use of with field for NAS > >> The Evidence Code Committee discussed the idea of making GO >> annotations from Reactome entries. ... What does the full group feel >> about the idea of allowing the ID for a database record, when such >> exist, in the with field? > > Midori (15 Jun 2007): > I'm all for including annotations based on Reactome entries -- they > have a well-developed curation system that deeply involves expert > biologists, so the statements in their records are very reliable. > > I am not in favor of putting the Reactome ID in the with field for > these annotations, however, because the Reactome entry does not modify > or supplement the evidence; rather, the entry provides the > evidence. 
GO would effectively be using a Reactome record as a source
> of information about a gene product, so it would make much more sense
> to put the Reactome ID in the reference field.
>
> For the more general database record case, it may be that I don't
> sufficiently understand what might go in a GO_REF (or equivalent), so
> I don't understand the rationale for allowing 'with' for NAS.
>
> For the case where the author infers one thing from another, using a
> GO ID in 'with' makes more sense, but I think it's not really
> necessary because the author (presumably) hasn't actually made any GO
> annotations, and hasn't stated observations or conclusions in terms
> of, well, GO terms. (Perhaps this will change some day!) Also, note
> that we have expressly disallowed the use of 'with' for NAS, so the
> script would have to be changed if the use of with-for-NAS is agreed.
>
> Karen (9 Sept 2007):
>
> Regarding the idea of allowing Reactome IDs in the with field, the
> thought was that it provided the specific information about which
> record in Reactome made the statement, but the idea was
> controversial even within the Evidence Code Committee.
>
> Regarding the idea of allowing GOids for NAS, I think you bring up a
> good point that this may not make sense since the author has typically
> not made their statement in terms of a GOid from which an inference
> was made. Allowing this may just be more confusing than helpful,
> especially since deciding which GOid to put in the with field will
> almost always be a curator judgement.
>
> However, I wasn't one of the proponents of this idea, so those who
> are may wish to defend it.
>
> In any case, rather than adding yet another usage of the with column
> that is potentially confusing to users, I could personally just go
> with not allowing use of the with column at all for NAS.
>
>
> 5. Representation of examples for with/from:
>
> Susan (14 Jun 2007):
>
> IPI examples
>
> Looks good but there is something odd about the IPI example,
> assuming I am looking at the latest version ok.
>
> Firstly, the paper is about mouse proteins not Drosophila so could we
> change FB to MGI please. Also, I am confused as to why there are three
> lines shown - MGI just lists the middle one:
>
> FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI
> UniProt:protein_2_ID ...
> FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI
> UniProt:protein_2_ID|UniProt:protein_3_ID ...
> FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI
> FB:gene_2_ID
>
> So unless I'm missing something I suggest we lose the extra lines and
> have either:
>
> MGI:1349216 Abcd3 GO:0005515 PMID:10551832 IPI
> UniProt:P33897|UniProt:Q61285
>
> OR
>
> MGI:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI
> UniProt:protein_2_ID|UniProt:protein_3_ID
>
> I'd prefer to include the real identifiers so it isn't a mix of 'real'
> and 'example'.
>
> Similarly there seems to be a mix of FB and SGD db identifiers in the
> IGI examples. A possible alternative for IGI is:
>
> In PMID:9043060, flies simultaneously mutant for three genes: klingon
> (klg), sevenless (sev) and Son of sevenless (Sos) are used to show that
> klingon participates in R7 photoreceptor fate commitment. This leads to
> the annotation:
>
> FB:FBgn0017590 klg GO:0045466 PMID:9043060 IGI
> FB:FBgn0003366|FB:FBgn0001965
>
>
> Karen (9 Sept 2007):
>
> I'm all for real examples, but I don't have time to dig them up for
> every evidence code. Perhaps we could distribute this task around, so
> that we have multiple real examples for each evidence code. It would
> be good to have at least one example with one entry in the with
> column, as well as the one with multiple. It would also be good if
> they showed various IDs in the with field.
> > This would be a reasonable task if there was one person for each > evidence code to find some real examples, and then hopefully it would > be easy for Amelia to put them in the right format if she was given > all the specific info that should be in the table. > > 6. ISS & with col: >> Note that there should be good evidence that the gene product(s) >> placed in the with/from column actually has the activity, process, >> etc. being annotated. > > Midori (15 Jun 2007): > Do we want to specifically say the "good evidence" should be > *experimental* evidence? Would be consistent with the Ref Genome > requirement, and good practice generally ... > > Karen (9 Sept 2007): > We do have to remember that this Evidence Code document is not just > for the use of the Reference Genomes. While did agree that ISS should > not be made from pairwise BLAST unless the gene to be placed in the > with column has been experimentally characterized, the ISS code covers > more situations than just that. The with field may also contain Pfams, > Prosite, TIGRFAMS, CBS, COG, PANTHER, and we also have to determine > how to include method names here for stuff like tRNAscan and my > specific question about snoRNAs. Michelle Gwinn may wish to comment > on this too. > > > Typos, other trivial fixes: > ------------------------------------------------------- > > 1. IGI > >> Should we add a statement in the paragraph above to IGI, similar to >> the one in IMP, about care in making annotations from gain of >> function mutations ...? > > Midori (15 Jun 2007): > Sounds reasonable to me. > > Karen (9 Sept 2007): > > OK, added to first paragraph of IGI. > > 2. Last paragraph of Introduction: > > Midori (15 Jun 2007): > Change "effect" to "affect" in "... will also effect the quality of > the resulting annotation." > > Karen (9 Sept 2007): > done > > 3. IDA & IMP: > > Midori (15 Jun 2007): > Does "over-expression" really need to be hyphenated? 
I've seen it
> unhyphenated more frequently; also, there's one unhyphenated
> occurrence in the document.
>
> Karen (9 Sept 2007):
> changed to unhyphenated
>
> 4. IGI examples:
>
> Midori (15 Jun 2007):
> The statements "For this type of experiment, use the IGI Code" could
> be deleted -- they're redundant with the fact that the description
> appears in a list headed "where the IGI code should be used."
>
> Karen (9 Sept 2007):
>
> done
>
>

From MLGwinn at jcvi.org Tue Sep 11 07:55:12 2007
From: MLGwinn at jcvi.org (Gwinn-Giglio, Michelle)
Date: Tue, 11 Sep 2007 10:55:12 -0400
Subject: [go] Putting method/program names into the with field for ISS
References: <41A9BFCC-C2CD-439C-A494-EC94B0344BF5@genome.stanford.edu> <46E650B9.40901@sanger.ac.uk>
Message-ID:

Hi,

I disagree. I think taking this approach would significantly muddy the waters in terms of distinguishing between ISS and RCA. Anything that is based only on sequence analysis, be it simple BLAST or vastly more complicated modeling methods, should be ISS because at their heart they are all comparing sequences of known function to ones with unknown function. Whether they do simple alignments to make that comparison or more complicated models, it is still a sequence-based analysis.

Karen has proposed and I agree that RCA (or ICA as we would like to rename it) should be reserved for cases where multiple types of evidence are combined to reach a conclusion. Sequence-based analysis is one type, two-hybrid screens are another type, mass spec is another type, etc. When these different types of evidence are integrated together and a conclusion is drawn from that integration, this should be RCA (or better yet ICA).

I think Karen's ideas of how to store the method in the "with" field are good. It might be possible to simplify it a bit if necessary or if people think it will be confusing, but what she has proposed will thoroughly store the information.
Michelle

-----Original Message-----
From: owner-go at genome.stanford.edu on behalf of Valerie Wood
Sent: Tue 9/11/2007 4:24 AM
To: Karen Christie
Cc: Benjamin Hitz; GO mailing list
Subject: Re: [go] Putting method/program names into the with field for ISS

I agree with Ben that this would be a useful general distinction. It would prevent people from considering RCA as being similar to 'IEA' and reinforce the fact that curator approval is a requirement.

It would also resolve the issue 'what to put in the with column' which keeps recurring for the TMM/GPI and annotations based on signal peptides, tRNA scan and other predictors (where the algorithms model additional constraints on the feature, i.e. TMMs/GPI includes hydrophobicity and (I think) spatial information, tRNA scan uses complementary bp info etc. etc.). These could be RCA if approved by a curator, otherwise they would be IEA.

ISS would then be 'curator approved' based on alignment only (whether pairwise RBH, multiple alignment, HMM or threading), RCA would be everything else (i.e. any functional prediction which was not purely *alignment* based).

I don't think this is conflicting with the new proposal but it would make the distinction between RCA and ISS and IEA clearer.

Val

Karen Christie wrote:
> Read the proposed scope of RCA. This code was requested to cover an
> entirely different type of analysis than sequence similarity comparisons.
>
> In addition, if you read the last example for ISS, provided by
> Michelle Gwinn on the basis of what TIGR does in their sequence
> analysis methods, in the proposed new documentation (url below), it
> states that ISS analyses may include more than one t