From cherry at stanford.edu Wed Aug 1 09:40:07 2007 From: cherry at stanford.edu (Mike Cherry) Date: Wed, 1 Aug 2007 09:40:07 -0700 Subject: [go] For those going to Princeton meetings References: <5C838399-9C01-4939-B600-CC5C73E57073@genomics.princeton.edu> Message-ID: <0F969F79-903C-4FE0-8EEA-CB016408E09E@stanford.edu> Here are directions from Kara on getting to Princeton, New Jersey. Fly to Newark, New Jersey (airport code EWR). From the airport you can take the train to Princeton, then a taxi. You can also take a shuttle from the airport; they have a special form for Princeton reservations (http://www.hudsonltd6.com/cgi-bin/air1/res). You can also take a car (http://www.a1limo.com/). More details below from Kara. -Mike Begin forwarded message: > From: Kara Dolinski > Date: August 1, 2007 9:23:17 AM PDT > To: Midori Harris > Cc: Mike Cherry , Judith Blake > > Subject: Re: meeting logistics > > Hi Midori, > > I'm cc:ing Mike C. and Judy, just in case they get asked about > transportation as well--feel free to pass this on to others. > > From Princeton, I definitely recommend Newark over any other > airport; it's the closest and the easiest to get to/from. You can > take the train or take a shuttle straight from the Nassau Inn. > > Train info (sorry, it's overly detailed, these instructions were > for visiting relatives ;) ): > > You can take a NJ transit train (http://www.njtransit.com/hp/hp_servlet.srv?hdnPageAction=HomePageTo), > the Northeast Corridor > Line, from the Newark airport (EWR) to Princeton; this is fast, > easy, and convenient. You will get off the train at the Princeton > Junction stop. At this point, you have two options. You can hop > directly in a cab, and it will be a 10-15 minute taxi ride (~$20) > to the Nassau Inn from there. There are usually taxi cabs awaiting > passengers at Princeton Junction. Alternatively, rather than > hopping in a cab, you can instead transfer to the train to > Princeton, which is called "The Dinky". 
This train runs straight to > Princeton University campus, and the walk from the Dinky stop to > the Nassau Inn is about three blocks up a small hill. You can also > call a cab to meet you at this station. Note that you will have to > decide which option you choose when you buy your train ticket > before boarding the train at EWR because the ticket price is > determined by the last stop (Princeton Junction vs. Princeton (aka > the Dinky)). > > Shuttle info: > > http://www.goairporter.com/ > From midori at ebi.ac.uk Thu Aug 2 03:33:43 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Thu, 2 Aug 2007 11:33:43 +0100 (BST) Subject: [go] Ontology development - July highlights Message-ID: Dear GO, The most recent monthly report on ontology content, for July 2007, is now available at: http://gocwiki.geneontology.org/index.php/July2007_ontology_report Some highlights from July: * The content meeting on muscle development took place on July 25-26 , in collaboration with Giorgio Valle, Erika Feltrin, and several muscle experts at the University of Padua (http://wiki.geneontology.org/index.php/Muscle_Development). Follow-up work has begun. * Work on process terms for rRNA processing is complete and has gone live (http://gocwiki.geneontology.org/index.php/RNA_processing). * As noted last month, reorganization of the transporter terms in the molecular function ontology has been completed and gone live (http://wiki.geneontology.org/index.php?title=Transporters). * Work continues to follow up on the meeting on cardiovascular physiology (http://gocwiki.geneontology.org/index.php/Cardiovascular_physiology/development). * Work on GO-Cell cross-products has resumed (http://wiki.geneontology.org/index.php/XP:Meetings#2007.2F07.2F26_:_GO-CL_XPs.2C_next_steps). In August, we'll carry on with work arising from the muscle and cardiovascular meetings, and work on cross-products with the Cell Ontology and ChEBI. 
As usual, details of small- and medium-scale changes are available in the SourceForge Curator Requests tracker. Please contact us if you want to help out with ontology work in a particular area, or if you have any comments or questions about what's going on. Midori & David on behalf of GO's ontology developers From jblake at informatics.jax.org Thu Aug 2 05:57:32 2007 From: jblake at informatics.jax.org (Judith Blake) Date: Thu, 02 Aug 2007 08:57:32 -0400 Subject: [go] Ontology development - July highlights In-Reply-To: References: Message-ID: <46B1D4BC.90206@informatics.jax.org> Thanks Midori and David This is a very useful summary and I really appreciate the embedded links Judy Midori Harris wrote: > Dear GO, > > The most recent monthly report on ontology content, for July 2007, is > now available at: > > http://gocwiki.geneontology.org/index.php/July2007_ontology_report > > Some highlights from July: > > * The content meeting on muscle development took place on July 25-26 > , in collaboration with Giorgio Valle, Erika Feltrin, and several > muscle experts at the University of Padua > (http://wiki.geneontology.org/index.php/Muscle_Development). Follow-up > work has begun. > > * Work on process terms for rRNA processing is complete and has gone > live (http://gocwiki.geneontology.org/index.php/RNA_processing). > > * As noted last month, reorganization of the transporter terms in the > molecular function ontology has been completed and gone live > (http://wiki.geneontology.org/index.php?title=Transporters). > > * Work continues to follow up on the meeting on cardiovascular > physiology > (http://gocwiki.geneontology.org/index.php/Cardiovascular_physiology/development). > > > * Work on GO-Cell cross-products has resumed > (http://wiki.geneontology.org/index.php/XP:Meetings#2007.2F07.2F26_:_GO-CL_XPs.2C_next_steps). 
> > > In August, we'll carry on with work arising from the muscle and > cardiovascular meetings, and work on cross-products with the Cell > Ontology and ChEBI. > > As usual, details of small- and medium-scale changes are available in > the SourceForge Curator Requests tracker. Please contact us if you > want to help out with ontology work in a particular area, or if you > have any comments or questions about what's going on. > > Midori & David > on behalf of GO's ontology developers From eurie at genome.Stanford.EDU Thu Aug 2 10:02:06 2007 From: eurie at genome.Stanford.EDU (Eurie Hong) Date: Thu, 2 Aug 2007 10:02:06 -0700 Subject: [go] Standards for 3rd party software tools Message-ID: <23EACF70-0E17-420E-9AD7-4992EED80AC0@genome.stanford.edu> Several months ago, we started a discussion about the requirements for tools that use GO. Before we take any further steps, we want to refresh your memory and get any additional comments or suggestions on this topic: The current draft of the tools requirement can be viewed on the wiki: http://gocwiki.geneontology.org/index.php/Tools_standards The previous email thread regarding this topic: http://genetics.stanford.edu/go-email/email-go/go-arc/go-2007/0241.html If there are no further comments within the next few weeks, we will proceed with contacting the developers of current tools listed on the GO web page and re-organizing these pages. Thanks, Jane, Chris, Eurie From cherry at stanford.edu Thu Aug 2 15:12:16 2007 From: cherry at stanford.edu (Mike Cherry) Date: Thu, 2 Aug 2007 15:12:16 -0700 Subject: [go] cvs command change for GOC server Message-ID: <9D59C122-F04D-491B-9A7C-D1236931B9B9@stanford.edu> If you use cvs to update and commit files to the GOC CVS repository please read on. This change does not affect the GO Public CVS repository. The cvs server has been updated to allow our larger files to be retrieved. Unfortunately this change also breaks one of the cvs client options. 
Until further notice, please do not use the "z" option. It might look like "-qz3"; change that to "-q". The z option tells the client and server to exchange gzip-compressed data. For a currently unknown reason, if the z option is used the client command always hangs -- never completing or timing out. Removing the z option will result in slower download times. This version does work for the larger files we now have in the repository, so overall this new version is a good thing. Check your scripts and .cvsrc files to see if the z option was included. Sorry for the hassle and for any problems this change may have caused. -Mike From tberardi at acoma.Stanford.EDU Mon Aug 6 11:48:19 2007 From: tberardi at acoma.Stanford.EDU (Tanya Berardini) Date: Mon, 06 Aug 2007 11:48:19 -0700 Subject: [go] 'regulation of gene expression' Message-ID: <46B76CF3.3010501@acoma.stanford.edu> A (long) while back, we'd talked about having a term for 'regulation of gene expression'. I've been searching the email archives without much luck hoping to find out what happened with respect to that item. The last I found was an action item from the Sept. 2002 (!) meeting. Collective memory, please help out! Thanks, Tanya ------------------------------------------------------------------------------------------ Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu The Arabidopsis Information Resource FAX: (650) 325-6857 Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 Department of Plant Biology URL: http://arabidopsis.org/ 260 Panama St. 
Stanford, CA 94305 ------------------------------------------------------------------------------------------ From kchris at genome.Stanford.EDU Mon Aug 6 12:13:52 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Mon, 6 Aug 2007 12:13:52 -0700 (PDT) Subject: [go] 'regulation of gene expression' In-Reply-To: <46B76CF3.3010501@acoma.stanford.edu> References: <46B76CF3.3010501@acoma.stanford.edu> Message-ID: Hi Tanya, There seems to be a stalled SF item on this topic: [ 1418820 ] gene expression https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 -Karen On Mon, 6 Aug 2007, Tanya Berardini wrote: > A (long) while back, we'd talked about having a term for 'regulation of gene > expression'. I've been searching the email archives without much luck hoping > to find out what happened with respect to that item. The last I found was an > action item from the Sept. 2002 (!) meeting. > > Collective memory, please help out! > > Thanks, > > Tanya > > > ------------------------------------------------------------------------------------------ > Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu > The Arabidopsis Information Resource FAX: (650) 325-6857 > Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 > Department of Plant Biology URL: http://arabidopsis.org/ > 260 Panama St. > Stanford, CA 94305 > ------------------------------------------------------------------------------------------ > From tberardi at acoma.Stanford.EDU Mon Aug 6 13:45:55 2007 From: tberardi at acoma.Stanford.EDU (Tanya Berardini) Date: Mon, 06 Aug 2007 13:45:55 -0700 Subject: [go] 'regulation of gene expression' In-Reply-To: References: <46B76CF3.3010501@acoma.stanford.edu> Message-ID: <46B78883.4020802@acoma.stanford.edu> Thanks, Karen and Pascale, for the pointers to this item. Maybe we can kick start the discussion on this topic? 
Tanya Karen Christie wrote: > Hi Tanya, > > There seems to be a stalled SF item on this topic: > > [ 1418820 ] gene expression > https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 > > > -Karen > > > On Mon, 6 Aug 2007, Tanya Berardini wrote: > >> A (long) while back, we'd talked about having a term for 'regulation >> of gene expression'. I've been searching the email archives without >> much luck hoping to find out what happened with respect to that item. >> The last I found was an action item from the Sept. 2002 (!) meeting. >> >> Collective memory, please help out! >> >> Thanks, >> >> Tanya >> >> >> ------------------------------------------------------------------------------------------ >> >> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >> The Arabidopsis Information Resource FAX: (650) 325-6857 >> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >> Department of Plant Biology URL: http://arabidopsis.org/ >> 260 Panama St. >> Stanford, CA 94305 >> ------------------------------------------------------------------------------------------ >> >> -- ------------------------------------------------------------------------------------------ Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu The Arabidopsis Information Resource FAX: (650) 325-6857 Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 Department of Plant Biology URL: http://arabidopsis.org/ 260 Panama St. Stanford, CA 94305 ------------------------------------------------------------------------------------------ From dph at informatics.jax.org Mon Aug 6 15:07:39 2007 From: dph at informatics.jax.org (David Hill) Date: Mon, 06 Aug 2007 18:07:39 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: References: <46B76CF3.3010501@acoma.stanford.edu> Message-ID: <46B79BAB.2010302@informatics.jax.org> I thought it was going to be implemented. 
David Karen Christie wrote: > Hi Tanya, > > There seems to be a stalled SF item on this topic: > > [ 1418820 ] gene expression > https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 > > > -Karen > > > On Mon, 6 Aug 2007, Tanya Berardini wrote: > >> A (long) while back, we'd talked about having a term for 'regulation >> of gene expression'. I've been searching the email archives without >> much luck hoping to find out what happened with respect to that >> item. The last I found was an action item from the Sept. 2002 (!) >> meeting. >> >> Collective memory, please help out! >> >> Thanks, >> >> Tanya >> >> >> ------------------------------------------------------------------------------------------ >> >> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >> The Arabidopsis Information Resource FAX: (650) 325-6857 >> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >> Department of Plant Biology URL: http://arabidopsis.org/ >> 260 Panama St. >> Stanford, CA 94305 >> ------------------------------------------------------------------------------------------ >> >> From midori at ebi.ac.uk Tue Aug 7 02:30:29 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Tue, 7 Aug 2007 10:30:29 +0100 (BST) Subject: [go] 'regulation of gene expression' In-Reply-To: <46B79BAB.2010302@informatics.jax.org> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> Message-ID: So did I! I think what happened is simply that the SF item is assigned to Jane, and she's been so busy with various2go mappings and advocacy stuff that she hasn't had a chance to work on her other SF things. m On Mon, 6 Aug 2007, David Hill wrote: > I thought it was going to be implemented. 
> > David > > Karen Christie wrote: >> Hi Tanya, >> >> There seems to be a stalled SF item on this topic: >> >> [ 1418820 ] gene expression >> https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 >> >> -Karen >> >> >> On Mon, 6 Aug 2007, Tanya Berardini wrote: >> >>> A (long) while back, we'd talked about having a term for 'regulation of >>> gene expression'. I've been searching the email archives without much >>> luck hoping to find out what happened with respect to that item. The last >>> I found was an action item from the Sept. 2002 (!) meeting. >>> >>> Collective memory, please help out! >>> >>> Thanks, >>> >>> Tanya >>> >>> >>> >>> ------------------------------------------------------------------------------------------ >>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >>> Department of Plant Biology URL: http://arabidopsis.org/ >>> 260 Panama St. >>> Stanford, CA 94305 >>> >>> ------------------------------------------------------------------------------------------ >>> > From jane at ebi.ac.uk Tue Aug 7 08:52:28 2007 From: jane at ebi.ac.uk (Jane Lomax) Date: Tue, 07 Aug 2007 16:52:28 +0100 Subject: [go] 'regulation of gene expression' In-Reply-To: References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> Message-ID: <46B8953C.4070600@ebi.ac.uk> Actually, I was just waiting for Harold's blessing on this item... Jane Midori Harris wrote: > So did I! I think what happened is simply that the SF item is assigned > to Jane, and she's been so busy with various2go mappings and advocacy > stuff that she hasn't had a chance to work on her other SF things. > > m > > On Mon, 6 Aug 2007, David Hill wrote: > >> I thought it was going to be implemented. 
>> >> David >> >> Karen Christie wrote: >>> Hi Tanya, >>> >>> There seems to be a stalled SF item on this topic: >>> >>> [ 1418820 ] gene expression >>> https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 >>> >>> -Karen >>> >>> >>> On Mon, 6 Aug 2007, Tanya Berardini wrote: >>> >>>> A (long) while back, we'd talked about having a term for >>>> 'regulation of gene expression'. I've been searching the email >>>> archives without much luck hoping to find out what happened with >>>> respect to that item. The last I found was an action item from the >>>> Sept. 2002 (!) meeting. >>>> >>>> Collective memory, please help out! >>>> >>>> Thanks, >>>> >>>> Tanya >>>> >>>> >>>> >>>> ------------------------------------------------------------------------------------------ >>>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >>>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >>>> Department of Plant Biology URL: http://arabidopsis.org/ >>>> 260 Panama St. >>>> Stanford, CA 94305 >>>> >>>> ------------------------------------------------------------------------------------------ >>>> >> From hjd at informatics.jax.org Tue Aug 7 12:47:42 2007 From: hjd at informatics.jax.org (Harold Drabkin) Date: Tue, 07 Aug 2007 15:47:42 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: <46B8953C.4070600@ebi.ac.uk> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> Message-ID: <46B8CC5E.7000306@informatics.jax.org> Moi??? Well; my feeling hasn't changed but everyone else thinks it's a meaningful term. I like it a bit better than the " biosynthesis of" terms. 
I still feel it doesn't say much if you say gene product x is involved in the regulation of gene expression based on an observation that doing something to x causes the levels of other gene products to change, without actually knowing what is happening. It still seems to me like making a term to use for incomplete or ill-defined experiments. Just my take. hjd Jane Lomax wrote: > Actually, I was just waiting for Harold's blessing on this item... > > Jane > > Midori Harris wrote: >> So did I! I think what happened is simply that the SF item is >> assigned to Jane, and she's been so busy with various2go mappings and >> advocacy stuff that she hasn't had a chance to work on her other SF >> things. >> >> m >> >> On Mon, 6 Aug 2007, David Hill wrote: >> >>> I thought it was going to be implemented. >>> >>> David >>> >>> Karen Christie wrote: >>>> Hi Tanya, >>>> >>>> There seems to be a stalled SF item on this topic: >>>> >>>> [ 1418820 ] gene expression >>>> https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 >>>> >>>> -Karen >>>> >>>> >>>> On Mon, 6 Aug 2007, Tanya Berardini wrote: >>>> >>>>> A (long) while back, we'd talked about having a term for >>>>> 'regulation of gene expression'. I've been searching the email >>>>> archives without much luck hoping to find out what happened with >>>>> respect to that item. The last I found was an action item from >>>>> the Sept. 2002 (!) meeting. >>>>> >>>>> Collective memory, please help out! >>>>> >>>>> Thanks, >>>>> >>>>> Tanya >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------------------------------ >>>>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >>>>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>>>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >>>>> Department of Plant Biology URL: http://arabidopsis.org/ >>>>> 260 Panama St. 
>>>>> Stanford, CA 94305 >>>>> >>>>> ------------------------------------------------------------------------------------------ >>>>> >>> > From pgaudet at northwestern.edu Tue Aug 7 12:51:18 2007 From: pgaudet at northwestern.edu (Pascale Gaudet) Date: Tue, 07 Aug 2007 15:51:18 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: <46B8CC5E.7000306@informatics.jax.org> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> Message-ID: <46B8CD36.4070106@northwestern.edu> But isn't it already more information than annotating to the root term? Harold Drabkin wrote: > Moi??? > > Well; my feeling hasn't changed but everyone else thinks it's a > meaningful term. I like it a bit better than the " biosynthesis of" > terms. I still feel it doesn't say much if you say gene product x is > involved in the regulation of gene expression based on an observation > that doing something to x causes the levels of other gene products to > change, without knowing actually what is happening. Still seems like > to me like making a term to use for incomplete or ill-defined > experiments. Just my take. > > hjd > > > Jane Lomax wrote: >> Actually, I was just waiting for Harold's blessing on this item... >> >> Jane >> >> Midori Harris wrote: >>> So did I! I think what happened is simply that the SF item is >>> assigned to Jane, and she's been so busy with various2go mappings >>> and advocacy stuff that she hasn't had a chance to work on her other >>> SF things. >>> >>> m >>> >>> On Mon, 6 Aug 2007, David Hill wrote: >>> >>>> I thought it was going to be implemented. 
>>>> >>>> David >>>> >>>> Karen Christie wrote: >>>>> Hi Tanya, >>>>> >>>>> There seems to be a stalled SF item on this topic: >>>>> >>>>> [ 1418820 ] gene expression >>>>> https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 >>>>> >>>>> -Karen >>>>> >>>>> >>>>> On Mon, 6 Aug 2007, Tanya Berardini wrote: >>>>> >>>>>> A (long) while back, we'd talked about having a term for >>>>>> 'regulation of gene expression'. I've been searching the email >>>>>> archives without much luck hoping to find out what happened with >>>>>> respect to that item. The last I found was an action item from >>>>>> the Sept. 2002 (!) meeting. >>>>>> >>>>>> Collective memory, please help out! >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Tanya >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------------------------ >>>>>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >>>>>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>>>>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >>>>>> Department of Plant Biology URL: http://arabidopsis.org/ >>>>>> 260 Panama St. >>>>>> Stanford, CA 94305 >>>>>> >>>>>> ------------------------------------------------------------------------------------------ >>>>>> >>>> >> > > From hjd at informatics.jax.org Tue Aug 7 13:00:00 2007 From: hjd at informatics.jax.org (Harold Drabkin) Date: Tue, 07 Aug 2007 16:00:00 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: <46B8CD36.4070106@northwestern.edu> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> Message-ID: <46B8CF40.1050301@informatics.jax.org> IMHO, no; Some "experiments" do not give real information other than to help you design what experiments will. Again, I am signing off on this with grave reservations. 
hjd Pascale Gaudet wrote: > But isn't it already more information than annotating to the root term? > > Harold Drabkin wrote: >> Moi??? >> >> Well; my feeling hasn't changed but everyone else thinks it's a >> meaningful term. I like it a bit better than the " biosynthesis of" >> terms. I still feel it doesn't say much if you say gene product x is >> involved in the regulation of gene expression based on an >> observation that doing something to x causes the levels of other gene >> products to change, without knowing actually what is happening. Still >> seems like to me like making a term to use for incomplete or >> ill-defined experiments. Just my take. >> >> hjd >> >> >> Jane Lomax wrote: >>> Actually, I was just waiting for Harold's blessing on this item... >>> >>> Jane >>> >>> Midori Harris wrote: >>>> So did I! I think what happened is simply that the SF item is >>>> assigned to Jane, and she's been so busy with various2go mappings >>>> and advocacy stuff that she hasn't had a chance to work on her >>>> other SF things. >>>> >>>> m >>>> >>>> On Mon, 6 Aug 2007, David Hill wrote: >>>> >>>>> I thought it was going to be implemented. >>>>> >>>>> David >>>>> >>>>> Karen Christie wrote: >>>>>> Hi Tanya, >>>>>> >>>>>> There seems to be a stalled SF item on this topic: >>>>>> >>>>>> [ 1418820 ] gene expression >>>>>> https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 >>>>>> >>>>>> -Karen >>>>>> >>>>>> >>>>>> On Mon, 6 Aug 2007, Tanya Berardini wrote: >>>>>> >>>>>>> A (long) while back, we'd talked about having a term for >>>>>>> 'regulation of gene expression'. I've been searching the email >>>>>>> archives without much luck hoping to find out what happened with >>>>>>> respect to that item. The last I found was an action item from >>>>>>> the Sept. 2002 (!) meeting. >>>>>>> >>>>>>> Collective memory, please help out! 
>>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Tanya >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------------------ >>>>>>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >>>>>>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>>>>>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >>>>>>> Department of Plant Biology URL: http://arabidopsis.org/ >>>>>>> 260 Panama St. >>>>>>> Stanford, CA 94305 >>>>>>> >>>>>>> ------------------------------------------------------------------------------------------ >>>>>>> >>>>> >>> >> >> > From val at sanger.ac.uk Wed Aug 8 05:47:43 2007 From: val at sanger.ac.uk (Valerie Wood) Date: Wed, 08 Aug 2007 13:47:43 +0100 Subject: [go] Paper of potential interest to you In-Reply-To: References: Message-ID: <46B9BB6F.2050408@sanger.ac.uk> Mike Cherry wrote: > Manual curation is not sufficient for annotation of genomic databases > William A. Baumgartner, Jr, K. Bretonnel Cohen, Lynne M. Fox, George > Acquaah-Mensah, and Lawrence Hunter > Bioinformatics 2007 23: i41-i48. > > http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/13/i41?etoc > > > This was interesting. Before we all decide it's a losing battle, it's not quite so doom and gloom as this analysis suggests. By using mouse and fly they chose the 2 models with the single greatest volume of data. It would have been nice to see the combined progress of the GO curated organisms vs. non GO curated organisms (rather than mouse, fly and then the entire Uniprot knowledge base). Using this criterion (at least one GO annotation) they would have identified a 'best case scenario' (left graph of figure one) for both budding and fission yeasts. 
However, using these methods, they would never show a 'best case scenario' of GO annotation for ANY organism because they extracted the GO data from the Uniprot records (at least this is what they say in the methods), and Uniprot don't include ISS/IC/NAS/TAS or, most importantly for this analysis, ND (I think that is correct, isn't it, Emily?). And as they mention one reviewer pointed out, it is impossible here to differentiate between a rate limiting factor of the rate of annotation and the rate of discovery, or the relative contributions of either. As an evaluation of GO coverage it would have been more informative if they had used all the GO data. But it's difficult to provide an analysis of curation completion unless you know what is known..... -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From camon at ebi.ac.uk Wed Aug 8 06:05:04 2007 From: camon at ebi.ac.uk (camon at ebi.ac.uk) Date: Wed, 8 Aug 2007 14:05:04 +0100 (BST) Subject: [go] Paper of potential interest to you In-Reply-To: <46B9BB6F.2050408@sanger.ac.uk> References: <46B9BB6F.2050408@sanger.ac.uk> Message-ID: <49679.86.147.71.133.1186578304.squirrel@webmail.ebi.ac.uk> Hi Val, The GOA group (and therefore UniProtKB) do not integrate ISS or IEA from other GOC members at the moment. UniProtKB/Swiss-Prot shows all manual GO annotation (minus ND), filtered by source, UniProtKB shows manual and IEA annotations. One of our colleagues made some comments on this paper: 'the real question that needs to be asked is whether manual curation is capable of keeping up with the growth in biological knowledge (not the growth in sequences). 
If a curator annotates a protein in a model organism, or an InterPro family, or a protein with a novel function, they are doing so in the belief that they are making (at some level) a generic statement, not just annotating one of the billions of sequences that happen to exist. Observing the continued existence of unannotated (and frequently, according to the current scientific knowledge, unannotable) things is of very little importance: what matters is how much real, transferrable knowledge is recorded in the databases. and they don't even consider that an annotation may or may not be correct, and may or may not be useful. it would be easy to increase metrics of coverage by adding wrong and/or high level GO terms to every protein' It's a shame we were not contacted before publication; I'm not sure that these papers help the curation effort, which is already hugely understaffed. Evelyn > Mike Cherry wrote: > >> Manual curation is not sufficient for annotation of genomic databases >> William A. Baumgartner, Jr, K. Bretonnel Cohen, Lynne M. Fox, George >> Acquaah-Mensah, and Lawrence Hunter >> Bioinformatics 2007 23: i41-i48. >> >> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/13/i41?etoc >> >> >> > > > This was interesting. Before we all decide it's a losing battle, it's not > quite so doom and gloom as this analysis suggests. > > By using mouse and fly they chose the 2 models with the single greatest > volume of data. It would have been nice to see the combined progress of > the GO curated organisms vs. non GO curated organisms (rather than > mouse, fly and then the entire Uniprot knowledge base). > > Using this criterion (at least one GO annotation) they would have > identified a 'best case scenario' (left graph of figure one) for both > budding and fission yeasts. 
> > However, using these methods, they would never show a 'best case > scenario' of GO annotation for ANY organism because they extracted the > GO data from the Uniprot records (at least this is what they say in the > methods), and Uniprot don't include ISS/IC/NAS/TAS/ or most importantly > for this analysis ND (I think that is correct isn't it Emily?) > > And as they mention one reviewer pointed out, it is impossible here to > differentiate between a rate limiting factor of the rate of annotation > and the rate of discovery, or the relative contributions of either. > > As an evaluation of GO coverage it would have been more informative if > they had used all the GO data. But its difficult to provide an analysis > of curation completion unless you know what is known..... > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > From camon at ebi.ac.uk Wed Aug 8 06:11:10 2007 From: camon at ebi.ac.uk (camon at ebi.ac.uk) Date: Wed, 8 Aug 2007 14:11:10 +0100 (BST) Subject: [go] Paper of potential interest to you In-Reply-To: <49679.86.147.71.133.1186578304.squirrel@webmail.ebi.ac.uk> References: <46B9BB6F.2050408@sanger.ac.uk> <49679.86.147.71.133.1186578304.squirrel@webmail.ebi.ac.uk> Message-ID: <36342.86.147.71.133.1186578670.squirrel@webmail.ebi.ac.uk> UniProtKB shows manual and IEA > annotations. I meant UniProtKB/Trembl shows manual and IEA GO annotations... so yes GOC would have had more.. Evelyn > Hi Val, > > The GOA group ( and therefore UniProtKB) do not integrate ISS or IEA from > other GOC members at the moment. UniProtKB/Swiss-Prot shows all manual GO > annotation (minus ND), filtered by source, UniProtKB shows manual and IEA > annotations. 
From val at sanger.ac.uk Wed Aug 8 06:41:56 2007 From: val at sanger.ac.uk (Valerie Wood) Date: Wed, 08 Aug 2007 14:41:56 +0100 Subject: [go] Paper of potential interest to you In-Reply-To: <49679.86.147.71.133.1186578304.squirrel@webmail.ebi.ac.uk> References: <46B9BB6F.2050408@sanger.ac.uk> <49679.86.147.71.133.1186578304.squirrel@webmail.ebi.ac.uk> Message-ID: <46B9C824.4010900@sanger.ac.uk> Yes it would be more interesting to see a paper which showed the number of papers read and the number of annotations made vs the number of curators (which I doubt has increased much over time).
That would be more eye-opening :) I think most people assume that there is some sort of 'curation army' somewhere.
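The coverage point raised earlier in this thread (that an "at least one GO annotation" criterion undercounts curated gene products once ND annotations are filtered out) can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the `coverage` function, the gene identifiers, and the annotation pairs are invented for illustration and do not come from the paper, UniProtKB, or any GO annotation file.

```python
def coverage(gene_products, annotations, excluded_evidence=frozenset()):
    """Fraction of gene products with at least one retained annotation.

    annotations is an iterable of (gene_product, evidence_code) pairs;
    pairs whose evidence code is in excluded_evidence are ignored.
    """
    gene_products = set(gene_products)
    if not gene_products:
        return 0.0
    annotated = {
        gene_product
        for gene_product, evidence_code in annotations
        if evidence_code not in excluded_evidence
    }
    return len(annotated & gene_products) / len(gene_products)


# Toy data, invented for illustration only.
genes = ["g1", "g2", "g3", "g4"]
annots = [("g1", "IDA"), ("g2", "ND"), ("g3", "ISS"), ("g4", "ND")]

all_evidence = coverage(genes, annots)                          # 1.0
without_nd = coverage(genes, annots, excluded_evidence={"ND"})  # 0.5
```

In this toy set every gene product has been looked at by a curator, yet dropping ND halves the apparent coverage, which is the effect described for data extracted from UniProt records.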
From dph at informatics.jax.org Wed Aug 8 06:45:32 2007 From: dph at informatics.jax.org (David Hill) Date: Wed, 08 Aug 2007 09:45:32 -0400 Subject: [go] Paper of potential interest to you In-Reply-To: <46B9C824.4010900@sanger.ac.uk> References: <46B9BB6F.2050408@sanger.ac.uk> <49679.86.147.71.133.1186578304.squirrel@webmail.ebi.ac.uk> <46B9C824.4010900@sanger.ac.uk> Message-ID: <46B9C8FC.2070304@informatics.jax.org> Yes, my take on this paper was that the biggest thing it showed was that we need more curators!
From cherry at stanford.edu Wed Aug 8 08:07:55 2007 From: cherry at stanford.edu (Mike Cherry) Date: Wed, 8 Aug 2007 08:07:55 -0700 Subject: [go] Paper of potential interest to you In-Reply-To: <46B9C824.4010900@sanger.ac.uk> References: <46B9BB6F.2050408@sanger.ac.uk> <49679.86.147.71.133.1186578304.squirrel@webmail.ebi.ac.uk> <46B9C824.4010900@sanger.ac.uk> Message-ID: You'll all be able to talk to the senior author about all this at the SAB. Larry Hunter is the chair of our advisory board. -Mike
From jimhu at tamu.edu Thu Aug 9 08:16:57 2007 From: jimhu at tamu.edu (Jim Hu) Date: Thu, 9 Aug 2007 10:16:57 -0500 Subject: [go] Message-ID: <75DA74C2-42E1-491C-A40B-64C6ABB37B7F@tamu.edu> Is there an existing process term for RNA-mediated inhibition of translation as it works in prokaryotes? There are many examples of small RNAs that bind around the translation start site to block initiation (often with the help of hfq) or promote RNA degradation. I'm seeing these annotated by others to GO:0016246 ! RNA interference, but this strikes me as very wrong.
No dsRNA, and the processes are evolutionarily unrelated. I'm not crazy about GO:0042868 ! antisense RNA metabolic process for this either. It also goes into the epigenetic path, which I don't think applies here. Jim ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054
From midori at ebi.ac.uk Thu Aug 9 08:32:32 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Thu, 9 Aug 2007 16:32:32 +0100 (BST) Subject: [go] In-Reply-To: <75DA74C2-42E1-491C-A40B-64C6ABB37B7F@tamu.edu> References: <75DA74C2-42E1-491C-A40B-64C6ABB37B7F@tamu.edu> Message-ID: Hi Jim, I don't think we've yet added a term in this area that was specifically motivated by prokaryote-annotation needs. From your brief description, I agree that RNA interference is wrong, but would its parent be suitable?
id: GO:0035194
name: RNA-mediated posttranscriptional gene silencing
def: "Any process of gene inactivation (silencing) in which small RNAs trigger degradation of mRNA." [PMID:15020054, PMID:15066275, PMID:15066283]
If you need a different term, either as a child of GO:0035194 or placed elsewhere, we're happy to add something. It will help immensely if you can suggest a name, definition and parent(s). Midori
From dph at informatics.jax.org Thu Aug 9 08:36:48 2007 From: dph at informatics.jax.org (David Hill) Date: Thu, 09 Aug 2007 11:36:48 -0400 Subject: [go] In-Reply-To: References: <75DA74C2-42E1-491C-A40B-64C6ABB37B7F@tamu.edu> Message-ID: <46BB3490.3080808@informatics.jax.org> I'm no expert in this area, but it sounds like there might be two things going on: one is the degradation and the other is the negative regulation of translation initiation. Is this all part of the same process? I think we should start an SF request for this. David
From midori at ebi.ac.uk Thu Aug 9 08:38:09 2007 From: midori at ebi.ac.uk (Midori Harris) Date: Thu, 9 Aug 2007 16:38:09 +0100 (BST) Subject: [go] In-Reply-To: <46BB3490.3080808@informatics.jax.org> References: <75DA74C2-42E1-491C-A40B-64C6ABB37B7F@tamu.edu> <46BB3490.3080808@informatics.jax.org> Message-ID: Yes, the degradation seems to fit the GO:0035194 def, and we could add a more specific child; the other process would go under GO:0045947. m
From jimhu at tamu.edu Thu Aug 9 08:55:56 2007 From: jimhu at tamu.edu (Jim Hu) Date: Thu, 9 Aug 2007 10:55:56 -0500 Subject: [go] In-Reply-To: References: <75DA74C2-42E1-491C-A40B-64C6ABB37B7F@tamu.edu> <46BB3490.3080808@informatics.jax.org> Message-ID: <62C887A7-3F81-4FF4-87E0-0A5E6C6F37B1@tamu.edu> Hi Midori I'll find an expert here at the phage meeting to help me formulate an SF request then.
>> Midori Harris wrote:
>>> id: GO:0035194
>>> name: RNA-mediated posttranscriptional gene silencing
>>> def: "Any process of gene inactivation (silencing) in which small RNAs trigger degradation of mRNA." [PMID:15020054, PMID:15066275, PMID:15066283]
I'm concerned that the true path takes that up to silencing, which requires that the effect be long-term. I don't think these are. Jim
From pj37 at cornell.edu Thu Aug 9 11:54:48 2007 From: pj37 at cornell.edu (Pankaj Jaiswal) Date: Thu, 09 Aug 2007 14:54:48 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: <46B8CF40.1050301@informatics.jax.org> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> Message-ID: <46BB62F8.1070208@cornell.edu> I think there are two aspects being captured here: expression (the transcription/translation processes) and regulation of expression. The former is the case where gene-x is participating in the transcription/translation process; the latter is where it is simply participating in or regulating the transcription/translation process that gene-x would undergo. Since regulation here always has a dependency on another gene (most likely not a gene product), I prefer to see it more as gene-to-gene interaction data and less as a biological process because, as Harold says, often there is insufficient information to support saying that gene-x is involved in the process of gene-y. In Gramene we have a way to capture this aspect under gene interaction. Pankaj
Harold Drabkin wrote: > IMHO, no; > Some "experiments" do not give real information other than to help you > design what experiments will. > Again, I am signing off on this with grave reservations. > > hjd > > Pascale Gaudet wrote: >> But isn't it already more information than annotating to the root term? >> >> Harold Drabkin wrote: >>> Moi??? >>> >>> Well; my feeling hasn't changed but everyone else thinks it's a >>> meaningful term. I like it a bit better than the " biosynthesis of" >>> terms.
I still feel it doesn't say much if you say gene product x is >>> involved in the regulation of gene expression based on an >>> observation that doing something to x causes the levels of other gene >>> products to change, without knowing actually what is happening. Still >>> seems like to me like making a term to use for incomplete or >>> ill-defined experiments. Just my take. >>> >>> hjd >>> >>> >>> Jane Lomax wrote: >>>> Actually, I was just waiting for Harold's blessing on this item... >>>> >>>> Jane >>>> >>>> Midori Harris wrote: >>>>> So did I! I think what happened is simply that the SF item is >>>>> assigned to Jane, and she's been so busy with various2go mappings >>>>> and advocacy stuff that she hasn't had a chance to work on her >>>>> other SF things. >>>>> >>>>> m >>>>> >>>>> On Mon, 6 Aug 2007, David Hill wrote: >>>>> >>>>>> I thought it was going to be implemented. >>>>>> >>>>>> David >>>>>> >>>>>> Karen Christie wrote: >>>>>>> Hi Tanya, >>>>>>> >>>>>>> There seems to be a stalled SF item on this topic: >>>>>>> >>>>>>> [ 1418820 ] gene expression >>>>>>> https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 >>>>>>> >>>>>>> -Karen >>>>>>> >>>>>>> >>>>>>> On Mon, 6 Aug 2007, Tanya Berardini wrote: >>>>>>> >>>>>>>> A (long) while back, we'd talked about having a term for >>>>>>>> 'regulation of gene expression'. I've been searching the email >>>>>>>> archives without much luck hoping to find out what happened with >>>>>>>> respect to that item. The last I found was an action item from >>>>>>>> the Sept. 2002 (!) meeting. >>>>>>>> >>>>>>>> Collective memory, please help out! >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Tanya >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------------------ >>>>>>>> Tanya Berardini, Ph.D. 
tberardi at acoma.stanford.edu >>>>>>>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>>>>>>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325 >>>>>>>> Department of Plant Biology URL: http://arabidopsis.org/ >>>>>>>> 260 Panama St. >>>>>>>> Stanford, CA 94305 >>>>>>>> >>>>>>>> ------------------------------------------------------------------------------------------ >>>>>>>> >>>>>> >>>> >>> >>> >> > > -- Pankaj Jaiswal G-15, Bradfield Hall Dept. of Plant Breeding and Genetics Cornell University Ithaca, NY-14853, USA Ph. +1-607-255-3103 / 4199 fax: +1-607-255-6683 From kchris at genome.Stanford.EDU Thu Aug 9 12:42:55 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Thu, 9 Aug 2007 12:42:55 -0700 (PDT) Subject: [go] 'regulation of gene expression' In-Reply-To: <46B8CF40.1050301@informatics.jax.org> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> Message-ID: I agree with you Harold that 'gene expression' isn't tremendously informative as a GO annotation, but then I don't think that 'cell' is particularly meaningful as a GO annotation either and we already have that term. However, I don't think usefulness in annotation is the only consideration for whether or not to add the term. I am in favor of adding this term because I think it does increase the accuracy of GO's representation of the biology. In addition, if I remember correctly from earlier discussions, it was also PATO people who wanted this term, and I can see that if PATO is leveraging off of GO to describe phenotypes, that it may be meaningful to say that a mutation of gene X affects the 'gene expression' of gene Y in a way that is broad enough that it does not specify how this is occurring. 
-Karen
From cjm at fruitfly.org Thu Aug 9 13:48:49 2007 From: cjm at fruitfly.org (Chris Mungall) Date: Thu, 9 Aug 2007 13:48:49 -0700 Subject: [go] 'regulation of gene expression' In-Reply-To: References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> Message-ID: I agree with Karen - I sympathise with Harold's concerns. I wonder if the problem here may be with "regulation". What if, say, the definition was changed to "directly regulates". I don't have a clear answer yet for how we should determine direct vs indirect. But it seems this may go some way to addressing Harold's concerns. This is something we have to tackle anyway for the "regulates" relation. In addition, we could have computable rules such as "IMP evidence for regulation of gene expression is suspect".
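A minimal sketch of what a computable rule of this kind might look like, assuming annotations are available as simple records of gene product, term name, and evidence code. The `Annotation` class, its field names, and the helper functions are hypothetical and are not part of any existing GO software; the rule itself is only the example quoted above.

```python
from dataclasses import dataclass

# Hypothetical rule taken from the example in the message above:
# IMP evidence for 'regulation of gene expression' is treated as suspect.
SUSPECT_TERM = "regulation of gene expression"
SUSPECT_EVIDENCE = frozenset({"IMP"})


@dataclass(frozen=True)
class Annotation:
    """Illustrative annotation record; field names are assumptions."""
    gene_product: str
    term_name: str
    evidence_code: str


def is_suspect(annotation):
    """Return True if the annotation matches the suspect term/evidence pair."""
    return (
        annotation.term_name == SUSPECT_TERM
        and annotation.evidence_code in SUSPECT_EVIDENCE
    )


def flag_suspect(annotations):
    """Return the annotations a curator might want to review."""
    return [a for a in annotations if is_suspect(a)]


# Toy records, invented for illustration only.
examples = [
    Annotation("geneX", "regulation of gene expression", "IMP"),
    Annotation("geneX", "regulation of gene expression", "IDA"),
    Annotation("geneY", "translation", "IMP"),
]
flagged = flag_suspect(examples)
```

A rule like this only flags annotations for review; it does not decide whether the regulation is direct or indirect, which is the open question raised above.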
On Aug 9, 2007, at 12:42 PM, Karen Christie wrote: > > > I agree with you Harold that 'gene expression' isn't tremendously > informative as a GO annotation, but then I don't think that 'cell' > is particularly meaningful as a GO annotation either and we already > have that term. However, I don't think usefulness in annotation is > the only consideration for whether or not to add the term. > > I am in favor of adding this term because I think it does increase > the accuracy of GO's representation of the biology. > > In addition, if I remember correctly from earlier discussions, it > was also PATO people who wanted this term, and I can see that if > PATO is leveraging off of GO to describe phenotypes, that it may be > meaningful to say that a mutation of gene X affects the 'gene > expression' of gene Y in a way that is broad enough that it does > not specify how this is occurring. > > -Karen > > > On Tue, 7 Aug 2007, Harold Drabkin wrote: > >> IMHO, no; >> Some "experiments" do not give real information other than to help >> you design what experiments will. >> Again, I am signing off on this with grave reservations. >> >> hjd >> >> Pascale Gaudet wrote: >>> But isn't it already more information than annotating to the root >>> term? >>> Harold Drabkin wrote: >>>> Moi??? >>>> Well; my feeling hasn't changed but everyone else thinks it's a >>>> meaningful term. I like it a bit better than the " biosynthesis >>>> of" terms. I still feel it doesn't say much if you say gene >>>> product x is involved in the regulation of gene expression >>>> based on an observation that doing something to x causes the >>>> levels of other gene products to change, without knowing >>>> actually what is happening. Still seems like to me like making a >>>> term to use for incomplete or ill-defined experiments. Just my >>>> take. >>>> hjd >>>> Jane Lomax wrote: >>>>> Actually, I was just waiting for Harold's blessing on this item... >>>>> Jane >>>>> Midori Harris wrote: >>>>>> So did I! 
I think what happened is simply that the SF item is >>>>>> assigned to Jane, and she's been so busy with various2go >>>>>> mappings and advocacy stuff that she hasn't had a chance to >>>>>> work on her other SF things. >>>>>> m >>>>>> On Mon, 6 Aug 2007, David Hill wrote: >>>>>>> I thought it was going to be implemented. >>>>>>> David >>>>>>> Karen Christie wrote: >>>>>>>> Hi Tanya, >>>>>>>> There seems to be a stalled SF item on this topic: >>>>>>>> [ 1418820 ] gene expression >>>>>>>> https://sourceforge.net/tracker/index.php? >>>>>>>> func=detail&aid=1418820&group_id=36855&atid=440764 -Karen >>>>>>>> On Mon, 6 Aug 2007, Tanya Berardini wrote: >>>>>>>>> A (long) while back, we'd talked about having a term for >>>>>>>>> 'regulation of gene expression'. I've been searching the >>>>>>>>> email archives without much luck hoping to find out what >>>>>>>>> happened with respect to that item. The last I found was an >>>>>>>>> action item from the Sept. 2002 (!) meeting. >>>>>>>>> Collective memory, please help out! >>>>>>>>> Thanks, >>>>>>>>> Tanya >>>>>>>>> >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------------------------- Tanya Berardini, >>>>>>>>> Ph.D. tberardi at acoma.stanford.edu >>>>>>>>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>>>>>>>> Carnegie Institution of Washington Tel: (650) 325-1521 >>>>>>>>> ext. 325 >>>>>>>>> Department of Plant Biology URL: http:// >>>>>>>>> arabidopsis.org/ >>>>>>>>> 260 Panama St. 
>>>>>>>>> Stanford, CA 94305 >>>>>>>>> >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------------------------- >> > From sherlock at genome.Stanford.EDU Thu Aug 9 19:38:24 2007 From: sherlock at genome.Stanford.EDU (Gavin Sherlock) Date: Thu, 9 Aug 2007 19:38:24 -0700 Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol Message-ID: Hi all, An issue came up with GO::TermFinder, because it chokes on files where the relationship between DB_Object_ID and DB_Object_Symbol is not 1:1, and there are a number of files that have for instance a 1:2 relationship between these columns, e.g.: GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35 pseudocap: PA5429 maps to aspA, adhA RGD: RGD:1359623 maps to Tuba4a, Tuba4 WB: WBGene00000386 maps to cdc-25.1, cdc25.1 My question is, should this be a 1:1 relationship, and the annotation files checking script needs to reject files that deviate from that (presumably these additional names would become synonyms instead), or is a 1:2 or more relationship allowed between those columns, in which case, I'll have to modify GO::TermFinder appropriately. 
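
The 1:1 question above can be checked mechanically in both directions. This is a minimal sketch, assuming a tab-delimited gene association file with DB_Object_ID in column 2 and DB_Object_Symbol in column 3, and comment lines starting with '!'. The function names are hypothetical and are not part of GO::TermFinder or the annotation file checking script.

```shell
# Sketch only: both functions read a gene association file on standard
# input. The names are illustrative assumptions, not an existing GO tool.

# Print each DB_Object_ID (column 2) that appears with more than one
# DB_Object_Symbol (column 3).
ids_with_multiple_symbols() {
    # Skip '!' comment lines, reduce to distinct (ID, symbol) pairs,
    # then print the IDs that still occur more than once.
    grep -v '^!' | cut -f2,3 | sort -u | cut -f1 | sort | uniq -d
}

# Print each DB_Object_Symbol (column 3) that appears with more than
# one DB_Object_ID (column 2): the reverse direction of the check.
symbols_with_multiple_ids() {
    grep -v '^!' | cut -f2,3 | sort -u | cut -f2 | sort | uniq -d
}
```

Feeding the decompressed annotation file to each function (for example via `gzip -dc`) lists the offending IDs or symbols, one per line. Empty output means the mapping is 1:1 in that direction.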
As an additional data point, the pombe file actually lists both SPCC777.13 and vps35 as synonyms for the gene too:

whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe
GeneDB_Spombe SPCC777.13 SPCC777.13 GO:0003674 GO_REF:0000015 ND F gene taxon:4896 20070711 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0005768 PMID:16622069 IMP C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20060424 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0030904 PMID:16622069 IMP C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0030904 PMID:16622069 ISS SGD:S000003690 C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0006886 PMID:16622069 IMP P retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0042147 PMID:16622069 IMP P retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20060424 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0030437 PMID:15189449 IMP P retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0005829 PMID:16823372 IDA C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20060724 GeneDB_Spombe

- is there a rule (I couldn't find one) that says the synonyms should not repeat the DB_Object_ID and DB_Object_Symbol, or should there be? Would it save any space in the file sizes?

Cheers,
Gavin
________________________________________________________

Gavin Sherlock
Dept. of Genetics
S201A, Grant Building,
Stanford University Medical School,
Stanford,
CA 94305-5120

Tel: 650 498 6012
Fax: 650 724 3701
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://fafner.stanford.edu/pipermail/go/attachments/20070809/c0331bce/attachment.html From cjm at fruitfly.org Thu Aug 9 20:30:04 2007 From: cjm at fruitfly.org (Chris Mungall) Date: Thu, 9 Aug 2007 20:30:04 -0700 Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol In-Reply-To: References: Message-ID: <035143A8-B2F4-4BDF-BE0F-28548F534B8E@fruitfly.org> I think it should be 1:1 for any one DB. Our model is that every distinct annotated entity has a single designated preferred symbol, the rest should go in the synonyms. I think it's best not to repeat symbols as synonyms, as you lead people to believe that these will always be present, which may potentially lead to them implementing buggy software (if they are extremely sloppy). Those writing software correctly have to defensively implement some kind of filter, if they want to avoid reporting back (mildly confusing) duplicates to their users. Consistency is always a good thing. I think the 1:1 violation is more serious though On Aug 9, 2007, at 7:38 PM, Gavin Sherlock wrote: > Hi all, > > An issue came up with GO::TermFinder, because it chokes on files > where the relationship between DB_Object_ID and DB_Object_Symbol is > not 1:1, and there are a number of files that have for instance a > 1:2 relationship between these columns, e.g.: > > GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35 > pseudocap: PA5429 maps to aspA, adhA > RGD: RGD:1359623 maps to Tuba4a, Tuba4 > WB: WBGene00000386 maps to cdc-25.1, cdc25.1 > > My question is, should this be a 1:1 relationship, and the > annotation files checking script needs to reject files that deviate > from that (presumably these additional names would become synonyms > instead), or is a 1:2 or more relationship allowed between those > columns, in which case, I'll have to modify GO::TermFinder > appropriately. 
> > As an additional data point, the pombe file actually lists both > SPCC777.13 and vps35 as synonyms for the gene too : > > whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe > GeneDB_Spombe SPCC777.13 SPCC777.13 GO: > 0003674 GO_REF:0000015 ND > F gene taxon:4896 20070711GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0005768 > PMID:16622069 IMP C retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20060424 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0030904 > PMID:16622069 IMP C retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0030904 > PMID:16622069 ISS SGD:S000003690 C retromer complex > subunit Vps35 SPCC777.13|vps35gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0006886 > PMID:16622069 IMP P retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0042147 > PMID:16622069 IMP P retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20060424 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0030437 > PMID:15189449 IMP P retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0005829 > PMID:16823372 IDA C retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20060724 GeneDB_Spombe > > - is there a rule (I couldn't find one) that says the synonyms > should not repeat the DB_Object_ID and DB_Object_Symbol, or should > there be? Would it save any space in the file sizes? > > Cheers, > Gavin > ________________________________________________________ > > Gavin Sherlock > Dept. 
of Genetics
> S201A, Grant Building,
> Stanford University Medical School,
> Stanford,
> CA 94305-5120
>
> Tel: 650 498 6012
> Fax: 650 724 3701
>
>
From cjm at fruitfly.org Thu Aug 9 21:02:27 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Thu, 9 Aug 2007 21:02:27 -0700
Subject: [go] "GO compliance"
Message-ID: <469475C8-8EE5-4920-A3B7-AFA6C7D50FCC@fruitfly.org>

I didn't know we offered a certificate of compliance for ontologies?

http://www.hprd.org/FAQ
"Our ontology is fully GO compliant."

If we did, I think we'd only offer it to databases that implement GO queries correctly. I did a quick experiment and got:

Your search for Sarcoplasm in Localization reports 2 matches.
Your search for Sarcoplasmic reticulum in Localization reports 27 matches.

A shame, as the HPRD is a nice resource otherwise. If only they exposed their API rather than locking everything up into a single warehouse, forcing you to use their query interface.

From cherry at stanford.edu Thu Aug 9 21:13:57 2007
From: cherry at stanford.edu (Mike Cherry)
Date: Thu, 9 Aug 2007 21:13:57 -0700
Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol
In-Reply-To:
References:
Message-ID: <6770B231-403A-4E0E-BBED-2F07E641770E@stanford.edu>

I think 1:1 is the intent.

The documentation says the DB_Object_Symbol field has cardinality of 1. The checking script is looking for the pipe symbol in that field. It requires just one symbol, and will not allow zero or more than one symbol to be started on a line.

The checking script will not find multiple relationships if they are spread across multiple lines in the file. The filtering script is concerned with format of the information, cardinality, and some very basic things like is an abbreviation okay. As written, it does not compare two lines within the file; it just checks each line independently.

A check of the database could report these errors. There is an easy UNIX command method to check for this problem.
For example, with the pombe file, it's all one long command on one line:

% gzcat gene_association.GeneDB_Spombe.gz | cut -f2,3 | sort -u | cut -f1 | sort | uniq -c | sort -rn | grep -v ' 1 '

Any ID in the result that has a number greater than 1 is an ID that has more than 1 symbol associated somewhere within the gene association file. For the current pombe file that would be 388 of the 5073 gene IDs.

Yes, the RGD (96), pseudocap (1) and WormBase (16) gene association files all have a few of this type of issue.

-Mike

On Aug 9, 2007, at 7:38 PM, Gavin Sherlock wrote:

> Hi all,
>
> An issue came up with GO::TermFinder, because it chokes on files
> where the relationship between DB_Object_ID and DB_Object_Symbol is
> not 1:1, and there are a number of files that have for instance a
> 1:2 relationship between these columns, e.g.:
>
> GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35
> pseudocap: PA5429 maps to aspA, adhA
> RGD: RGD:1359623 maps to Tuba4a, Tuba4
> WB: WBGene00000386 maps to cdc-25.1, cdc25.1
>
> My question is, should this be a 1:1 relationship, and the
> annotation files checking script needs to reject files that deviate
> from that (presumably these additional names would become synonyms
> instead), or is a 1:2 or more relationship allowed between those
> columns, in which case, I'll have to modify GO::TermFinder
> appropriately.
> > As an additional data point, the pombe file actually lists both > SPCC777.13 and vps35 as synonyms for the gene too : > > whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe > GeneDB_Spombe SPCC777.13 SPCC777.13 GO: > 0003674 GO_REF:0000015 ND > F gene taxon:4896 20070711GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0005768 > PMID:16622069 IMP C retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20060424 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0030904 > PMID:16622069 IMP C retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0030904 > PMID:16622069 ISS SGD:S000003690 C retromer complex > subunit Vps35 SPCC777.13|vps35gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0006886 > PMID:16622069 IMP P retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0042147 > PMID:16622069 IMP P retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20060424 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0030437 > PMID:15189449 IMP P retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20040625 GeneDB_Spombe > GeneDB_Spombe SPCC777.13 vps35 GO:0005829 > PMID:16823372 IDA C retromer complex subunit > Vps35 SPCC777.13|vps35 gene taxon:4896 > 20060724 GeneDB_Spombe > > - is there a rule (I couldn't find one) that says the synonyms > should not repeat the DB_Object_ID and DB_Object_Symbol, or should > there be? Would it save any space in the file sizes? > > Cheers, > Gavin > ________________________________________________________ > > Gavin Sherlock > Dept. 
of Genetics
> S201A, Grant Building,
> Stanford University Medical School,
> Stanford,
> CA 94305-5120
>
> Tel: 650 498 6012
> Fax: 650 724 3701
>
>
From sherlock at genome.Stanford.EDU Thu Aug 9 21:26:05 2007
From: sherlock at genome.Stanford.EDU (Gavin Sherlock)
Date: Thu, 9 Aug 2007 21:26:05 -0700
Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol
In-Reply-To: <6770B231-403A-4E0E-BBED-2F07E641770E@stanford.edu>
References: <6770B231-403A-4E0E-BBED-2F07E641770E@stanford.edu>
Message-ID:

Hi Mike,

It's an easy check to add. For a given file, where $databaseId and $name are the DB_Object_ID and DB_Object_Symbol for the current line respectively, something like:

    if (exists($databaseId2StandardName{$databaseId}) &&
        $name ne $databaseId2StandardName{$databaseId}) {
        # do something to say that the databaseId has more than one
        # standard name in the file, and thus reject it
    } else {
        # process
        # now record that we saw it
        $databaseId2StandardName{$databaseId} = $name;
    }

works just fine (and is essentially what my GO::TermFinder code does). Probably the reverse check should be done too, i.e. a DB_Object_Symbol maps to only one DB_Object_ID.

If it is part of the spec (and is spelled out on the annotation file format page, which it isn't currently), then I think files that don't follow the rule should be rejected.

Cheers,
Gavin

On Aug 9, 2007, at 9:13 PM, Mike Cherry wrote:

> I think 1:1 is the intent.
>
> The documentation says the DB_Object_Symbol field has cardinality
> of 1. The checking script is looking for the pipe symbol in that
> field. It requires just one symbol, and will not allow zero or
> more than one symbol to be started on a line.
>
> The checking script will not find multiple relationships if they
> are spread across multiple lines in the file. The filtering script
> is concerned with format of the information, cardinality, and some
> very basic things like is an abbreviation okay.
As written is does > not compare two lines within the file, it just checked each line > independently. > > A check of the database could report these errors. There is an > easy UNIX command method to check for this problem. For example > with the pombe file, its all one long command on one line: > > % gzcat gene_association.GeneDB_Spombe.gz | cut -f2,3 | sort -u | > cut -f1 | sort | uniq -c | sort -rn | grep -v ' 1 ' > > Any ID in the result that has a number greater than 1 is an ID that > has more than 1 symbol associated somewhere within the gene > association file. For the current pombe file that would be 388 of > the 5073 gene IDs. > > Yes the RGD (96), pseudocap (1) and WormBase (16) gene association > files all have a few of this type of issue. > > -Mike > > > On Aug 9, 2007, at 7:38 PM, Gavin Sherlock wrote: > >> Hi all, >> >> An issue came up with GO::TermFinder, because it chokes on files >> where the relationship between DB_Object_ID and DB_Object_Symbol >> is not 1:1, and there are a number of files that have for instance >> a 1:2 relationship between these columns, e.g.: >> >> GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35 >> pseudocap: PA5429 maps to aspA, adhA >> RGD: RGD:1359623 maps to Tuba4a, Tuba4 >> WB: WBGene00000386 maps to cdc-25.1, cdc25.1 >> >> My question is, should this be a 1:1 relationship, and the >> annotation files checking script needs to reject files that >> deviate from that (presumably these additional names would become >> synonyms instead), or is a 1:2 or more relationship allowed >> between those columns, in which case, I'll have to modify >> GO::TermFinder appropriately. 
>> >> As an additional data point, the pombe file actually lists both >> SPCC777.13 and vps35 as synonyms for the gene too : >> >> whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 SPCC777.13 GO: >> 0003674 GO_REF:0000015 ND >> F gene taxon:4896 20070711GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0005768 >> PMID:16622069 IMP C retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20060424 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >> PMID:16622069 IMP C retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >> PMID:16622069 ISS SGD:S000003690 C retromer complex >> subunit Vps35 SPCC777.13|vps35gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0006886 >> PMID:16622069 IMP P retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0042147 >> PMID:16622069 IMP P retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20060424 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0030437 >> PMID:15189449 IMP P retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0005829 >> PMID:16823372 IDA C retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20060724 GeneDB_Spombe >> >> - is there a rule (I couldn't find one) that says the synonyms >> should not repeat the DB_Object_ID and DB_Object_Symbol, or should >> there be? Would it save any space in the file sizes? >> >> Cheers, >> Gavin >> ________________________________________________________ >> >> Gavin Sherlock >> Dept. 
of Genetics >> S201A, Grant Building, >> Stanford University Medical School, >> Stanford, >> CA 94305-5120 >> >> Tel: 650 498 6012 >> Fax: 650 724 3701 >> >> From val at sanger.ac.uk Fri Aug 10 01:34:09 2007 From: val at sanger.ac.uk (Valerie Wood) Date: Fri, 10 Aug 2007 09:34:09 +0100 Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol In-Reply-To: References: <6770B231-403A-4E0E-BBED-2F07E641770E@stanford.edu> Message-ID: <46BC2301.6030304@sanger.ac.uk> Oops, I just realised that my ND annotations have the systematic ID in the synonyms column instead of the synonym. I think this will account for all the pombe occurrences. Will fix, sorry! Val >>> >>> whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 SPCC777.13 GO: >>> 0003674 GO_REF:0000015 ND >>> F gene taxon:4896 20070711GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0005768 >>> PMID:16622069 IMP C retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20060424 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >>> PMID:16622069 IMP C retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >>> PMID:16622069 ISS SGD:S000003690 C retromer complex >>> subunit Vps35 SPCC777.13|vps35gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0006886 >>> PMID:16622069 IMP P retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0042147 >>> PMID:16622069 IMP P retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20060424 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0030437 >>> PMID:15189449 IMP P retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0005829 >>> PMID:16823372 IDA C retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 
>>> 20060724 GeneDB_Spombe
>>>
>>> - is there a rule (I couldn't find one) that says the synonyms
>>> should not repeat the DB_Object_ID and DB_Object_Symbol, or should
>>> there be? Would it save any space in the file sizes?
>>>
>>> Cheers,
>>> Gavin
>>> ________________________________________________________
>>>
>>> Gavin Sherlock
>>> Dept. of Genetics
>>> S201A, Grant Building,
>>> Stanford University Medical School,
>>> Stanford,
>>> CA 94305-5120
>>>
>>> Tel: 650 498 6012
>>> Fax: 650 724 3701
>>>
>>>
>
>

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From val at sanger.ac.uk Fri Aug 10 02:59:30 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Fri, 10 Aug 2007 10:59:30 +0100
Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol
In-Reply-To: <035143A8-B2F4-4BDF-BE0F-28548F534B8E@fruitfly.org>
References: <035143A8-B2F4-4BDF-BE0F-28548F534B8E@fruitfly.org>
Message-ID: <46BC3702.7070309@sanger.ac.uk>

>
>
> I think it's best not to repeat symbols as synonyms, as you lead
> people to believe that these will always be present, which may
> potentially lead to them implementing buggy software (if they are
> extremely sloppy). Those writing software correctly have to
> defensively implement some kind of filter, if they want to avoid
> reporting back (mildly confusing) duplicates to their users.
> Consistency is always a good thing.
>

Martin will fix the problems with our DB_Object_ID and DB_Object_Symbol columns which should happen with the next update.

But I'd like to clarify about the synonyms. From the documentation:

 2   DB_Object_ID                   required   S000000296
 3   DB_Object_Symbol               required   PHO3
11   DB_Object_Synonym (|Synonym)   optional   YBR092C

BUT if your gene doesn't have a given name, the systematic ID has to go in column 3.
If your gene does have a given name, the systematic ID goes in the synonym column (11). This means that if the user comes with a list of systematic IDs, they aren't always in the same field. I think this is why with GeneDB we put ALL IDs in the synonyms column.

I can raise a ticket so that we don't put any duplicates in the synonyms file.

But don't we need a single field which contains all the systematic IDs for any organism? These are the IDs which researchers usually use for handling large datasets. Wouldn't it be better to require that the object symbol was the systematic ID rather than a mixture of systematic IDs and primary names, and everything else was a synonym?

This is the same identifier issue I mentioned on AWG recently, i.e. why it's a problem to use a list of IDs and search on a single field, because the ID types are necessarily split between different fields. The other problem with the existing file structure is that IDs swap between fields as genes are 'named'.

Val

From dph at informatics.jax.org Fri Aug 10 05:37:28 2007
From: dph at informatics.jax.org (David Hill)
Date: Fri, 10 Aug 2007 08:37:28 -0400
Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol
In-Reply-To: <035143A8-B2F4-4BDF-BE0F-28548F534B8E@fruitfly.org>
References: <035143A8-B2F4-4BDF-BE0F-28548F534B8E@fruitfly.org>
Message-ID: <46BC5C08.3030000@informatics.jax.org>

>
>
> I think it's best not to repeat symbols as synonyms, as you lead
> people to believe that these will always be present, which may
> potentially lead to them implementing buggy software (if they are
> extremely sloppy).

But, synonyms for gene symbols are harvested directly from the literature.
Unfortunately, bench scientists don't often consider whether the 'handle' they are using for their gene is unique. This is a huge issue in mouse and often a lot of the work of a curator is to determine which gene an author is actually talking about. However, every official gene symbol should only correspond to one database gene ID. All other uses of symbols than the official symbol for a gene should go in the 'synonyms' field. > Those writing software correctly have to defensively implement some > kind of filter, if they want to avoid reporting back (mildly > confusing) duplicates to their users. Consistency is always a good thing. > > I think the 1:1 violation is more serious though > > On Aug 9, 2007, at 7:38 PM, Gavin Sherlock wrote: > >> Hi all, >> >> An issue came up with GO::TermFinder, because it chokes on files >> where the relationship between DB_Object_ID and DB_Object_Symbol is >> not 1:1, and there are a number of files that have for instance a 1:2 >> relationship between these columns, e.g.: >> >> GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35 >> pseudocap: PA5429 maps to aspA, adhA >> RGD: RGD:1359623 maps to Tuba4a, Tuba4 >> WB: WBGene00000386 maps to cdc-25.1, cdc25.1 >> >> My question is, should this be a 1:1 relationship, and the annotation >> files checking script needs to reject files that deviate from that >> (presumably these additional names would become synonyms instead), or >> is a 1:2 or more relationship allowed between those columns, in which >> case, I'll have to modify GO::TermFinder appropriately. 
>> >> As an additional data point, the pombe file actually lists both >> SPCC777.13 and vps35 as synonyms for the gene too : >> >> whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 SPCC777.13 >> GO:0003674 GO_REF:0000015 ND >> F gene taxon:4896 20070711GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0005768 >> PMID:16622069 IMP C retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20060424 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >> PMID:16622069 IMP C retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >> PMID:16622069 ISS SGD:S000003690 C retromer complex >> subunit Vps35 SPCC777.13|vps35gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0006886 >> PMID:16622069 IMP P retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0042147 >> PMID:16622069 IMP P retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20060424 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0030437 >> PMID:15189449 IMP P retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20040625 GeneDB_Spombe >> GeneDB_Spombe SPCC777.13 vps35 GO:0005829 >> PMID:16823372 IDA C retromer complex subunit >> Vps35 SPCC777.13|vps35 gene taxon:4896 >> 20060724 GeneDB_Spombe >> >> - is there a rule (I couldn't find one) that says the synonyms should >> not repeat the DB_Object_ID and DB_Object_Symbol, or should there >> be? Would it save any space in the file sizes? >> >> Cheers, >> Gavin >> ________________________________________________________ >> >> Gavin Sherlock >> Dept. 
of Genetics
>> S201A, Grant Building,
>> Stanford University Medical School,
>> Stanford,
>> CA 94305-5120
>>
>> Tel: 650 498 6012
>> Fax: 650 724 3701
>>
>>
>
From val at sanger.ac.uk Fri Aug 10 05:40:13 2007
From: val at sanger.ac.uk (Valerie Wood)
Date: Fri, 10 Aug 2007 13:40:13 +0100
Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol
In-Reply-To: <46BC5C08.3030000@informatics.jax.org>
References: <035143A8-B2F4-4BDF-BE0F-28548F534B8E@fruitfly.org> <46BC5C08.3030000@informatics.jax.org>
Message-ID: <46BC5CAD.1000207@sanger.ac.uk>

Hi David,

Chris is referring to our (GeneDB) odd practice of repeating the names in the synonyms column (so the gene name might be repeated in the synonym field for a single gene), rather than between genes (which is OK for synonyms).

The reason we did this is explained in a later e-mail.

Val

David Hill wrote:
>
>>
>>
>> I think it's best not to repeat symbols as synonyms, as you lead
>> people to believe that these will always be present, which may
>> potentially lead to them implementing buggy software (if they are
>> extremely sloppy).
>
> But, synonyms for gene symbols are harvested directly from the
> literature. Unfortunately, bench scientists don't often consider
> whether the 'handle' they are using for their gene is unique. This is
> a huge issue in mouse and often a lot of the work of a curator is to
> determine which gene an author is actually talking about. However,
> every official gene symbol should only correspond to one database gene
> ID. All other uses of symbols than the official symbol for a gene
> should go in the 'synonyms' field.
>
>> Those writing software correctly have to defensively implement some
>> kind of filter, if they want to avoid reporting back (mildly
>> confusing) duplicates to their users. Consistency is always a good
>> thing.
>> >> I think the 1:1 violation is more serious though >> >> On Aug 9, 2007, at 7:38 PM, Gavin Sherlock wrote: >> >>> Hi all, >>> >>> An issue came up with GO::TermFinder, because it chokes on files >>> where the relationship between DB_Object_ID and DB_Object_Symbol is >>> not 1:1, and there are a number of files that have for instance a >>> 1:2 relationship between these columns, e.g.: >>> >>> GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35 >>> pseudocap: PA5429 maps to aspA, adhA >>> RGD: RGD:1359623 maps to Tuba4a, Tuba4 >>> WB: WBGene00000386 maps to cdc-25.1, cdc25.1 >>> >>> My question is, should this be a 1:1 relationship, and the >>> annotation files checking script needs to reject files that deviate >>> from that (presumably these additional names would become synonyms >>> instead), or is a 1:2 or more relationship allowed between those >>> columns, in which case, I'll have to modify GO::TermFinder >>> appropriately. >>> >>> As an additional data point, the pombe file actually lists both >>> SPCC777.13 and vps35 as synonyms for the gene too : >>> >>> whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 SPCC777.13 >>> GO:0003674 GO_REF:0000015 ND >>> F gene taxon:4896 20070711GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0005768 >>> PMID:16622069 IMP C retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20060424 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >>> PMID:16622069 IMP C retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >>> PMID:16622069 ISS SGD:S000003690 C retromer complex >>> subunit Vps35 SPCC777.13|vps35gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0006886 >>> PMID:16622069 IMP P retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0042147 >>> 
PMID:16622069 IMP P retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20060424 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0030437 >>> PMID:15189449 IMP P retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20040625 GeneDB_Spombe >>> GeneDB_Spombe SPCC777.13 vps35 GO:0005829 >>> PMID:16823372 IDA C retromer complex subunit >>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>> 20060724 GeneDB_Spombe >>> >>> - is there a rule (I couldn't find one) that says the synonyms >>> should not repeat the DB_Object_ID and DB_Object_Symbol, or should >>> there be? Would it save any space in the file sizes? >>> >>> Cheers, >>> Gavin >>> ________________________________________________________ >>> >>> Gavin Sherlock >>> Dept. of Genetics >>> S201A, Grant Building, >>> Stanford University Medical School, >>> Stanford, >>> CA 94305-5120 >>> >>> Tel: 650 498 6012 >>> Fax: 650 724 3701 >>> >>> >> > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From dph at informatics.jax.org Fri Aug 10 05:50:23 2007 From: dph at informatics.jax.org (David Hill) Date: Fri, 10 Aug 2007 08:50:23 -0400 Subject: [go] mapping between DB_Object_ID and DB_Object_Symbol In-Reply-To: <46BC5CAD.1000207@sanger.ac.uk> References: <035143A8-B2F4-4BDF-BE0F-28548F534B8E@fruitfly.org> <46BC5C08.3030000@informatics.jax.org> <46BC5CAD.1000207@sanger.ac.uk> Message-ID: <46BC5F0F.7080405@informatics.jax.org> O.K. my misunderstanding. Sorry. Valerie Wood wrote: > > > > Hi David, > > Chris is refering to our (GeneDB) odd practice of repeating the names > in the synonyms column, so the gene name might be repeated in the > synonym field for a single gene) Rather than between genes (which is > OK for synonyms). 
> > The reason we did this is explained in later e-mail > > Val > > > > > > David Hill wrote: > >> >>> >>> >>> I think it's best not to repeat symbols as synonyms, as you lead >>> people to believe that these will always be present, which may >>> potentially lead to them implementing buggy software (if they are >>> extremely sloppy). >> >> But, synonyms for gene symbols are harvested directly from the >> literature. Unfortunately, bench scientists don't often consider >> whether the 'handle' they are using for their gene is unique. This is >> a huge issue in mouse and often a lot of the work of a curator is to >> determine which gene an author is actually talking about. However, >> every official gene symbol should only correspond to one database >> gene ID. All other uses of symbols than the official symbol for a >> gene should go in the 'synonyms' field. >> >>> Those writing software correctly have to defensively implement some >>> kind of filter, if they want to avoid reporting back (mildly >>> confusing) duplicates to their users. Consistency is always a good >>> thing. 
>>> >>> I think the 1:1 violation is more serious though >>> >>> On Aug 9, 2007, at 7:38 PM, Gavin Sherlock wrote: >>> >>>> Hi all, >>>> >>>> An issue came up with GO::TermFinder, because it chokes on files >>>> where the relationship between DB_Object_ID and DB_Object_Symbol is >>>> not 1:1, and there are a number of files that have for instance a >>>> 1:2 relationship between these columns, e.g.: >>>> >>>> GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35 >>>> pseudocap: PA5429 maps to aspA, adhA >>>> RGD: RGD:1359623 maps to Tuba4a, Tuba4 >>>> WB: WBGene00000386 maps to cdc-25.1, cdc25.1 >>>> >>>> My question is, should this be a 1:1 relationship, and the >>>> annotation files checking script needs to reject files that deviate >>>> from that (presumably these additional names would become synonyms >>>> instead), or is a 1:2 or more relationship allowed between those >>>> columns, in which case, I'll have to modify GO::TermFinder >>>> appropriately. >>>> >>>> As an additional data point, the pombe file actually lists both >>>> SPCC777.13 and vps35 as synonyms for the gene too : >>>> >>>> whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe >>>> GeneDB_Spombe SPCC777.13 SPCC777.13 >>>> GO:0003674 GO_REF:0000015 ND >>>> F gene taxon:4896 20070711GeneDB_Spombe >>>> GeneDB_Spombe SPCC777.13 vps35 GO:0005768 >>>> PMID:16622069 IMP C retromer complex subunit >>>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>>> 20060424 GeneDB_Spombe >>>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >>>> PMID:16622069 IMP C retromer complex subunit >>>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>>> 20040625 GeneDB_Spombe >>>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904 >>>> PMID:16622069 ISS SGD:S000003690 C retromer complex >>>> subunit Vps35 SPCC777.13|vps35gene taxon:4896 >>>> 20040625 GeneDB_Spombe >>>> GeneDB_Spombe SPCC777.13 vps35 GO:0006886 >>>> PMID:16622069 IMP P retromer complex subunit >>>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>>> 20040625 GeneDB_Spombe >>>> 
GeneDB_Spombe SPCC777.13 vps35 GO:0042147 >>>> PMID:16622069 IMP P retromer complex subunit >>>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>>> 20060424 GeneDB_Spombe >>>> GeneDB_Spombe SPCC777.13 vps35 GO:0030437 >>>> PMID:15189449 IMP P retromer complex subunit >>>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>>> 20040625 GeneDB_Spombe >>>> GeneDB_Spombe SPCC777.13 vps35 GO:0005829 >>>> PMID:16823372 IDA C retromer complex subunit >>>> Vps35 SPCC777.13|vps35 gene taxon:4896 >>>> 20060724 GeneDB_Spombe >>>> >>>> - is there a rule (I couldn't find one) that says the synonyms >>>> should not repeat the DB_Object_ID and DB_Object_Symbol, or should >>>> there be? Would it save any space in the file sizes? >>>> >>>> Cheers, >>>> Gavin >>>> ________________________________________________________ >>>> >>>> Gavin Sherlock >>>> Dept. of Genetics >>>> S201A, Grant Building, >>>> Stanford University Medical School, >>>> Stanford, >>>> CA 94305-5120 >>>> >>>> Tel: 650 498 6012 >>>> Fax: 650 724 3701 >>>> >>>> >>> >> > > > From jane at ebi.ac.uk Fri Aug 10 06:02:14 2007 From: jane at ebi.ac.uk (Jane Lomax) Date: Fri, 10 Aug 2007 14:02:14 +0100 Subject: [go] InterPro survey Message-ID: <46BC61D6.4040301@ebi.ac.uk> Hello, We would be grateful if you could take a few minutes to complete the InterPro 'User Survey' (link provided below). InterPro is a database of protein families and domains, and the results of this survey will be important for the future development and direction of InterPro, as well as to enable us to provide a better service. http://www.surveymonkey.com/s.aspx?sm=pgDLDJYGfFewgqFhOKLmbQ_3d_3d With thanks - The InterPro Team. 
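For the DB_Object_ID / DB_Object_Symbol thread above, the check Gavin describes and the synonym filter Chris mentions could be sketched roughly as follows. This is a minimal illustration, not GO::TermFinder's code or the annotation-file checking script: it assumes a tab-delimited gene association file with DB, DB_Object_ID and DB_Object_Symbol in columns 1-3, a pipe-separated synonym field, and comment lines starting with "!". The function names and the last sample row are hypothetical.

```python
from collections import defaultdict

# Column positions are an assumption (tab-delimited gene association layout):
# col 1 = DB, col 2 = DB_Object_ID, col 3 = DB_Object_Symbol.
DB, OBJECT_ID, SYMBOL = 0, 1, 2


def one_to_one_violations(lines):
    """Return {(db, object_id): sorted symbols} for IDs seen with more than one symbol."""
    symbols = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):  # skip blank and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= SYMBOL:
            continue  # too short to carry both an ID and a symbol
        symbols[(cols[DB], cols[OBJECT_ID])].add(cols[SYMBOL])
    return {key: sorted(vals) for key, vals in symbols.items() if len(vals) > 1}


def extra_synonyms(object_id, symbol, synonym_field):
    """Drop synonyms that merely repeat the ID or the symbol (pipe-separated field)."""
    names = [s for s in synonym_field.split("|") if s]
    return [s for s in names if s not in (object_id, symbol)]


# Rows abbreviated to the first five columns; the ExampleDB row is hypothetical.
sample = [
    "!comment line\n",
    "GeneDB_Spombe\tSPCC777.13\tSPCC777.13\t\tGO:0003674\n",
    "GeneDB_Spombe\tSPCC777.13\tvps35\t\tGO:0005768\n",
    "ExampleDB\tEX:0001\tabc1\t\tGO:0003674\n",
]

violations = one_to_one_violations(sample)
print(violations)
print(extra_synonyms("SPCC777.13", "vps35", "SPCC777.13|vps35"))
```

Keying on the (DB, DB_Object_ID) pair rather than the ID alone avoids flagging identifiers that happen to coincide across databases; whether a file with violations should be rejected or merely reported is the open question in the thread.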
From hjd at informatics.jax.org Fri Aug 10 07:05:08 2007 From: hjd at informatics.jax.org (Harold Drabkin) Date: Fri, 10 Aug 2007 10:05:08 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> Message-ID: <46BC7094.8060106@informatics.jax.org> I think "directly regulates" may be worse. It would have to be VERY restrictive, e.g., mutation in gene_product X results in increased appearance of gene_product Y. Is X involved in the regulation of the expression of gene Y? That is the question. How might X directly regulate the expression of Y? Is X a transcription factor? Is X a specific protease? Does X actually work on A, and A then works on B, and B then works on Y? It is the same problem as the "biosynthesis" terms. What is meant by "gene expression"? Transcription? Translation of an mRNA? Turnover of the mRNA or of the protein? Very fuzzy. I can also see that the term would invite requests for "expression of X" and "expression of Y", like the biosynthesis terms did. Perhaps if used, one needs to restrict the evidence code used with it to IEP and perhaps also allow a With field dbx ref for a gene_id if known or suspected. -Harold Chris Mungall wrote: > > I agree with Karen - > > I sympathise with Harold's concerns. I wonder if the problem here may > be with "regulation". What if, say, the definition was changed to > "directly regulates". I don't have a clear answer yet for how we > should determine direct vs indirect. But it seems this may go some way > to addressing Harold's concerns. This is something we have to tackle > anyway for the "regulates" relation. > > In addition, we could have computable rules such as "IMP evidence for > regulation of gene expression is suspect". 
> > On Aug 9, 2007, at 12:42 PM, Karen Christie wrote: > >> >> >> I agree with you Harold that 'gene expression' isn't tremendously >> informative as a GO annotation, but then I don't think that 'cell' is >> particularly meaningful as a GO annotation either and we already have >> that term. However, I don't think usefulness in annotation is the >> only consideration for whether or not to add the term. >> >> I am in favor of adding this term because I think it does increase >> the accuracy of GO's representation of the biology. >> >> In addition, if I remember correctly from earlier discussions, it was >> also PATO people who wanted this term, and I can see that if PATO is >> leveraging off of GO to describe phenotypes, that it may be >> meaningful to say that a mutation of gene X affects the 'gene >> expression' of gene Y in a way that is broad enough that it does not >> specify how this is occurring. >> >> -Karen >> >> >> On Tue, 7 Aug 2007, Harold Drabkin wrote: >> >>> IMHO, no; >>> Some "experiments" do not give real information other than to help >>> you design what experiments will. >>> Again, I am signing off on this with grave reservations. >>> >>> hjd >>> >>> Pascale Gaudet wrote: >>>> But isn't it already more information than annotating to the root >>>> term? >>>> Harold Drabkin wrote: >>>>> Moi??? >>>>> Well; my feeling hasn't changed but everyone else thinks it's a >>>>> meaningful term. I like it a bit better than the " biosynthesis >>>>> of" terms. I still feel it doesn't say much if you say gene >>>>> product x is involved in the regulation of gene expression based >>>>> on an observation that doing something to x causes the levels of >>>>> other gene products to change, without knowing actually what is >>>>> happening. Still seems like to me like making a term to use for >>>>> incomplete or ill-defined experiments. Just my take. >>>>> hjd >>>>> Jane Lomax wrote: >>>>>> Actually, I was just waiting for Harold's blessing on this item... 
>>>>>> Jane >>>>>> Midori Harris wrote: >>>>>>> So did I! I think what happened is simply that the SF item is >>>>>>> assigned to Jane, and she's been so busy with various2go >>>>>>> mappings and advocacy stuff that she hasn't had a chance to work >>>>>>> on her other SF things. >>>>>>> m >>>>>>> On Mon, 6 Aug 2007, David Hill wrote: >>>>>>>> I thought it was going to be implemented. >>>>>>>> David >>>>>>>> Karen Christie wrote: >>>>>>>>> Hi Tanya, >>>>>>>>> There seems to be a stalled SF item on this topic: >>>>>>>>> [ 1418820 ] gene expression >>>>>>>>> https://sourceforge.net/tracker/index.php?func=detail&aid=1418820&group_id=36855&atid=440764 >>>>>>>>> -Karen >>>>>>>>> On Mon, 6 Aug 2007, Tanya Berardini wrote: >>>>>>>>>> A (long) while back, we'd talked about having a term for >>>>>>>>>> 'regulation of gene expression'. I've been searching the >>>>>>>>>> email archives without much luck hoping to find out what >>>>>>>>>> happened with respect to that item. The last I found was an >>>>>>>>>> action item from the Sept. 2002 (!) meeting. >>>>>>>>>> Collective memory, please help out! >>>>>>>>>> Thanks, >>>>>>>>>> Tanya >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------------------------ >>>>>>>>>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu >>>>>>>>>> The Arabidopsis Information Resource FAX: (650) 325-6857 >>>>>>>>>> Carnegie Institution of Washington Tel: (650) 325-1521 >>>>>>>>>> ext. 325 >>>>>>>>>> Department of Plant Biology URL: http://arabidopsis.org/ >>>>>>>>>> 260 Panama St. 
>>>>>>>>>> Stanford, CA 94305 >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------------------------ >>>>>>>>>> >>> >> > From pj37 at cornell.edu Fri Aug 10 07:15:39 2007 From: pj37 at cornell.edu (Pankaj Jaiswal) Date: Fri, 10 Aug 2007 10:15:39 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: <46BC7094.8060106@informatics.jax.org> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> <46BC7094.8060106@informatics.jax.org> Message-ID: <46BC730B.6090304@cornell.edu> Harold Drabkin wrote: > Perhaps if used, one needs to restrict the evidence code used with it to > IEP and perhaps also allow a With field dbx ref for a gene_id if known > or suspected > With field filling should be 'a must' because the regulation of expression is always in context to another gene (product). However the code can also be IPI, IGI, IMP besides IEP. 
Pankaj From edimmer at ebi.ac.uk Fri Aug 10 07:42:13 2007 From: edimmer at ebi.ac.uk (E Dimmer) Date: Fri, 10 Aug 2007 15:42:13 +0100 Subject: [go] 'regulation of gene expression' In-Reply-To: <46BC730B.6090304@cornell.edu> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> <46BC7094.8060106@informatics.jax.org> <46BC730B.6090304@cornell.edu> Message-ID: <46BC7945.70208@ebi.ac.uk> Hi, There are a large number of terms in GO which deserve more information regarding the 'target' of their gene products' activity, for instance: -- negative regulation of ubiquitination -- protein kinase activity -- regulation of DNA binding -- regulation of phosphorylation For all of these examples I have information on the specific target of a protein's action. While I strongly feel that this target information should be captured -- is the 'with' the right place? We seem to have tried to be very strict on the contents of the 'with', so the type of value used is closely related to the type of evidence code applied in the annotation. I'm not sure that regulation of expression should be an exception to this... Emily Pankaj Jaiswal wrote: > > > Harold Drabkin wrote: > >> Perhaps if used, one needs to restrict the evidence code used with it >> to IEP and perhaps also allow a With field dbx ref for a gene_id if >> known or suspected >> > > With field filling should be 'a must' because the regulation of > expression is always in context to another gene (product). However the > code can also be IPI, IGI, IMP besides IEP. > > Pankaj -- ************************************ Emily Dimmer GOA Coordinator EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD, U.K. 
Tel: +44 1223 494654 Fax: +44 1223 494468 email: edimmer at ebi.ac.uk ************************************ From dph at informatics.jax.org Fri Aug 10 07:45:30 2007 From: dph at informatics.jax.org (David Hill) Date: Fri, 10 Aug 2007 10:45:30 -0400 Subject: [go] 'regulation of gene expression' In-Reply-To: <46BC7945.70208@ebi.ac.uk> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> <46BC7094.8060106@informatics.jax.org> <46BC730B.6090304@cornell.edu> <46BC7945.70208@ebi.ac.uk> Message-ID: <46BC7A0A.3080204@informatics.jax.org> We capture this info in our structure notes. I think it will be very valuable information. David E Dimmer wrote: > Hi, > > There is a large number of terms in GO which deserve more information > regarding the 'target' of its gene product's activity, for instance: > -- negative regulation of ubiquitination > -- protein kinase activity > -- regulation of DNA binding > -- regulation of phosphorylation > -for all of these examples I have information on the specific target > of a protein's action. > > While I strongly feel that this target information should be captured > -- is the 'with' the right place? We seem to have tried to be very > strict on the contents of the 'with', so the type of value used is > closely related to the type of evidence code applied in the > annotation. I'm not sure that regulation of expression be an exception > to this... > > Emily > > > Pankaj Jaiswal wrote: >> >> >> Harold Drabkin wrote: >> >>> Perhaps if used, one needs to restrict the evidence code used with >>> it to IEP and perhaps also allow a With field dbx ref for a gene_id >>> if known or suspected >>> >> >> With field filling should be 'a must' because the regulation of >> expression is always in context to another gene (product). 
However >> the code can also be IPI, IGI, IMP besides IEP. >> >> Pankaj > > From jimhu at tamu.edu Fri Aug 10 08:04:24 2007 From: jimhu at tamu.edu (Jim Hu) Date: Fri, 10 Aug 2007 10:04:24 -0500 Subject: [go] 'regulation of gene expression' In-Reply-To: <46BC7945.70208@ebi.ac.uk> References: <46B76CF3.3010501@acoma.stanford.edu> <46B79BAB.2010302@informatics.jax.org> <46B8953C.4070600@ebi.ac.uk> <46B8CC5E.7000306@informatics.jax.org> <46B8CD36.4070106@northwestern.edu> <46B8CF40.1050301@informatics.jax.org> <46BC7094.8060106@informatics.jax.org> <46BC730B.6090304@cornell.edu> <46BC7945.70208@ebi.ac.uk> Message-ID: I'm relatively new to the group, but I think the term is useful if and only if the target (direct or indirect) is specified. From the pov of an experimentalist, I'd like to mine the annotations to find associations that could be pushed down to lower nodes if more data was available. In other words, the ability of GO terms closer to the root to identify gaps in the experimental literature is a good thing, since it suggests places where more experiments are needed. A while ago I put up an SF query about whether terms that specify the target in the term definition were too specific. For example, GO:0046011 ! regulation of oskar mRNA translation and its two children would be lumped into regulation of mRNA translation and given oskar mRNA as the target. It sounds to me like the discussion is coming around to my pov on this (I hope?!). I can see why 'with' may not be appropriate: if the regulator is part of a complex that does the regulation, then it seems like you could be getting into having two kinds of 'with', the partner and the target. Adding a new field presumably opens another can of worms, but it sounds like maybe one is needed. But here I'm wading into waters best left to more experienced GO people. 
Jim On Aug 10, 2007, at 9:42 AM, E Dimmer wrote: > Hi, > > There is a large number of terms in GO which deserve more > information regarding the 'target' of its gene product's activity, > for instance: > -- negative regulation of ubiquitination > -- protein kinase activity > -- regulation of DNA binding > -- regulation of phosphorylation > -for all of these examples I have information on the specific > target of a protein's action. > > While I strongly feel that this target information should be > captured -- is the 'with' the right place? We seem to have tried to > be very strict on the contents of the 'with', so the type of value > used is closely related to the type of evidence code applied in the > annotation. I'm not sure that regulation of expression be an > exception to this... > > Emily > > > Pankaj Jaiswal wrote: >> >> >> Harold Drabkin wrote: >> >>> Perhaps if used, one needs to restrict the evidence code used >>> with it to IEP and perhaps also allow a With field dbx ref for a >>> gene_id if known or suspected >>> >> >> With field filling should be 'a must' because the regulation of >> expression is always in context to another gene (product). However >> the code can also be IPI, IGI, IMP besides IEP. >> >> Pankaj > > > -- > ************************************ > Emily Dimmer > GOA Coordinator > EMBL-EBI > Wellcome Trust Genome Campus > Hinxton > Cambridge CB10 1SD, U.K. > Tel: +44 1223 494654 > Fax: +44 1223 494468 > email: edimmer at ebi.ac.uk > ************************************ > ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From kchris at genome.Stanford.EDU Fri Aug 10 10:06:14 2007 From: kchris at genome.Stanford.EDU (Karen Christie) Date: Fri, 10 Aug 2007 10:06:14 -0700 (PDT) Subject: [go] mapping between DB_Object_ID and DB_Object_Sym