From dbarrell at ebi.ac.uk Thu Nov 5 02:22:20 2009 From: dbarrell at ebi.ac.uk (Daniel Barrell) Date: Thu, 05 Nov 2009 10:22:20 +0000 Subject: [Gofriends] EBI Go database down? In-Reply-To: <4AEA9D75.7060103@ebi.ac.uk> References: <4AE9BF88.2070903@ebi.ac.uk> <4AEA9D75.7060103@ebi.ac.uk> Message-ID: <4AF2A75C.9060108@ebi.ac.uk> Dear All, Sorry for the interruptions in service last week, and thanks to Tony for staying on top of the changes to the refresh script. We have indeed found some problems with the way the DB is using temp tables. This is now fixed and we are working on making this mirror more robust. Cheers Dan Tony Sawford wrote: > Yes, but unfortunately it's looking like we're going to have to wait > until Dan returns next week to get this sorted out. > > We have all the latest scripts, so I don't think that's the problem - it > seems like our server just doesn't want to talk to anybody at the moment... > > Tony > > Jane Lomax wrote: >> Dan is on leave this week - I think Tony is trying to work this out >> with Seth though... >> >> Suzanna Lewis wrote: >>> Yes, but the EBI installation itself is a mirror. >>> >>> Midori or Dan, do you know if the refresh scripts at the EBI mirror >>> have been updated? There was a bug in the last update that would >>> cause a failure. >>> >>> -S >>> >>> On Oct 29, 2009, at 8:42 AM, Mark wrote: >>> >>>> On Thu, 29 Oct 2009 02:55:35 -0700, Midori Harris >>>> wrote: >>>> >>>> >>>>> If you are not using the go_db_install.pl script, or if updating it >>>>> does >>>>> not fix your problem, please let us know via the GO Helpdesk >>>>> . >>>> >>>> >>>> we are not using a local installation - we are connecting directly >>>> to the public EBI mySQL port.... so I don't think this solution is >>>> relevant for us. >>>> >>>> Other ideas? 
>>>> >>>> Mark >>>> _______________________________________________ >>>> Gofriends mailing list >>>> Gofriends at geneontology.org >>>> http://fafner.stanford.edu/mailman/listinfo/gofriends >>>> >>> >>> _______________________________________________ >>> Gofriends mailing list >>> Gofriends at geneontology.org >>> http://fafner.stanford.edu/mailman/listinfo/gofriends >> >> > > _______________________________________________ > Gofriends mailing list > Gofriends at geneontology.org > http://fafner.stanford.edu/mailman/listinfo/gofriends -- Daniel Barrell EMBL - The EBI Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD Phone: +44 (0)1223 492551 Email: dbarrell at ebi.ac.uk From midori at ebi.ac.uk Thu Nov 5 05:17:44 2009 From: midori at ebi.ac.uk (Midori Harris) Date: Thu, 5 Nov 2009 13:17:44 +0000 (GMT) Subject: [Gofriends] Announcement: Introduction of cross-product definitions in extended GO on January 18, 2010 Message-ID: Dear GO Friends, We are pleased to announce that we are about to introduce cross-product definitions, in the form of intersection_of tags, into the "extended" version of GO (ontology/obo_format_1_2/gene_ontology_ext.obo). The target date for this addition is January 18, 2010. This addition may have considerable impact on your tools and analyses because: 1. Loading scripts that use gene_ontology_ext.obo will need to be modified to take the intersection_of tags into account by January 18, 2010. 2. GO analysis tools that use gene_ontology_ext.obo will need to be modified to take the intersection_of tags into account by January 18, 2010. 3. Some intersection_of tags will use new relationship types that will not be used elsewhere in the GO, and scripts and tools will have to be modified to take these new relationship types into account. The first cross-products in GO will be "internal"; that is, they reference only terms within the GO. 
For example, the Biological Process (BP) ontology term "regulation of mitotic recombination" references two other BP terms, "biological regulation" and "mitotic recombination". Internal cross-products may also link branches of the GO, as in "nucleotide-excision repair complex", a Cellular Component (CC) term that references a CC term ("protein complex") and a BP term ("nucleotide-excision repair").

The addition of cross-product definitions to GO will enable many ontology maintenance tasks to be automated, allowing curators to make additions and corrections more quickly and accurately. In the future, cross-products will also support improved querying and visualization of the ontology, and can be used to increase annotation coverage in GO through alignment with pathway databases and probabilistic inference.

The principles of cross-products are documented at:
http://www.geneontology.org/GO.ontology.structure.shtml#xp

The intersection_of tag is documented in the OBO Format specifications:
http://www.geneontology.org/GO.format.obo-1_2.shtml

Please let us know if you have questions or comments about this plan.

On behalf of the GO Consortium,
Midori Harris

============================
Midori A. Harris, Ph.D.
GO Editor
EMBL - EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UK
Tel: +44 (0) 1223 494667
Fax: +44 (0) 1223 494468
Email: midori at ebi.ac.uk

From pkhatri at stanford.edu Fri Nov 6 15:41:45 2009
From: pkhatri at stanford.edu (Purvesh Khatri)
Date: Fri, 6 Nov 2009 15:41:45 -0800 (PST)
Subject: [Gofriends] Number of annotated gene products
Message-ID: <490729408.1143941257550905110.JavaMail.root@zm09.stanford.edu>

Hi,

I am trying to count the number of gene products currently annotated in GOA for human.
I imported "go_200911-assocdb-tables.tar.gz" into a local database and used the following two queries to count the number of unique gene products with and without the "IEA" evidence code:

select count(distinct g.symbol) from association a, gene_product g, species s
where a.gene_product_id = g.id and g.species_id = s.id
and s.ncbi_taxa_id = 9606;

select count(distinct g.symbol) from association a, evidence e, gene_product g, species s
where a.gene_product_id = g.id and e.association_id = a.id and g.species_id = s.id
and s.ncbi_taxa_id = 9606 and e.code != 'IEA';

My question is whether these queries are correct or not. The reason for asking is that the first query gives me 18098 annotated gene products. However, the "Current annotations" page on the GO website (http://geneontology.org/GO.current.annotations.shtml?all) lists the number of annotated gene products as 18587.

Thank you for your help.

Best regards,

Purvesh Khatri, Ph.D.
Postdoc (Butte/Sarwal labs)
Stanford University
Center for Biomedical Information Research (BMIR)
251 Campus Dr., Stanford, CA 94305
Phone: (313) 433-2836

Division of Nephrology
Department of Pediatrics
Stanford Medical School
300 Pasteur Dr., Room G327
Stanford, CA 94305
Phone: (650) 724-3765

From cjm at berkeleybop.org Fri Nov 6 18:32:20 2009
From: cjm at berkeleybop.org (Chris Mungall)
Date: Fri, 6 Nov 2009 18:32:20 -0800
Subject: [Gofriends] Number of annotated gene products
In-Reply-To: <490729408.1143941257550905110.JavaMail.root@zm09.stanford.edu>
References: <490729408.1143941257550905110.JavaMail.root@zm09.stanford.edu>
Message-ID:

Hi Purvesh,

The reason for the lower number is that you are counting gene symbols, not gene IDs.
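To make the symbols-versus-IDs point concrete, here is a minimal Python sketch using an in-memory SQLite database. It is only an illustration under stated assumptions: the three tables are cut down to the columns named in the queries in this thread (with gene_product.dbxref_id standing in for the gene product ID), every row is invented, and the function names are hypothetical. The real GO database schema has many more tables and columns.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database with invented rows (not real GO data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table species (id integer primary key, ncbi_taxa_id integer);
        create table gene_product (id integer primary key, symbol text,
                                   dbxref_id integer, species_id integer);
        create table association (id integer primary key, gene_product_id integer);
        insert into species values (1, 9606);
        -- three gene products; the first two share one symbol
        insert into gene_product values (1, 'GENE_A', 101, 1);
        insert into gene_product values (2, 'GENE_A', 102, 1);
        insert into gene_product values (3, 'GENE_B', 103, 1);
        insert into association values (1, 1);
        insert into association values (2, 2);
        insert into association values (3, 3);
        insert into association values (4, 3);
    """)
    return conn


def count_distinct(conn, column):
    """Count distinct gene_product.<column> values among annotated human gene products."""
    if column not in ("symbol", "dbxref_id"):
        raise ValueError("unsupported column: " + column)
    sql = (
        "select count(distinct g." + column + ") "
        "from association a, gene_product g, species s "
        "where a.gene_product_id = g.id and g.species_id = s.id "
        "and s.ncbi_taxa_id = 9606"
    )
    return conn.execute(sql).fetchone()[0]


conn = build_toy_db()
print(count_distinct(conn, "symbol"))     # 2: the shared symbol is counted once
print(count_distinct(conn, "dbxref_id"))  # 3: each gene product has its own ID
```

In this toy data the symbol count falls short of the ID count by exactly the number of gene products that reuse an existing symbol, which is the same kind of gap as the one between 18098 and 18587.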
Try this:

select count(distinct g.dbxref_id) from association a, gene_product g, species s
where a.gene_product_id = g.id and g.species_id = s.id
and s.ncbi_taxa_id = 9606;

It gives the right number (18587).

Shouldn't symbols be unique within a species, you might ask? We can take a look:

select
  g1.symbol,
  x1.xref_dbname,
  x1.xref_key,
  x2.xref_dbname,
  x2.xref_key
from
  gene_product g1,
  gene_product g2,
  dbxref x1,
  dbxref x2,
  species s
where
  g1.species_id = s.id and
  g2.species_id = s.id and
  g1.dbxref_id = x1.id and
  g2.dbxref_id = x2.id and
  s.ncbi_taxa_id = 9606 and
  g1.symbol=g2.symbol and
  g1.id != g2.id;

As you can see, a subset of these are due to alternate isoforms of a generic protein sharing the same symbol. This is an area we're actively looking into.

In other cases we have what appear to be different proteins sharing the same symbol:

| ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q9BXR3 |
| ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q9WJR5 |
| ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q7LDI9 |
| ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q69383 |
| ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q69384 |

I haven't looked at these and I have to head off just now, but I can get back to you later.

Cheers
Chris

On Nov 6, 2009, at 3:41 PM, Purvesh Khatri wrote:

> Hi,
>
> I am trying to count the number of gene products currently annotated
> in GOA for human.
I imported "go_200911-assocdb-tables.tar.gz" in a > local database and used the following two queries to count the > number of unique gene products with and without "IEA" evidence code: > > select count(distinct g.symbol) from association a, gene_product g, > species s > where a.gene_product_id = g.id and g.species_id = s.id > and s.ncbi_taxa_id = 9606; > > select count(distinct g.symbol) from association a, evidence e, > gene_product g, species s > where a.gene_product_id = g.id and e.association_id = a.id and > g.species_id = s.id > and s.ncbi_taxa_id = 9606 and e.code != 'IEA'; > > My question is whether these queries are correct or not. The reason > for asking the question is that using the first query, I get 18098 > gene products as being annotated. However, the "Current annotations" > page on GO website (http://geneontology.org/GO.current.annotations.shtml?all > ) lists the number of annotated gene products as 18587. > > Thank you for you help. > > Best regards, > > Purvesh Khatri, Ph.D. 
> Postdoc (Butte/Sarwal labs)
> Stanford University
> Center for Biomedical Information Research (BMIR)
> 251 Campus Dr., Stanford, CA 94305
> Phone: (313) 433-2836
>
> Division of Nephrology
> Department of Pediatrics
> Stanford Medical School
> 300 Pasteur Dr., Room G327
> Stanford, CA 94305
> Phone: (650) 724-3765

From cjm at berkeleybop.org Sat Nov 7 09:58:00 2009
From: cjm at berkeleybop.org (Chris Mungall)
Date: Sat, 7 Nov 2009 09:58:00 -0800
Subject: [Gofriends] Number of annotated gene products
In-Reply-To:
References: <490729408.1143941257550905110.JavaMail.root@zm09.stanford.edu>
Message-ID: <586E3FF7-759F-4BD7-B7B6-F0AC0DD25D52@berkeleybop.org>

On Nov 7, 2009, at 6:05 AM, Gabriel Berriz wrote:

> Purvesh, in addition to Chris's comments let me add that I've found
> it much easier to work with the gene_association.* flat files
> (available from http://www.geneontology.org/GO.current.annotations.shtml)
> than with the data in the go_*-assocdb-tables.tar.gz.
>
> It is my understanding (Chris, please correct me if I'm wrong) that
> these flat files are the primary sources from which the
> corresponding data in the go_*-assocdb-tables.tar.gz files are
> generated.

This is correct.

> This means that working with these flat files is not only simpler
> (certainly relative to the complexity of the go_*-assocdb-tables.tar.gz
> schema), but also is closer to the original data.

There should be no loss of information in the database build.

Regarding whether to work with flat files or with SQL, it's a matter of taste, expertise and the nature of the question. For a seasoned unix hacker it's often quicker and more intuitive to chain together a pipe such as the one below, or to write a perl script for more complex tasks. For less techy users, the flat files offer the advantage of easy loading into Excel.
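As a rough Python counterpart to the flat-file approach, here is a minimal sketch that counts distinct values in one tab-separated column of a gene association file while skipping the "!" comment lines. The function names and the sample rows are made up for illustration, and it assumes that column 2 holds the gene product ID and column 3 the symbol; it is not an official GO tool.

```python
import gzip


def count_distinct_column(lines, column):
    """Count distinct values in a 1-based tab-separated column, ignoring '!' comment lines."""
    values = set()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            values.add(fields[column - 1])
    return len(values)


def count_in_gzip(path, column):
    """Same count, reading a gzip-compressed association file from disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_distinct_column(handle, column)


# Invented sample rows: three IDs, two of which share one symbol.
sample = [
    "!invented header comment\n",
    "DB\tID0001\tGENE_A\t\tTERM:1\n",
    "DB\tID0002\tGENE_A\t\tTERM:2\n",
    "DB\tID0003\tGENE_B\t\tTERM:1\n",
    "DB\tID0003\tGENE_B\t\tTERM:3\n",
]
print(count_distinct_column(sample, 2))  # 3 distinct IDs
print(count_distinct_column(sample, 3))  # 2 distinct symbols
```

Passing any iterable of lines keeps the counting logic independent of where the file comes from; count_in_gzip is only a thin wrapper for a downloaded .gz file.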
For complex queries SQL can often be faster to execute and faster to write. The GOOSE SQL interface comes with a number of pre-canned queries:

http://www.berkeleybop.org/goose

In addition, the database has the advantage of combining ontology information, term information and other information such as the NCBI taxonomy. If there is strong demand for this in a pre-canned tab-delimited format, we may be able to provide that.

Another way to see this particular information, which may be better for the average user, is to use a web interface such as AmiGO and query for human:

http://amigo.geneontology.org/cgi-bin/amigo/browse.cgi?open_1=all&ont=all&speciesdb=all&taxid=9606&tree_view=full&action=filter

This answers a slightly different question, though: the number of gene products in human with a positive annotation to one or more GO terms. There are 6 human gene products with only negative annotations, which accounts for the discrepancy.

> BTW, you'll not be surprised to see that the output of the following
> agrees with what the "Current annotations" page states:
>
> zcat gene_associations.goa_human.gz | grep -v '^!' | cut -f2 | sort -u | wc -l
> 18587
>
> ...and the one from the following matches what you found:
>
> zcat gene_associations.goa_human.gz | grep -v '^!' | cut -f3 | sort -u | wc -l
> 18098
>
> In these commands I use the latest version of the
> gene_associations.goa_human.gz file:
>
> !CVS Version: Revision: 1.129 $
> !GOC Validation Date: 10/31/2009 $
> !Submission Date: 10/8/2009
>
>
> Cheers,
>
> Gabriel Berriz

Thanks for the tips!

Cheers
Chris

>
>
> On 091106F, at 18:41, Purvesh Khatri wrote:
>
>> Hi,
>>
>> I am trying to count the number of gene products currently
>> annotated in GOA for human.
I imported "go_200911-assocdb- >> tables.tar.gz" in a local database and used the following two >> queries to count the number of unique gene products with and >> without "IEA" evidence code: >> >> select count(distinct g.symbol) from association a, gene_product g, >> species s >> where a.gene_product_id = g.id and g.species_id = s.id >> and s.ncbi_taxa_id = 9606; >> >> select count(distinct g.symbol) from association a, evidence e, >> gene_product g, species s >> where a.gene_product_id = g.id and e.association_id = a.id and >> g.species_id = s.id >> and s.ncbi_taxa_id = 9606 and e.code != 'IEA'; >> >> My question is whether these queries are correct or not. The reason >> for asking the question is that using the first query, I get 18098 >> gene products as being annotated. However, the "Current >> annotations" page on GO website (http://geneontology.org/GO.current.annotations.shtml?all >> ) lists the number of annotated gene products as 18587. >> >> Thank you for you help. >> >> Best regards, >> >> Purvesh Khatri, Ph.D. >> Postdoc (Butte/Sarwal labs) >> Stanford University >> Center for Biomedical Information Research (BMIR) >> 251 Campus Dr., Stanford, CA 94305 >> Phone: (313) 433-2836 >> >> Division of Nephrology >> Department of Pediatrics >> Stanford Medical School >> 300 Pasteur Dr., Room G327 >> Stanford, CA 94305 >> Phone: (650) 724-3765 >> >> > > > > ============================================================= > Gabriel F. Berriz, PhD > Senior Bioinformatics Developer > Roth Lab > Biological Chemistry and Molecular Pharmacology -- Harvard Medical > School > Seeley G. 
Mudd Building 322B > Boston, MA 02115-5701 > Telephone: 617.432.3555 > Fax: 617.432.3557 > > > > > _______________________________________________ > Gofriends mailing list > Gofriends at geneontology.org > http://fafner.stanford.edu/mailman/listinfo/gofriends From pkhatri at stanford.edu Sat Nov 7 12:05:41 2009 From: pkhatri at stanford.edu (Purvesh Khatri) Date: Sat, 7 Nov 2009 12:05:41 -0800 (PST) Subject: [Gofriends] Number of annotated gene products In-Reply-To: <175054008.1234131257624265789.JavaMail.root@zm09.stanford.edu> Message-ID: <885835044.1234241257624341481.JavaMail.root@zm09.stanford.edu> Hi Chris, Thank you for the quick reply. That explains the discrepancy. However, this leads me to another question. Running the same query (i.e., counting the number of annotated genes) on October 2008 GO assocdb returns the number of genes as 36,726 versus 18,587 genes in November 2009 release. The number of annotated genes is essentially reduced by almost 50% between October 2008 and November 2009. The number of associations for human in the same period have gone down from 197,411 to 159,303. What is the reason for such a dramatic reduction in the number of associations (and the corresponding reduction in the number of annotated genes)? Once again, thank you for your help. Cheers, Purvesh ----- Original Message ----- From: "Chris Mungall" To: "Purvesh Khatri" Cc: "gofriends" , "Daniel Barrell" Sent: Friday, November 6, 2009 6:32:20 PM GMT -08:00 US/Canada Pacific Subject: Re: [Gofriends] Number of annotated gene products Hi Purvesh, The reason for the lower number is that you are counting gene symbols, not gene IDs. Try this: select count(distinct g.dbxref_id) from association a, gene_product g, species s where a.gene_product_id = g.id and g.species_id = s.id and s.ncbi_taxa_id = 9606; It gives the right number (18587) Shouldn't symbols be unique with a species you might ask? 
We can take a look: select g1.symbol, x1.xref_dbname, x1.xref_key, x2.xref_dbname, x2.xref_key from gene_product g1, gene_product g2, dbxref x1, dbxref x2, species s where g1.species_id = s.id and g2.species_id = s.id and g1.dbxref_id = x1.id and g2.dbxref_id = x2.id and s.ncbi_taxa_id = 9606 and g1.symbol=g2.symbol and g1.id != g2.id; as you can see a subset of these are due to alternate isoforms of a generic protein sharing the same symbol. This is an area we're actively looking into. In other cases we have what appear to be different proteins sharing the same symbol: | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q9BXR3 | | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q9WJR5 | | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q7LDI9 | | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q69383 | | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot | Q69384 | I haven't looked at these and I have to head off just now, but I can get back to you later. Cheers Chris On Nov 6, 2009, at 3:41 PM, Purvesh Khatri wrote: > Hi, > > I am trying to count the number of gene products currently annotated > in GOA for human. I imported "go_200911-assocdb-tables.tar.gz" in a > local database and used the following two queries to count the > number of unique gene products with and without "IEA" evidence code: > > select count(distinct g.symbol) from association a, gene_product g, > species s > where a.gene_product_id = g.id and g.species_id = s.id > and s.ncbi_taxa_id = 9606; > > select count(distinct g.symbol) from association a, evidence e, > gene_product g, species s > where a.gene_product_id = g.id and e.association_id = a.id and > g.species_id = s.id > and s.ncbi_taxa_id = 9606 and e.code != 'IEA'; > > My question is whether these queries are correct or not. The reason > for asking the question is that using the first query, I get 18098 > gene products as being annotated. 
However, the "Current annotations"
> page on GO website (http://geneontology.org/GO.current.annotations.shtml?all)
> lists the number of annotated gene products as 18587.
>
> Thank you for you help.
>
> Best regards,
>
> Purvesh Khatri, Ph.D.
> Postdoc (Butte/Sarwal labs)
> Stanford University
> Center for Biomedical Information Research (BMIR)
> 251 Campus Dr., Stanford, CA 94305
> Phone: (313) 433-2836
>
> Division of Nephrology
> Department of Pediatrics
> Stanford Medical School
> 300 Pasteur Dr., Room G327
> Stanford, CA 94305
> Phone: (650) 724-3765

From pkhatri at stanford.edu Sat Nov 7 14:35:14 2009
From: pkhatri at stanford.edu (Purvesh Khatri)
Date: Sat, 7 Nov 2009 14:35:14 -0800 (PST)
Subject: [Gofriends] Number of annotated gene products
In-Reply-To: <1420390670.1244461257633278352.JavaMail.root@zm09.stanford.edu>
Message-ID: <18007562.1244541257633314602.JavaMail.root@zm09.stanford.edu>

Hi Gabriel,

Thank you for the tips. As Chris mentioned, I am using the MySQL database export primarily because of the nature of the question. I am trying to find how the number of annotations and the number of annotated genes have changed over the last year. The MySQL database export archive, available for every month, is a more appropriate resource than the flat files. I used the flat files for the Onto-Express database :)

As I mentioned in the previous email, the number of annotations and the number of annotated genes have dropped dramatically, something that I wasn't expecting. There are two possible reasons that I can think of for this to happen:

Reason 1: The number of known genes has dropped dramatically.
Reason 2: The method of assigning annotations with the "IEA" code has changed.
The reason I suspect the reduction in the number of annotations is related to IEA code is that, while the total number of annotations (including all codes) has reduced between October 2008 and November 2009, the number of non-IEA annotations has increased from 58,109 to 65,741. Any help from the community that explains this is greatly appreciated. Thank you, Purvesh ----- Original Message ----- From: "Chris Mungall" To: "Gabriel Berriz" Cc: "Purvesh Khatri" , "gofriends" Sent: Saturday, November 7, 2009 9:58:00 AM GMT -08:00 US/Canada Pacific Subject: Re: [Gofriends] Number of annotated gene products On Nov 7, 2009, at 6:05 AM, Gabriel Berriz wrote: > Purvesh, in addition to Chris's comments let me add that I've found > it much easier to work with the gene_association.* flat files > (available from http://www.geneontology.org/GO.current.annotations.shtml) > than with the data in the go_*-assocdb-tables.tar.gz. > > It is my understanding (Chris, please correct me if I'm wrong) that > these flat files are the primary sources from which the > corresponding data in the go_*-assocdb-tables.tar.gz files are > generated. This is correct > This means that working with these flat files is not only simpler > (certainly relative to the complexity of the go_*-assocdb- > tables.tar.gz schema), but also is closer to the original data. There should be no loss of information in the database build. Regarding whether to work with flat files or with SQL, it's a matter of taste, expertise and the nature of the question. For a seasoned unix hacker it's often quicker and more intuitive to chain together a pipe such as the one below, or to write a perl script for more complex tasks. For less techy users, the flat files offer the advantage of easy loading into Excel. For complex queries SQL can often be faster to execute and faster to write. 
The GOOSE SQL interface comes with a number of pre-canned queries: http://www.berkeleybop.org/goose In addition, the database has the advantage of combining ontology information, term information as well as other information such as the NCBI taxonomy. Of course, if there is a strong demand to provide this in a pre-canned tab delimited format, we may be able to do that. Of course, another way to see this particular information which may be better for the average user is to use a web interface such as AmiGO, and query for human: http://amigo.geneontology.org/cgi-bin/amigo/browse.cgi?open_1=all&ont=all&speciesdb=all&taxid=9606&tree_view=full&action=filter Although this is answering a slightly different question - the number of gene products in human with a positive annotation to one or more GO terms. There are 6 human gene products with only negative annotations, which accounts for the discrepancy. > BTW, you'll not be surprised to see that the output of the following > agrees with what the "Current annotations" page states: > > zcat gene_associations.goa_human.gz | grep -v '^!' | cut -f2 | sort - > u | wc -l > 18587 > > ...and the one from the following matches what you found: > > zcat gene_associations.goa_human.gz | grep -v '^!' | cut -f3 | sort - > u | wc -l > 18098 > > In these commands I use the latest version of the > gene_associations.goa_human.gz file: > > !CVS Version: Revision: 1.129 $ > !GOC Validation Date: 10/31/2009 $ > !Submission Date: 10/8/2009 > > > Cheers, > > Gabriel Berriz Thanks for the tips! Cheers Chris > > > On 091106F, at 18:41, Purvesh Khatri wrote: > >> Hi, >> >> I am trying to count the number of gene products currently >> annotated in GOA for human. 
I imported "go_200911-assocdb- >> tables.tar.gz" in a local database and used the following two >> queries to count the number of unique gene products with and >> without "IEA" evidence code: >> >> select count(distinct g.symbol) from association a, gene_product g, >> species s >> where a.gene_product_id = g.id and g.species_id = s.id >> and s.ncbi_taxa_id = 9606; >> >> select count(distinct g.symbol) from association a, evidence e, >> gene_product g, species s >> where a.gene_product_id = g.id and e.association_id = a.id and >> g.species_id = s.id >> and s.ncbi_taxa_id = 9606 and e.code != 'IEA'; >> >> My question is whether these queries are correct or not. The reason >> for asking the question is that using the first query, I get 18098 >> gene products as being annotated. However, the "Current >> annotations" page on GO website (http://geneontology.org/GO.current.annotations.shtml?all >> ) lists the number of annotated gene products as 18587. >> >> Thank you for you help. >> >> Best regards, >> >> Purvesh Khatri, Ph.D. >> Postdoc (Butte/Sarwal labs) >> Stanford University >> Center for Biomedical Information Research (BMIR) >> 251 Campus Dr., Stanford, CA 94305 >> Phone: (313) 433-2836 >> >> Division of Nephrology >> Department of Pediatrics >> Stanford Medical School >> 300 Pasteur Dr., Room G327 >> Stanford, CA 94305 >> Phone: (650) 724-3765 >> >> > > > > ============================================================= > Gabriel F. Berriz, PhD > Senior Bioinformatics Developer > Roth Lab > Biological Chemistry and Molecular Pharmacology -- Harvard Medical > School > Seeley G. Mudd Building 322B > Boston, MA 02115-5701 > Telephone: 617.432.3555 > Fax: 617.432.3557 > > > > > _______________________________________________ > Gofriends mailing list > Gofriends at geneontology.org > http://fafner.stanford.edu/mailman/listinfo/gofriends -------------- next part -------------- An HTML attachment was scrubbed... 
From huntley at ebi.ac.uk Mon Nov 9 02:24:39 2009
From: huntley at ebi.ac.uk (Rachael Huntley)
Date: Mon, 09 Nov 2009 10:24:39 +0000
Subject: [Gofriends] Number of annotated gene products
In-Reply-To: <885835044.1234241257624341481.JavaMail.root@zm09.stanford.edu>
References: <885835044.1234241257624341481.JavaMail.root@zm09.stanford.edu>
Message-ID: <4AF7EDE7.8050503@ebi.ac.uk>

Hi Purvesh,

The reason you are seeing such a dramatic decrease in annotations and gene products is that in February 2009 we stopped using the International Protein Index (IPI) human protein set to make the human gene association file and started using the complete human proteome from UniProtKB/Swiss-Prot. The sharp decrease is due to us no longer providing electronic annotations to UniProtKB/TrEMBL proteins in the human file.

Here is the news release we sent out on February 6th 2009 describing this change:

> Please note that the gene_association.goa_human file provided in the
> next GOA release will no longer be made using the IPI non-redundant
> human protein set (http://www.ebi.ac.uk/IPI/). Instead, the next version of
> this file will now use the complete human proteome now available in
> UniProtKB/Swiss-Prot (http://www.uniprot.org/news/2008/09/02/release).
> This change will enable us to provide a non-redundant set of
> annotations for the human proteome, therefore please expect a sharp
> drop in both the number of distinct sequence identifiers and in the
> total number of electronic annotations in the new file.
>
> The name and format of this human file will remain the same, however
> annotations will be assigned to proteins only from the 'UniProtKB'
> (column 1) database source. Human IPI identifiers will continue to be
> included in column 11 of annotations.
>
> In addition, the cross-references file for human IPI set
> (human.xrefs.gz), will no longer be provided.
Instead, identifier > mapping will be possible using the UniProt ID mapping file, available > from: > ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz > > > idmapping.dat.gz is a tab-delimited table, which includes mappings for > 20 different sequence identifier types (and will be expanded in time > for the next file release to include IPI identifiers). > > A readme for this file is available from: > ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/idmapping/README I hope that helps. Best wishes, Rachael. Purvesh Khatri wrote: > Hi Chris, > > Thank you for the quick reply. That explains the discrepancy. However, > this leads me to another question. > > Running the same query (i.e., counting the number of annotated genes) > on October 2008 GO assocdb returns the number of genes as 36,726 > versus 18,587 genes in November 2009 release. The number of annotated > genes is essentially reduced by almost 50% between October 2008 and > November 2009. The number of associations for human in the same period > have gone down from 197,411 to 159,303. What is the reason for such a > dramatic reduction in the number of associations (and the > corresponding reduction in the number of annotated genes)? > > Once again, thank you for your help. > > Cheers, > > Purvesh > > > ----- Original Message ----- > From: "Chris Mungall" > To: "Purvesh Khatri" > Cc: "gofriends" , "Daniel Barrell" > > Sent: Friday, November 6, 2009 6:32:20 PM GMT -08:00 US/Canada Pacific > Subject: Re: [Gofriends] Number of annotated gene products > > > Hi Purvesh, > > The reason for the lower number is that you are counting gene symbols, > not gene IDs. Try this: > > select count(distinct g.dbxref_id) from association a, gene_product g, > species s > where a.gene_product_id = g.id and g.species_id = s.id > and s.ncbi_taxa_id = 9606; > > It gives the right number (18587) > > Shouldn't symbols be unique with a species you might ask? 
We can take > a look: > > select > g1.symbol, > x1.xref_dbname, > x1.xref_key, > x2.xref_dbname, > x2.xref_key > from > gene_product g1, > gene_product g2, > dbxref x1, > dbxref x2, > species s > where > g1.species_id = s.id and > g2.species_id = s.id and > g1.dbxref_id = x1.id and > g2.dbxref_id = x2.id and > s.ncbi_taxa_id = 9606 and > g1.symbol=g2.symbol and > g1.id != g2.id; > > as you can see a subset of these are due to alternate isoforms of a > generic protein sharing the same symbol. This is an area we're > actively looking into. > > In other cases we have what appear to be different proteins sharing > the same symbol: > > | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot > | Q9BXR3 | > | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot > | Q9WJR5 | > | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot > | Q7LDI9 | > | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot > | Q69383 | > | ERVK6 | UniProtKB/Swiss-Prot | Q9Y6I0 | UniProtKB/Swiss-Prot > | Q69384 | > > I haven't looked at these and I have to head off just now, but I can > get back to you later. > > Cheers > Chris > > On Nov 6, 2009, at 3:41 PM, Purvesh Khatri wrote: > > > Hi, > > > > I am trying to count the number of gene products currently annotated > > in GOA for human. I imported "go_200911-assocdb-tables.tar.gz" in a > > local database and used the following two queries to count the > > number of unique gene products with and without "IEA" evidence code: > > > > select count(distinct g.symbol) from association a, gene_product g, > > species s > > where a.gene_product_id = g.id and g.species_id = s.id > > and s.ncbi_taxa_id = 9606; > > > > select count(distinct g.symbol) from association a, evidence e, > > gene_product g, species s > > where a.gene_product_id = g.id and e.association_id = a.id and > > g.species_id = s.id > > and s.ncbi_taxa_id = 9606 and e.code != 'IEA'; > > > > My question is whether these queries are correct or not. 
The reason
> > for asking the question is that using the first query, I get 18098
> > gene products as being annotated. However, the "Current annotations"
> > page on the GO website
> > (http://geneontology.org/GO.current.annotations.shtml?all)
> > lists the number of annotated gene products as 18587.
> >
> > Thank you for your help.
> >
> > Best regards,
> >
> > Purvesh Khatri, Ph.D.
> > Postdoc (Butte/Sarwal labs)
> > Stanford University
> > Center for Biomedical Information Research (BMIR)
> > 251 Campus Dr., Stanford, CA 94305
> > Phone: (313) 433-2836
> >
> > Division of Nephrology
> > Department of Pediatrics
> > Stanford Medical School
> > 300 Pasteur Dr., Room G327
> > Stanford, CA 94305
> > Phone: (650) 724-3765
> >
> > _______________________________________________
> > Gofriends mailing list
> > Gofriends at geneontology.org
> > http://fafner.stanford.edu/mailman/listinfo/gofriends
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Gofriends mailing list
> Gofriends at geneontology.org
> http://fafner.stanford.edu/mailman/listinfo/gofriends
>

--
GOA and IntAct Curator
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge, CB10 1SD
UK
Tel: 01223 492515
Fax: 01223 494468
Email: huntley at ebi.ac.uk
GOA: http://www.ebi.ac.uk/GOA
IntAct: http://www.ebi.ac.uk/intact

From cherry at stanford.edu Mon Nov 9 06:26:38 2009
From: cherry at stanford.edu (Mike Cherry)
Date: Mon, 9 Nov 2009 06:26:38 -0800
Subject: [Gofriends] Number of annotated gene products
In-Reply-To: <973BF68D-73D0-4332-8EEB-22859E3E6B84@hms.harvard.edu>
References: <18007562.1244541257633314602.JavaMail.root@zm09.stanford.edu>
 <973BF68D-73D0-4332-8EEB-22859E3E6B84@hms.harvard.edu>
Message-ID:

Gabriel,

All previous GAF files since 2004 are available from CVS.

Original submission: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/submission/

Validated and filtered: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/

-Mike

On Nov 9, 2009, at 5:38 AM, Gabriel Berriz wrote:

>
> On 091107S, at 17:35, Purvesh Khatri wrote:
>
>> I am trying to find how the number of annotations and the number of
>> annotated genes have changed over the last year. The MySQL database
>> export archive, available for every month, is a more appropriate
>> resource than the flat files.
>
> In general, it would be very useful if there were an archive of all
> the gene-association flat files to date. If it doesn't exist
> already, I would like to add this to the wishlist.
>
> Gabriel Berriz
>
> =============================================================
> Gabriel F. Berriz, PhD
> Senior Bioinformatics Developer
> Roth Lab
> Biological Chemistry and Molecular Pharmacology -- Harvard Medical
> School
> Seeley G. Mudd Building 322B
> Boston, MA 02115-5701
> Telephone: 617.432.3555
> Fax: 617.432.3557
>
> _______________________________________________
> Gofriends mailing list
> Gofriends at geneontology.org
> http://fafner.stanford.edu/mailman/listinfo/gofriends

From gberriz at hms.harvard.edu Mon Nov 9 06:35:56 2009
From: gberriz at hms.harvard.edu (Gabriel Berriz)
Date: Mon, 9 Nov 2009 09:35:56 -0500
Subject: [Gofriends] Number of annotated gene products
In-Reply-To:
References: <18007562.1244541257633314602.JavaMail.root@zm09.stanford.edu>
 <973BF68D-73D0-4332-8EEB-22859E3E6B84@hms.harvard.edu>
Message-ID: <96627504-1F83-4BE8-86B3-118476848411@hms.harvard.edu>

That's great to know. Thanks!

G.

On 091109M, at 09:26, Mike Cherry wrote:

> Gabriel,
>
> All previous GAF files since 2004 are available from CVS.
>
> Original submission: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/submission/
>
> Validated and filtered: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/
>
> -Mike
>
>
> On Nov 9, 2009, at 5:38 AM, Gabriel Berriz wrote:
>
>>
>> On 091107S, at 17:35, Purvesh Khatri wrote:
>>
>>> I am trying to find how the number of annotations and the number of
>>> annotated genes have changed over the last year. The MySQL database
>>> export archive, available for every month, is a more appropriate
>>> resource than the flat files.
>>
>> In general, it would be very useful if there were an archive of all
>> the gene-association flat files to date. If it doesn't exist
>> already, I would like to add this to the wishlist.
>>
>> Gabriel Berriz
>>
>> =============================================================
>> Gabriel F. Berriz, PhD
>> Senior Bioinformatics Developer
>> Roth Lab
>> Biological Chemistry and Molecular Pharmacology -- Harvard Medical
>> School
>> Seeley G. Mudd Building 322B
>> Boston, MA 02115-5701
>> Telephone: 617.432.3555
>> Fax: 617.432.3557
>>
>> _______________________________________________
>> Gofriends mailing list
>> Gofriends at geneontology.org
>> http://fafner.stanford.edu/mailman/listinfo/gofriends
>

From dbarrell at ebi.ac.uk Mon Nov 9 06:39:35 2009
From: dbarrell at ebi.ac.uk (Daniel Barrell)
Date: Mon, 09 Nov 2009 14:39:35 +0000
Subject: [Gofriends] Number of annotated gene products
In-Reply-To:
References: <18007562.1244541257633314602.JavaMail.root@zm09.stanford.edu>
 <973BF68D-73D0-4332-8EEB-22859E3E6B84@hms.harvard.edu>
Message-ID: <4AF829A7.9070903@ebi.ac.uk>

Hi Gabriel,

The GOA group also has an archive of all previous releases. Here's the
archive for human:

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/HUMAN/

Regards
Dan

Mike Cherry wrote:
> Gabriel,
>
> All previous GAF files since 2004 are available from CVS.
>
> Original submission:
> http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/submission/
>
> Validated and filtered:
> http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/
>
> -Mike
>
>
> On Nov 9, 2009, at 5:38 AM, Gabriel Berriz wrote:
>
>>
>> On 091107S, at 17:35, Purvesh Khatri wrote:
>>
>>> I am trying to find how the number of annotations and the number of
>>> annotated genes have changed over the last year. The MySQL database
>>> export archive, available for every month, is a more appropriate resource
>>> than the flat files.
>>
>> In general, it would be very useful if there were an archive of all
>> the gene-association flat files to date. If it doesn't exist already,
>> I would like to add this to the wishlist.
>>
>> Gabriel Berriz
>>
>> =============================================================
>> Gabriel F. Berriz, PhD
>> Senior Bioinformatics Developer
>> Roth Lab
>> Biological Chemistry and Molecular Pharmacology -- Harvard Medical School
>> Seeley G.
Mudd Building 322B
>> Boston, MA 02115-5701
>> Telephone: 617.432.3555
>> Fax: 617.432.3557
>>
>> _______________________________________________
>> Gofriends mailing list
>> Gofriends at geneontology.org
>> http://fafner.stanford.edu/mailman/listinfo/gofriends
>
> _______________________________________________
> Gofriends mailing list
> Gofriends at geneontology.org
> http://fafner.stanford.edu/mailman/listinfo/gofriends

--
Daniel Barrell
EMBL - The EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD
Phone: +44 (0)1223 492551
Email: dbarrell at ebi.ac.uk

From adrian.paschke at gmx.de Tue Nov 10 01:58:10 2009
From: adrian.paschke at gmx.de (Adrian Paschke)
Date: Tue, 10 Nov 2009 10:58:10 +0100
Subject: [Gofriends] CfPart - SWAT4LS Semantic Web Applications and Tools for Life Sciences
Message-ID: <012501ca61ec$4fb134b0$ef139e10$@paschke@gmx.de>

*** apologies for multiple postings ***

Registration is now open for

SWAT4LS
Semantic Web Applications and Tools for Life Sciences

to be held in Amsterdam, Science Park, 20th November 2009

Registration: http://www.swat4ls.org/2009/rform.php#rform

------------------------------

Provisional scientific program:

Keynotes:
* Alan Ruttenberg: Semantic Web Technology to Support Studying the Relation of HLA Structure Variation to Disease
* Barend Mons: (provisional) CWA: The meta-analysed Semantic Web, getting rid of ambiguity and redundancy
* Michael Schroeder: Prediction of drug-target interactions from literature by context similarity

Accepted papers:
* OBO & OWL: Roundtrip Ontology Transformations
  Syed Hamid Tirmizi, Stuart Aitken, Dilvan Moreira, Chris Mungall, Juan Sequeda, Nigam H. Shah and Daniel P. Miranker.
* Mining Semantic Networks of Bioinformatics e-Resources from Literature
  Hammad Afzal, James Eales, Robert Stevens and Goran Nenadic.
* Linking Open Drug Data to Cheminformatics and Proteochemometrics
  Egon Willighagen and Jarl Wikberg.
* TIM: A Semantic Web Application for the Specification of Metadata Items in Clinical Research
  Matthias Löbe, Magnus Knuth and Roland Mücke.
* Semantics-Based Composition of EMBOSS Services with Bio-jETI
  Anna-Lena Lamprecht, Stefan Naujokat, Tiziana Margaria and Bernhard Steffen.
* Towards the Ontology-based Classification of Lymphoma Patients using Semantic Image Annotations
  Sonja Zillner.

Short communications:
* Weekend Triple Billionaire
  Jerven Bolleman and Thomas Kappler.
* Towards a Logic-based Assessment of the compatibility of UMLS sources
  Ernesto Jimenez-Ruiz, Bernardo Cuenca Grau, Rafael Berlanga and Ian Horrocks.
* Using the NCBO Web Services for Concept Recognition and Ontology Annotation of Expression Datasets
  Simon Twigger, Joey Geiger and Jennifer Smith.

Demos:
* A system for repairing missing is-a structure in ontologies
  Patrick Lambrix, Qiang Liu and He Tan.
* NeuroLex.org - A semantic wiki for neuroinformatics based on the NIF Standard Ontology
  Stephen Larson, Sarah Maynard, Fahim Imam and Maryann Martone.
* ContentCVS: A CVS-based Collaborative ONTology ENgineering Tool
  Ernesto Jimenez-Ruiz, Bernardo Cuenca Grau, Ian Horrocks and Rafael Berlanga.
* DC-THERA Directory, a Knowledge Management System for the support of the European Dendritic Cell Immunology Community
  Marco Brandizi, Michaela Gündel, Ciro Scognamiglio and Andrea Splendiani.

Panel discussion:
Theme: TBD

Plus poster sessions (authors of posters are not listed in this call, but will be listed on the website)

An updated version of the program will be available at
http://www.swat4ls.org/2009/progr.php

Early registrations are much appreciated.

----------------------------------------

Registration costs:
full registration: 50 euros
social dinner (optional): 35 euros
PhD students: free (subject to availability)

To register please visit: http://www.swat4ls.org/2009/rform.php#rform

Organization
* M.
Scott Marshall, Leiden University Medical Center / University of Amsterdam, The Netherlands
* Albert Burger, School of Mathematical and Computer Sciences, Heriot-Watt University, and Human Genetics Unit, Medical Research Council, Edinburgh, Scotland, United Kingdom
* Adrian Paschke, Corporate Semantic Web, Freie Universitaet Berlin, Germany
* Paolo Romano, Bioinformatics, National Cancer Research Institute, Genova, Italy
* Andrea Splendiani, Biomathematics and Bioinformatics dept., Rothamsted Research, UK

More information
* http://www.swat4ls.org/2009/
* http://swat4ls.blogspot.com
* info at swat4ls.org

------------

We wish to thank once again the review panel that made this possible:
* Christopher J. O. Baker, Department of Computer Science and Applied Statistics, University of Brunswick, Canada
* Pedro Barahona, Department of Informatics, New University of Lisboa, Lisboa, Portugal
* Liliana Barrio-Alvers, Transinsight GmbH, Dresden, Germany
* Olivier Bodenreider, National Library of Medicine, Bethesda, MD, United States of America
* Matt-Mouley Bouamrane, School of Computer Science, University of Manchester, Manchester, United Kingdom
* Werner Ceusters, NY CoE in Bioinformatics and Life Sciences, University at Buffalo, Buffalo, NY, United States of America
* Kei Cheung, Center for Medical Informatics, Yale University School of Medicine, New Haven, United States of America
* Tim Clark, Massachusetts General Hospital and Harvard Medical School, Boston MA, United States of America
* Marie-Dominique Devignes, LORIA, Vandoeuvre les Nancy, France
* Olivier Dameron, INSERM U936, University of Rennes 1, France
* Michel Dumontier, Carleton University, Ottawa, Ontario, Canada
* Huajun Chen, Zhejiang University, China
* Duncan Hull, School of Chemistry, University of Manchester, UK
* C.
Maria Keet, Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy
* Graham Kemp, Chalmers University of Technology, Sweden
* Jacob Tilman Koehler, Department of Molecular Biotechnology, Institute of Medical Biology, University of Tromsø, Tromsø, Norway
* Michael Krauthammer, Department of Pathology, Yale University School of Medicine, United States of America
* Martin Kuiper, Department of Pathology, Systems Biology group, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
* Patrick Lambrix, Department of Computer and Information Science, Linköping University, Linköping, Sweden
* Phillip Lord, School of Computing Science, Newcastle University, Newcastle-upon-Tyne, United Kingdom
* M. Scott Marshall, Leiden University Medical Center / University of Amsterdam, The Netherlands
* Chris Mungall, Lawrence Berkeley National Laboratories, United States of America
* Stephan Philippi, Institute for Software Technology, University of Koblenz-Landau, Koblenz, Germany
* Marco Roos, Instituut voor Informatica, University of Amsterdam, Netherlands
* Alan Ruttenberg, Science Commons, Cambridge, MA, United States of America
* Matthias Samwald, DERI, Galway, Ireland, and Konrad Lorenz Institute for Evolution and Cognition Research, Altenberg, Austria
* Nigam Shah, Center for Biomedical Informatics Research, Stanford, United States of America
* Michael Schröder, Biotechnology Centre, TU Dresden, Dresden, Germany
* Robert Stevens, School of Computer Science, University of Manchester, Manchester, United Kingdom
* Tetsuro Toyoda, Genomic Sciences Center, RIKEN, Yokohama, Japan
* Mark D. Wilkinson, iCAPTURE Center, St. Paul Hospital, Vancouver, Canada

From tonys at ebi.ac.uk Thu Nov 19 10:22:20 2009
From: tonys at ebi.ac.uk (Tony Sawford)
Date: Thu, 19 Nov 2009 10:22:20 +0000
Subject: [Gofriends] Fwd: getGOTerm service
In-Reply-To:
References:
Message-ID: <4B051C5C.5040305@ebi.ac.uk>

Hi,

Apparently our MySQL instance crashed.
It has been restarted, and our
systems people are investigating the cause of the crash, with a view to
ensuring that it doesn't happen again.

Apologies for any inconvenience this may have caused.

Cheers,

Tony

Mark wrote:
> Hi Go Friends!
>
> Please see the message below - whassap? :-) (it may be a problem at
> our end, but I want to make sure the mysql interface is up and running
> before I spend time trying to trouble-shoot previously functional
> code...)
>
> cheers!
>
> Mark
>
>
> ------- Forwarded message -------
> From: "Luke McCarthy"
> To: "Mark Wilkinson"
> Cc:
> Subject: getGOTerm service
> Date: Wed, 18 Nov 2009 18:57:00 -0800
>
> Despite the message we got last week, your getGOTerm service is still
> throwing this error:
>
> DBI connect('go_latest:mysql.ebi.ac.uk:4085','go_select',...) failed:
> Lost connection to MySQL server at 'reading initial communication packet',
> system error: 104 at /var/www/sadi/services/getGOTerm line 77
>
> Just FYI,
>
> Luke
> ------------------------------------------------------------------------
>
> _______________________________________________
> Gofriends mailing list
> Gofriends at geneontology.org
> http://fafner.stanford.edu/mailman/listinfo/gofriends
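
The error quoted above fails at 'reading initial communication packet',
which suggests that the TCP connection was accepted but the server's
greeting never arrived (on Linux, system error 104 is typically
ECONNRESET). A MySQL server normally speaks first after a client connects,
so one rough way to separate a server-side outage from a client-side bug,
before debugging previously working code, is to open a plain socket and see
whether any bytes come back. The following is a minimal Python sketch under
that assumption. The function name server_sends_greeting is hypothetical,
only the standard library is used, and the check neither authenticates nor
parses the handshake.

```python
import socket


def server_sends_greeting(host, port, timeout=5.0):
    """Return True if a TCP connection succeeds and the server sends any bytes.

    This only probes for a server-first greeting (as a MySQL server is
    expected to send); it does not log in or validate the packet contents.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # An empty read means the peer closed the connection without
            # greeting us; a reset or timeout raises OSError instead.
            return len(sock.recv(4)) > 0
    except OSError:
        # Covers refused connections, resets (errno 104 on Linux), timeouts
        # and name-resolution failures.
        return False
```

Calling server_sends_greeting("mysql.ebi.ac.uk", 4085), with the host and
port taken from the error message, would be expected to return False while
the instance is refusing or resetting connections. A True result only shows
that something answered on that port, not that queries will succeed.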