[Go] growth in manual GO annotation
Mike Cherry
cherry at stanford.edu
Fri Apr 11 16:50:16 PDT 2008
Doug,
Here are two tables covering all projects by year: one for the species
in each project's filtered gene association file, and one for the gene
products. IEA and ND annotations were excluded before counting. The
numbers should be similar to those Chris got from the GODB; this is just
a breakdown by project. I got these by processing the gene association
files, so they can differ a little from what was successfully loaded
into GODB.
Note there were no gene association files on January 1, 2002. On
1/1/2003 there was only the TIGR gene index file. Most projects did
not submit a file until sometime during 2004. A null cell means the
file did not exist at that time; a zero means there were no non-IEA,
non-ND annotations. All numbers are as of January 1 of the year
indicated.
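
For anyone who wants to reproduce counts of this kind, here is a minimal sketch of the processing described above. It is not the script used to build the attached tables. It assumes the standard tab-delimited gene association layout (column 1 = DB, column 2 = object ID, column 7 = evidence code, column 13 = taxon); the names `count_manual` and `table_cell` are hypothetical.

```python
import os

EXCLUDED_EVIDENCE = {"IEA", "ND"}  # evidence codes left out of the counts


def count_manual(gaf_lines):
    """Count distinct gene products and species among non-IEA, non-ND rows.

    Assumes the tab-delimited gene association layout: column 1 = DB,
    column 2 = object ID, column 7 = evidence code, column 13 = taxon.
    """
    gene_products = set()
    species = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13 or cols[6] in EXCLUDED_EVIDENCE:
            continue  # skip short rows and excluded evidence codes
        gene_products.add((cols[0], cols[1]))
        # When several taxa are listed, keep only the first one.
        species.add(cols[12].split("|")[0])
    return len(gene_products), len(species)


def table_cell(path):
    """Return None (a null cell) if the file is absent, else the counts."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return count_manual(handle)
```

Calling `table_cell` on the copy of a project's file as it stood on January 1 of each year would give one cell per year: `None` where no file existed, and `(0, 0)` where a file existed but held only IEA or ND annotations.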
-Mike
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ga-year-geneproducts.xls
Type: application/octet-stream
Size: 1613 bytes
Desc: not available
Url : http://fafner.stanford.edu/pipermail/go/attachments/20080411/747a62fd/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ga-year-species.xls
Type: application/octet-stream
Size: 1240 bytes
Desc: not available
Url : http://fafner.stanford.edu/pipermail/go/attachments/20080411/747a62fd/attachment-0001.obj
-------------- next part --------------
P.S. Has anyone won email bingo yet?
On Apr 10, 2008, at 12:28 PM, Doug howe wrote:
> WOW...that's way more species than I would have predicted....thanks
> Chris.
>
> Chris Mungall wrote:
>>
>> On Apr 10, 2008, at 10:08 AM, Doug howe wrote:
>>
>>> Oops...let me be more clear. I'm looking for:
>>>
>>> 1. The number of distinct gene products (across all species)
>>> annotated using NON-IEA, NON-ND evidence on 1/1 of each year from
>>> 2002-2008.
>>
>> SELECT count(DISTINCT gene_product_id) AS num_gps
>> FROM association INNER JOIN evidence ON
>> (evidence.association_id=association.id)
>> WHERE code != 'IEA' AND code != 'ND';
>>
>> go_old_20030101
>> num_gps
>> 42746
>>
>> go_old_20040101
>> num_gps
>> 99116
>>
>> go_old_20050101
>> num_gps
>> 144635
>>
>> go_old_20060101
>> num_gps
>> 136734
>>
>> go_old_20070101
>> num_gps
>> 140370
>>
>> go_old_20080101
>> num_gps
>> 192535
>>
>>
>>> 2. The number of distinct species with any NON-IEA, NON-ND GO
>>> annotation on 1/1 of each year from 2002-2008.
>>
>> SELECT count(DISTINCT species_id) AS num_species
>> FROM gene_product
>> INNER JOIN association ON
>> (gene_product.id=association.gene_product_id)
>> INNER JOIN evidence ON (evidence.association_id=association.id)
>> WHERE code != 'IEA' AND code != 'ND';
>> go_old_20030101
>> num_species
>> 207
>>
>> go_old_20040101
>> num_species
>> 375
>>
>> go_old_20050101
>> num_species
>> 533
>>
>> go_old_20060101
>> num_species
>> 638
>>
>> go_old_20070101
>> num_species
>> 884
>>
>> go_old_20080101
>> num_species
>> 930
>>
>> (yep, these numbers are correct; there are a lot of non-MOD
>> annotations to GO)
>>
>>>
>>> Doug howe wrote:
>>>> Thanks Chris those are very useful numbers. If you don't mind
>>>> running two more queries, it won't be necessary to open the older
>>>> stuff to Goose.
>>>> I'd be interested to see:
>>>> 1. The number of distinct gene products (across all species)
>>>> annotated on 1/1 of each year from 2002-2008.
>>>> 2. The number of distinct species with any GO annotation on 1/1 of
>>>> each year from 2002-2008.
>>>>
>>>> -Thanks!
>>>> -Doug
>>>>
>>>>
>>>> Chris Mungall wrote:
>>>>
>>>>> On Apr 8, 2008, at 9:48 AM, Doug howe wrote:
>>>>>
>>>>>
>>>>>> Does anyone have, or know how to get, historical stats on the
>>>>>> number of GO annotations that have been contributed to the GOC
>>>>>> over time? I'm looking for the number of non-IEA, non-ND GO
>>>>>> annotations that existed for each year from 2002-2008.
>>>>>>
>>>>>> Midori provided me with the following numbers of GO terms for
>>>>>> that period if anyone is interested:
>>>>>> date        total   obsolete
>>>>>> 1/1/2002    10305   152
>>>>>> 1/1/2003    13339   383
>>>>>> 1/1/2004    16771   725
>>>>>> 1/1/2005    18219   969
>>>>>> 1/1/2006    20348   992
>>>>>> 1/1/2007    22928   1011
>>>>>> 1/1/2008    25758   1137
>>>>>>
>>>>> We have historical go dbs mirrored here - we can open these to
>>>>> GOOSE if you like, or you can just request queries.
>>>>>
>>>>> This is what you're after:
>>>>>
>>>>> SELECT count(*) AS num_annots
>>>>> FROM association INNER JOIN evidence ON
>>>>> (evidence.association_id=association.id)
>>>>> WHERE code != 'IEA' AND code != 'ND';
>>>>> go_old_20030101
>>>>> num_annots
>>>>> 133699
>>>>>
>>>>> go_old_20040101
>>>>> num_annots
>>>>> 386339
>>>>>
>>>>> go_old_20050101
>>>>> num_annots
>>>>> 416224
>>>>>
>>>>> go_old_20060101
>>>>> num_annots
>>>>> 469107
>>>>>
>>>>> go_old_20070101
>>>>> num_annots
>>>>> 489402
>>>>>
>>>>> go_old_20080101
>>>>> num_annots
>>>>> 580052
>>>>>
>>>>>
>>>>> This one may also be informative: the number of terms used
>>>>> directly in annotations (all):
>>>>>
>>>>> SELECT count(DISTINCT term_id) AS num_terms_used_directly
>>>>> FROM association;
>>>>> go_old_20030101
>>>>> num_terms_used_directly
>>>>> 7116
>>>>>
>>>>> go_old_20040101
>>>>> num_terms_used_directly
>>>>> 9008
>>>>>
>>>>> go_old_20050101
>>>>> num_terms_used_directly
>>>>> 10134
>>>>>
>>>>> go_old_20060101
>>>>> num_terms_used_directly
>>>>> 11113
>>>>>
>>>>> go_old_20070101
>>>>> num_terms_used_directly
>>>>> 12340
>>>>>
>>>>> go_old_20080101
>>>>> num_terms_used_directly
>>>>> 13812
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -Doug
>>>>>>
>>>>>> _______________________________________________
>>>>>> Go mailing list
>>>>>> Go at geneontology.org
>>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>>>
>>>>>>