[Go] growth in manual GO annotation
Emily Dimmer
edimmer at ebi.ac.uk
Fri Apr 18 04:52:40 PDT 2008
Hi Doug,
Looking at statistics from the GOA files, the number of species with
any annotation in our GOA UniProt files is currently 154,730.
Of these, 932 taxa have non-IEA, non-ND annotations. However, within
this latter group there are only 16 taxa with more than 1000
annotations (non-IEA, non-ND), and only 54 with more than 100
annotations. Looking at a couple of taxa, it is clear that for many that
are not model organism species, over 90% of annotations are due to ISS
statements.
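
For anyone wanting to reproduce per-taxon counts like these from an annotation file, here is a minimal sketch in Python. It assumes a tab-delimited GAF-style file with the evidence code in column 7 and the taxon in column 13, and comment lines starting with "!". The function names are illustrative only, and this is not the script used to produce the figures above.

```python
from collections import Counter, defaultdict

# Evidence codes excluded from the "manual" counts:
# IEA (inferred from electronic annotation) and ND (no data).
EXCLUDED_CODES = {"IEA", "ND"}


def manual_evidence_by_taxon(lines):
    """Tally non-IEA, non-ND annotation lines per taxon and evidence code.

    `lines` is any iterable of GAF-style text lines (assumed layout:
    evidence code in column 7, taxon in column 13).
    """
    by_taxon = defaultdict(Counter)
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # malformed line, ignore it
        evidence = cols[6]
        # Column 13 may hold "taxon:A|taxon:B"; the first entry is the
        # species of the annotated gene product.
        taxon = cols[12].split("|")[0]
        if evidence not in EXCLUDED_CODES:
            by_taxon[taxon][evidence] += 1
    return by_taxon


def taxa_with_more_than(by_taxon, threshold):
    """Number of taxa with more than `threshold` manual annotations."""
    return sum(1 for codes in by_taxon.values()
               if sum(codes.values()) > threshold)


def iss_share(by_taxon, taxon):
    """Fraction of a taxon's manual annotations that use the ISS code."""
    codes = by_taxon.get(taxon, Counter())
    total = sum(codes.values())
    return codes["ISS"] / total if total else 0.0
```

Passing an open file handle to manual_evidence_by_taxon gives the tallies; len() of the result is the number of taxa with any non-IEA, non-ND annotation, and taxa_with_more_than(result, 1000) and taxa_with_more_than(result, 100) correspond to the two thresholds quoted above.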
Hope this helps!
Cheers,
Emily
Doug howe wrote:
> Agreed.
>
> Judith Blake wrote:
>
>> Doug,
>> I would check with Emily about the species and also the evidence codes
>> for those species. There are several mechanisms that I can think of
>> where annotations would be other than IEA.
>>
>> Judy
>>
>> Doug howe wrote:
>>
>>> WOW...that's way more species than I would have predicted....thanks
>>> Chris.
>>>
>>> Chris Mungall wrote:
>>>
>>>
>>>> On Apr 10, 2008, at 10:08 AM, Doug howe wrote:
>>>>
>>>>
>>>>
>>>>> Oops...let me be more clear. I'm looking for:
>>>>>
>>>>> 1. The number of distinct gene products (across all species)
>>>>> annotated using NON-IEA, NON-ND evidence on 1/1 of each year from
>>>>> 2002-2008.
>>>>>
>>>>>
>>>> SELECT count(DISTINCT gene_product_id) AS num_gps
>>>> FROM association INNER JOIN evidence ON
>>>> (evidence.association_id=association.id)
>>>> WHERE code != 'IEA' AND code != 'ND';
>>>>
>>>> go_old_20030101
>>>> num_gps
>>>> 42746
>>>>
>>>> go_old_20040101
>>>> num_gps
>>>> 99116
>>>>
>>>> go_old_20050101
>>>> num_gps
>>>> 144635
>>>>
>>>> go_old_20060101
>>>> num_gps
>>>> 136734
>>>>
>>>> go_old_20070101
>>>> num_gps
>>>> 140370
>>>>
>>>> go_old_20080101
>>>> num_gps
>>>> 192535
>>>>
>>>>
>>>>
>>>>
>>>>> 2. The number of distinct species with any NON-IEA, NON-ND GO
>>>>> annotation on 1/1 of each year from 2002-2008.
>>>>>
>>>>>
>>>> SELECT count(DISTINCT species_id) AS num_species
>>>> FROM gene_product
>>>> INNER JOIN association ON
>>>> (gene_product.id=association.gene_product_id)
>>>> INNER JOIN evidence ON (evidence.association_id=association.id)
>>>> WHERE code != 'IEA' AND code != 'ND';
>>>> go_old_20030101
>>>> num_species
>>>> 207
>>>>
>>>> go_old_20040101
>>>> num_species
>>>> 375
>>>>
>>>> go_old_20050101
>>>> num_species
>>>> 533
>>>>
>>>> go_old_20060101
>>>> num_species
>>>> 638
>>>>
>>>> go_old_20070101
>>>> num_species
>>>> 884
>>>>
>>>> go_old_20080101
>>>> num_species
>>>> 930
>>>>
>>>> (yep, these numbers are correct, there are a lot of non-MOD
>>>> annotations to GO)
>>>>
>>>>
>>>>
>>>>> Doug howe wrote:
>>>>>
>>>>>
>>>>>> Thanks Chris those are very useful numbers. If you don't mind
>>>>>> running two more queries, it won't be necessary to open the older
>>>>>> stuff to Goose.
>>>>>> I'd be interested to see:
>>>>>> 1. The number of distinct gene products (across all species)
>>>>>> annotated on 1/1 of each year from 2002-2008.
>>>>>> 2. The number of distinct species with any GO annotation on 1/1
>>>>>> of each year from 2002-2008.
>>>>>>
>>>>>> -Thanks!
>>>>>> -Doug
>>>>>>
>>>>>>
>>>>>> Chris Mungall wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Apr 8, 2008, at 9:48 AM, Doug howe wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Does anyone have, or know how to get, historical stats on the
>>>>>>>> number of GO annotations that have been contributed to the GOC
>>>>>>>> over time? I'm looking for the number of non-IEA, non-ND GO
>>>>>>>> annotations that existed for each year from 2002-2008.
>>>>>>>>
>>>>>>>> Midori provided me with the following numbers of GO terms for that
>>>>>>>> period if anyone is interested:
>>>>>>>> date        total  obsolete
>>>>>>>> 1/1/2002    10305       152
>>>>>>>> 1/1/2003    13339       383
>>>>>>>> 1/1/2004    16771       725
>>>>>>>> 1/1/2005    18219       969
>>>>>>>> 1/1/2006    20348       992
>>>>>>>> 1/1/2007    22928      1011
>>>>>>>> 1/1/2008    25758      1137
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> We have historical go dbs mirrored here - we can open these to
>>>>>>> GOOSE if you like, or you can just request queries.
>>>>>>>
>>>>>>> This is what you're after:
>>>>>>>
>>>>>>> SELECT count(*) AS num_annots
>>>>>>> FROM association INNER JOIN evidence ON
>>>>>>> (evidence.association_id=association.id)
>>>>>>> WHERE code != 'IEA' AND code != 'ND';
>>>>>>> go_old_20030101
>>>>>>> num_annots
>>>>>>> 133699
>>>>>>>
>>>>>>> go_old_20040101
>>>>>>> num_annots
>>>>>>> 386339
>>>>>>>
>>>>>>> go_old_20050101
>>>>>>> num_annots
>>>>>>> 416224
>>>>>>>
>>>>>>> go_old_20060101
>>>>>>> num_annots
>>>>>>> 469107
>>>>>>>
>>>>>>> go_old_20070101
>>>>>>> num_annots
>>>>>>> 489402
>>>>>>>
>>>>>>> go_old_20080101
>>>>>>> num_annots
>>>>>>> 580052
>>>>>>>
>>>>>>>
>>>>>>> This one may also be informative: the number of terms used
>>>>>>> directly in annotations (all):
>>>>>>>
>>>>>>> SELECT count(DISTINCT term_id) AS num_terms_used_directly
>>>>>>> FROM association;
>>>>>>> go_old_20030101
>>>>>>> num_terms_used_directly
>>>>>>> 7116
>>>>>>>
>>>>>>> go_old_20040101
>>>>>>> num_terms_used_directly
>>>>>>> 9008
>>>>>>>
>>>>>>> go_old_20050101
>>>>>>> num_terms_used_directly
>>>>>>> 10134
>>>>>>>
>>>>>>> go_old_20060101
>>>>>>> num_terms_used_directly
>>>>>>> 11113
>>>>>>>
>>>>>>> go_old_20070101
>>>>>>> num_terms_used_directly
>>>>>>> 12340
>>>>>>>
>>>>>>> go_old_20080101
>>>>>>> num_terms_used_directly
>>>>>>> 13812
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -Doug
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Go mailing list
>>>>>>>> Go at geneontology.org
>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
--
Do you need any additional GO annotation resources?
Which proteins would you like annotated with GO?
Let us know in the GOA User Survey, available at: http://www.ebi.ac.uk/GOA/contactus.html
------------------------------------------------------------------
Emily Dimmer Ph.D.
GOA Coordinator
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD, U.K.
Tel: +44 1223 494654
Fax: +44 1223 494468
email: edimmer at ebi.ac.uk
URL: http://www.ebi.ac.uk/goa