[Go] growth in manual GO annotation

Emily Dimmer edimmer at ebi.ac.uk
Fri Apr 18 04:52:40 PDT 2008


Hi Doug,

Looking at statistics from the GOA files, the number of species with 
any annotation in our GOA UniProt files is currently 154,730.
Of these, 932 taxa have non-IEA, non-ND annotations. However, within 
this latter group there are only 16 taxa with more than 1000 
annotations (non-IEA, non-ND), and only 54 with more than 100 
annotations. Looking at a couple of taxa, it is clear that for many that 
are not model organism species, over 90% of annotations are due to ISS 
statements.
Hope this helps!
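
For anyone wanting to reproduce this kind of tally from a gene association file, here is a minimal sketch. It assumes the tab-separated layout in which column 7 holds the evidence code and column 13 the taxon, and that lines starting with "!" are comments. The function names and the toy rows are hypothetical, made up for illustration, and are not taken from the GOA pipeline.

```python
from collections import Counter, defaultdict

EXCLUDED = {"IEA", "ND"}  # evidence codes left out of the "manual" counts


def evidence_by_taxon(lines):
    """Map each taxon to a Counter of its non-IEA, non-ND evidence codes."""
    per_taxon = defaultdict(Counter)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # too few columns to hold a taxon field
        evidence = cols[6]                # column 7: evidence code
        taxon = cols[12].split("|")[0]    # column 13: first taxon listed
        if evidence not in EXCLUDED:
            per_taxon[taxon][evidence] += 1
    return per_taxon


def taxa_above(per_taxon, threshold):
    """Number of taxa with more than `threshold` retained annotations."""
    return sum(1 for c in per_taxon.values() if sum(c.values()) > threshold)


def iss_share(per_taxon, taxon):
    """Fraction of a taxon's retained annotations that carry the ISS code."""
    codes = per_taxon.get(taxon, Counter())
    total = sum(codes.values())
    return codes["ISS"] / total if total else 0.0


def _row(evidence, taxon):
    """Build a toy 15-column line with only the two columns used here."""
    cols = [""] * 15
    cols[6], cols[12] = evidence, taxon
    return "\t".join(cols)


sample = [
    "!toy comment line",
    _row("IDA", "taxon:9606"),
    _row("ISS", "taxon:9606"),
    _row("IEA", "taxon:9606"),
    _row("ISS", "taxon:7955"),
    _row("ND", "taxon:10090"),
]
per_taxon = evidence_by_taxon(sample)
print(taxa_above(per_taxon, 1))
print(iss_share(per_taxon, "taxon:9606"))
```

On a real file, `taxa_above(per_taxon, 1000)` and `taxa_above(per_taxon, 100)` would correspond to the two thresholds quoted above, provided the file matches the assumed layout.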

Cheers,
Emily

Doug howe wrote:
> Agreed.
>
> Judith Blake wrote:
>   
>> Doug,
>> I would check with Emily about the species and also the evidence codes 
>> for those species.  There are several mechanisms that I can think of 
>> where annotations would be other than IEA.
>>
>> Judy
>>
>> Doug howe wrote:
>>     
>>> WOW...that's way more species than I would have predicted....thanks 
>>> Chris.
>>>
>>> Chris Mungall wrote:
>>>  
>>>       
>>>> On Apr 10, 2008, at 10:08 AM, Doug howe wrote:
>>>>
>>>>    
>>>>         
>>>>> Oops...let me be more clear...I'm looking for:
>>>>>
>>>>> 1.  The number of distinct gene products (across all species) 
>>>>> annotated using NON-IEA, NON-ND evidence on 1/1 of each year from 
>>>>> 2002-2008.
>>>>>       
>>>>>           
>>>> SELECT count(DISTINCT gene_product_id) AS num_gps
>>>> FROM association INNER JOIN evidence ON 
>>>> (evidence.association_id=association.id)
>>>> WHERE code != 'IEA' AND code != 'ND';
>>>>
>>>> go_old_20030101
>>>> num_gps
>>>> 42746
>>>>
>>>> go_old_20040101
>>>> num_gps
>>>> 99116
>>>>
>>>> go_old_20050101
>>>> num_gps
>>>> 144635
>>>>
>>>> go_old_20060101
>>>> num_gps
>>>> 136734
>>>>
>>>> go_old_20070101
>>>> num_gps
>>>> 140370
>>>>
>>>> go_old_20080101
>>>> num_gps
>>>> 192535
>>>>
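
The snapshot counts above are easier to compare as year-over-year changes. This is just arithmetic on the num_gps figures quoted in the thread; the helper name `yearly_change` is made up for the sketch, and each key is the year of a go_old_YYYY0101 snapshot.

```python
# num_gps per snapshot, copied from the query results quoted above
num_gps = {
    2003: 42746,
    2004: 99116,
    2005: 144635,
    2006: 136734,
    2007: 140370,
    2008: 192535,
}


def yearly_change(series):
    """Return (year, absolute change, percent change) versus the prior year."""
    years = sorted(series)
    changes = []
    for prev, cur in zip(years, years[1:]):
        delta = series[cur] - series[prev]
        changes.append((cur, delta, 100.0 * delta / series[prev]))
    return changes


for year, delta, pct in yearly_change(num_gps):
    print(f"{year}: {delta:+d} ({pct:+.1f}%)")
```

The one negative step is between the 20050101 and 20060101 snapshots (144635 down to 136734). The thread does not say what caused it.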
>>>>
>>>>    
>>>>         
>>>>> 2.  The number of distinct species with any NON-IEA, NON-ND GO 
>>>>> annotation on 1/1 of each year from 2002-2008.
>>>>>       
>>>>>           
>>>> SELECT count(DISTINCT species_id) AS num_species
>>>> FROM gene_product
>>>>  INNER JOIN association ON 
>>>> (gene_product.id=association.gene_product_id)
>>>>  INNER JOIN evidence ON (evidence.association_id=association.id)
>>>> WHERE code != 'IEA' AND code != 'ND';
>>>> go_old_20030101
>>>> num_species
>>>> 207
>>>>
>>>> go_old_20040101
>>>> num_species
>>>> 375
>>>>
>>>> go_old_20050101
>>>> num_species
>>>> 533
>>>>
>>>> go_old_20060101
>>>> num_species
>>>> 638
>>>>
>>>> go_old_20070101
>>>> num_species
>>>> 884
>>>>
>>>> go_old_20080101
>>>> num_species
>>>> 930
>>>>
>>>> (yep, these numbers are correct, there are a lot of non-MOD 
>>>> annotations to GO)
>>>>
>>>>    
>>>>         
>>>>> Doug howe wrote:
>>>>>      
>>>>>           
>>>>>> Thanks Chris those are very useful numbers.  If you don't mind 
>>>>>> running two more queries, it won't be necessary to open the older 
>>>>>> stuff to Goose.
>>>>>> I'd be interested to see:
>>>>>> 1.  The number of distinct gene products (across all species) 
>>>>>> annotated on 1/1 of each year from 2002-2008.
>>>>>> 2.  The number of distinct species with any GO annotation on 1/1 
>>>>>> of each year from 2002-2008.
>>>>>>
>>>>>> -Thanks!
>>>>>> -Doug
>>>>>>
>>>>>>
>>>>>> Chris Mungall wrote:
>>>>>>
>>>>>>        
>>>>>>             
>>>>>>> On Apr 8, 2008, at 9:48 AM, Doug howe wrote:
>>>>>>>
>>>>>>>
>>>>>>>          
>>>>>>>               
>>>>>>>> Does anyone have, or know how to get, historical stats on the number
>>>>>>>> of GO annotations that have been contributed to the GOC over time?
>>>>>>>> I'm looking for the number of non-IEA, non-ND GO annotations that
>>>>>>>> existed for each year from 2002-2008.
>>>>>>>>
>>>>>>>> Midori provided me with the following numbers of GO terms for that
>>>>>>>> period if anyone is interested:
>>>>>>>> date        total    obsolete
>>>>>>>> 1/1/2002    10305    152
>>>>>>>> 1/1/2003    13339    383
>>>>>>>> 1/1/2004    16771    725
>>>>>>>> 1/1/2005    18219    969
>>>>>>>> 1/1/2006    20348    992
>>>>>>>> 1/1/2007    22928    1011
>>>>>>>> 1/1/2008    25758    1137
>>>>>>>>
>>>>>>>>             
>>>>>>>>                 
>>>>>>> We have historical go dbs mirrored here - we can open these to 
>>>>>>> GOOSE if you like, or you can just request queries.
>>>>>>>
>>>>>>> This is what you're after:
>>>>>>>
>>>>>>> SELECT count(*) AS num_annots
>>>>>>> FROM association INNER JOIN evidence ON 
>>>>>>> (evidence.association_id=association.id)
>>>>>>> WHERE code != 'IEA' AND code != 'ND';
>>>>>>> go_old_20030101
>>>>>>> num_annots
>>>>>>> 133699
>>>>>>>
>>>>>>> go_old_20040101
>>>>>>> num_annots
>>>>>>> 386339
>>>>>>>
>>>>>>> go_old_20050101
>>>>>>> num_annots
>>>>>>> 416224
>>>>>>>
>>>>>>> go_old_20060101
>>>>>>> num_annots
>>>>>>> 469107
>>>>>>>
>>>>>>> go_old_20070101
>>>>>>> num_annots
>>>>>>> 489402
>>>>>>>
>>>>>>> go_old_20080101
>>>>>>> num_annots
>>>>>>> 580052
>>>>>>>
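
One thing worth keeping in mind when reading num_annots: count(*) over this join counts association/evidence pairs, so an association carrying two qualifying evidence rows contributes two. The sketch below shows that on a toy in-memory SQLite database. The table and column names are borrowed from the queries in this thread, but the schema and rows are invented for illustration, and whether the real GO database holds several evidence rows per association is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE association (
    id INTEGER PRIMARY KEY, term_id INTEGER, gene_product_id INTEGER);
CREATE TABLE evidence (
    id INTEGER PRIMARY KEY, association_id INTEGER, code TEXT);
-- toy data: association 1 has two manual evidence rows,
-- association 2 is IEA only, association 3 has one ND and one ISS row
INSERT INTO association VALUES (1, 100, 10), (2, 101, 10), (3, 100, 11);
INSERT INTO evidence (association_id, code) VALUES
    (1, 'IDA'), (1, 'IMP'), (2, 'IEA'), (3, 'ND'), (3, 'ISS');
""")

# Same FROM/WHERE as the queries in the thread
BASE = """
FROM association INNER JOIN evidence ON
(evidence.association_id = association.id)
WHERE code != 'IEA' AND code != 'ND'
"""


def scalar(select_clause):
    """Run a single-value SELECT against the shared FROM/WHERE clause."""
    return conn.execute(select_clause + BASE).fetchone()[0]


num_rows = scalar("SELECT count(*)")                          # joined rows
num_assocs = scalar("SELECT count(DISTINCT association.id)")  # associations
distinct_gps = scalar("SELECT count(DISTINCT gene_product_id)")
print(num_rows, num_assocs, distinct_gps)
```

If the distinction matters for the stats, count(DISTINCT association.id) is the variant that counts each association once.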
>>>>>>>
>>>>>>> This one may also be informative: the number of terms used 
>>>>>>> directly in annotations (all):
>>>>>>>
>>>>>>> SELECT count(DISTINCT term_id) AS num_terms_used_directly
>>>>>>> FROM association;
>>>>>>> go_old_20030101
>>>>>>> num_terms_used_directly
>>>>>>> 7116
>>>>>>>
>>>>>>> go_old_20040101
>>>>>>> num_terms_used_directly
>>>>>>> 9008
>>>>>>>
>>>>>>> go_old_20050101
>>>>>>> num_terms_used_directly
>>>>>>> 10134
>>>>>>>
>>>>>>> go_old_20060101
>>>>>>> num_terms_used_directly
>>>>>>> 11113
>>>>>>>
>>>>>>> go_old_20070101
>>>>>>> num_terms_used_directly
>>>>>>> 12340
>>>>>>>
>>>>>>> go_old_20080101
>>>>>>> num_terms_used_directly
>>>>>>> 13812
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>          
>>>>>>>               
>>>>>>>> -Doug
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Go mailing list
>>>>>>>> Go at geneontology.org
>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>>>                 


-- 


Do you need any additional GO annotation resources?
Which proteins would you like annotated with GO?

Let us know in the GOA User Survey, available at: http://www.ebi.ac.uk/GOA/contactus.html

------------------------------------------------------------------ 

    Emily Dimmer Ph.D.
    GOA Coordinator
    EMBL-EBI
    Wellcome Trust Genome Campus
    Hinxton
    Cambridge CB10 1SD, U.K.
    Tel:     +44 1223 494654
    Fax:    +44 1223 494468
    email:  edimmer at ebi.ac.uk
    URL:    http://www.ebi.ac.uk/goa
    


