[go] Paper of potential interest to you

camon at ebi.ac.uk
Wed Aug 8 06:05:04 PDT 2007


Hi Val,

The GOA group (and therefore UniProtKB) does not currently integrate ISS or
IEA annotations from other GOC members. UniProtKB/Swiss-Prot shows all manual
GO annotations (excluding ND), filtered by source; UniProtKB shows both manual
and IEA annotations.

One of our colleagues made some comments on this paper:

'The real question that needs to be asked is whether manual curation is
capable of keeping up with the growth in biological knowledge (not the growth
in sequences). If a curator annotates a protein in a model organism, an
InterPro family, or a protein with a novel function, they do so in the belief
that they are making (at some level) a generic statement, not just annotating
one of the billions of sequences that happen to exist. Observing the continued
existence of unannotated (and frequently, according to current scientific
knowledge, unannotatable) things is of very little importance: what matters is
how much real, transferable knowledge is recorded in the databases.

And they don't even consider that an annotation may or may not be correct,
and may or may not be useful. It would be easy to inflate coverage metrics by
adding wrong and/or high-level GO terms to every protein.'

It's a shame we were not contacted before publication. I'm not sure that
papers like this help a curation effort that is already hugely understaffed.

Evelyn


> Mike Cherry wrote:
>
>> Manual curation is not sufficient for annotation of genomic databases
>> William A. Baumgartner, Jr, K. Bretonnel Cohen, Lynne M. Fox, George
>> Acquaah-Mensah, and Lawrence Hunter
>> Bioinformatics 2007 23: i41-i48.
>>
>> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/13/i41?etoc
>>
>>
>>
>
>
> This was interesting. Before we all decide it's a losing battle: it's not
> quite as doom and gloom as this analysis suggests.
>
> By using mouse and fly, they chose the two model organisms with the greatest
> volume of data. It would have been nice to see the combined progress of
> GO-curated organisms vs. non-GO-curated organisms (rather than mouse, fly,
> and then the entire UniProt Knowledgebase).
>
> Using this criterion (at least one GO annotation), they would have
> identified a 'best case scenario' (left graph of Figure 1) for both
> budding and fission yeast.
>
> However, using these methods, they could never show a 'best case
> scenario' of GO annotation for ANY organism, because they extracted the
> GO data from the UniProt records (at least, that is what they say in the
> methods), and UniProt doesn't include ISS, IC, NAS, TAS or, most importantly
> for this analysis, ND annotations (I think that is correct, isn't it, Emily?).
>
> And, as they mention, one reviewer pointed out that it is impossible here
> to tell whether the rate-limiting factor is the rate of annotation or the
> rate of discovery, or to determine the relative contributions of each.
>
> As an evaluation of GO coverage, it would have been more informative if
> they had used all the GO data. But it's difficult to assess how complete
> curation is unless you know what is known...
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>
