[Go] Change for GOA UniProtKB GAF file./ A suggestion
Jim Hu
jimhu at tamu.edu
Fri May 8 08:52:57 PDT 2009
Hi Mike,
We were planning to use the UniProtKB file from the submissions
directory to pull some IEAs into GONUTS. Will there still be a way to
do that?
Thanks
Jim
On May 8, 2009, at 8:49 AM, Mike Cherry wrote:
> This change is all about allowing IEAs from all the GAF files into
> AmiGO, except the IEAs that are in the UniProtKB file. There are
> too many IEAs in the UniProtKB file for AmiGO and the GO database to
> provide a reasonable return. Actually this is not all about the
> IEAs. A big part of this is getting the UniProtKB file out of CVS,
> as it's too big for that system. With the change the GO DB loading
> can use all the GAFs in CVS and load all their annotations,
> including IEAs.
>
> -Mike
>
>
> On May 8, 2009, at 5:12 AM, Valerie Wood wrote:
>
>> Correction: there are only 44104 IEA mappings for pombe, not 55939,
>> but all of the other numbers are correct (my taxon ID is a
>> substring of other taxon IDs...).
>>
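The overcount in the correction above is the usual substring pitfall when counting rows by taxon. A minimal sketch, assuming GAF-style tab-separated rows with the taxon in column 13 (pipe-separated when two are given). The rows, the taxon IDs, the placeholder GO ID and the function names are all made up for illustration:

```python
from typing import Iterable


def count_by_substring(lines: Iterable[str], taxon_id: str) -> int:
    """Naive count: 'taxon:4896' also matches inside 'taxon:48960'."""
    needle = f"taxon:{taxon_id}"
    return sum(1 for line in lines if not line.startswith("!") and needle in line)


def count_by_exact_taxon(lines: Iterable[str], taxon_id: str) -> int:
    """Count rows whose taxon column holds exactly the requested ID."""
    wanted = f"taxon:{taxon_id}"
    count = 0
    for line in lines:
        if line.startswith("!"):  # header/comment line
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 13 (index 12) is the taxon; it may hold 'taxon:A|taxon:B'.
        if len(cols) >= 13 and wanted in cols[12].split("|"):
            count += 1
    return count


def make_row(taxon: str) -> str:
    """Build a made-up 15-column row; only the taxon column matters here."""
    cols = ["DB", "ID", "sym", "", "GO:0000001", "REF:1", "IEA", "", "P",
            "", "", "gene", taxon, "20090508", "SRC"]
    return "\t".join(cols)


rows = ["!gaf-version: 1.0", make_row("taxon:4896"), make_row("taxon:48960")]
naive = count_by_substring(rows, "4896")
exact = count_by_exact_taxon(rows, "4896")
print(naive, exact)
```

Splitting the taxon column and comparing whole values is what keeps the longer ID from being counted.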
>> Valerie Wood wrote:
>>
>>>
>>> Slightly related, what is the long term strategy for getting IEA
>>> data into AmiGO?
>>> As the main problem is the volume of annotations, I have a suggestion:
>>>
>>> For pombe we only include the IEA mappings in the data set
>>> provided to GO when they are non-redundant with existing
>>> annotations.
>>> In 2006 there were ~30000 electronic mappings, and ~15000 were
>>> retained.
>>> Today there are 55939 mappings and 4686 are retained.
>>>
>>> For example, tim44 has the following mappings:
>>> From IPR007379
>>>   Process GO:0006886 intracellular protein transport
>>>   Function GO:0015450 P-P-bond-hydrolysis-driven protein
>>>     transmembrane transporter activity
>>>   Component GO:0005744 mitochondrial inner membrane presequence
>>>     translocase complex
>>> From IPR005682
>>>   Process GO:0006886 intracellular protein transport
>>>   Function GO:0015450 P-P-bond-hydrolysis-driven protein
>>>     transmembrane transporter activity
>>>   Component GO:0005744 mitochondrial inner membrane presequence
>>>     translocase complex
>>> From SP-KW
>>>   intracellular protein transmembrane transport
>>>   ATP binding
>>>
>>> Only the mapping to ATP binding is retained, as all of the others
>>> are covered by the manual annotation.
>>>
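The filtering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pombe pipeline itself: an IEA mapping is treated as redundant when its term equals, or is an ancestor of, a manually annotated term. The manual annotation set, the `TOY:` terms (standing in for the keyword-derived transport term, whose ID is not given in the message) and the parent link are assumed for the example, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def with_ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with every ancestor reachable via `parents`."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def retain_non_redundant(
    iea: Iterable[Tuple[str, str]],
    manual_terms: Iterable[str],
    parents: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Keep (source, term) IEA mappings not already implied by manual annotation."""
    covered: Set[str] = set()
    for term in manual_terms:
        covered |= with_ancestors(term, parents)
    return [(source, term) for source, term in iea if term not in covered]


# IEA mappings for tim44 as listed in the message.
iea_mappings = [
    ("IPR007379", "GO:0006886"),
    ("IPR007379", "GO:0015450"),
    ("IPR007379", "GO:0005744"),
    ("IPR005682", "GO:0006886"),
    ("IPR005682", "GO:0015450"),
    ("IPR005682", "GO:0005744"),
    ("SP-KW", "TOY:0000001"),  # placeholder for the keyword transport term
    ("SP-KW", "GO:0005524"),   # ATP binding
]

# Assumed manual annotations and a toy is_a link, for illustration only.
manual = ["GO:0006886", "GO:0015450", "GO:0005744", "TOY:0000002"]
toy_parents = {"TOY:0000002": ["TOY:0000001"]}

kept = retain_non_redundant(iea_mappings, manual, toy_parents)
print(kept)
```

With these toy inputs only the ATP binding mapping survives, matching the tim44 outcome described in the message.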
>>> Other genes have many more redundant mappings; for example, top2
>>> has 60 mappings, including 7 InterPro domains mapping to the same
>>> GO:0003677. These 60 mappings are fully represented by the 12
>>> manual experimental annotations.
>>>
>>> This procedure has a number of advantages:
>>>
>>> i) Clearer for Users
>>> It removes a massive over-presentation of data to the user.
>>> I cannot see any major advantage in presenting redundant mappings
>>> to the user.
>>>
>>> ii) Quality control.
>>> The curator is not presented with so many mappings, and
>>> complete annotation should, in theory, cover the mappings (except
>>> in a minority of cases, as it should be possible to make an ISS to
>>> a characterised ortholog).
>>> By following this annotation protocol, spurious mappings are
>>> easily identified and can be filtered and fixed.
>>> Many hundreds of mappings have been fixed in this way:
>>> http://sourceforge.net/tracker/?atid=605890&group_id=36855&func=browse
>>> This also flags problems in the ontology files: if a parent
>>> is accidentally removed, and this parent contains a valid mapping,
>>> the annotation will 'reappear', alerting the curator to problems
>>> with the ontology (this doesn't happen very often, but it does
>>> provide an additional layer of QC).
>>>
>>> iii) Space
>>> It would generally reduce the size of the mapping file.
>>> I have no idea of the overall size reduction. The reduction for pombe
>>> is >90%, but this is because the annotation coverage is high.
>>> However, even un-annotated organisms could have an associated
>>> reduction in mappings, if only the most granular mapping is
>>> retained.
>>> The number of IEAs will increase (as you can see above, the pombe
>>> mappings have doubled in the past couple of years), but most
>>> mappings do not add any new information to the annotation.
>>>
>>> Just a suggestion,
>>> Val
>>>
>>>
>>> Mike Cherry wrote:
>>>
>>>> This afternoon the software group agreed to change how we store
>>>> the goa_uniprot GAF file. The large file will still be removed
>>>> from CVS. This is okay because the file is too big for CVS and
>>>> cannot currently be retrieved. This file will still be
>>>> available via the GO FTP and from the EBI FTP. Both the
>>>> submitted and filtered goa_uniprot files will be removed from
>>>> CVS. A new filtered file will be created that has all the IEA
>>>> annotations removed, and this file will be in the CVS
>>>> repository. Suggestions for this new file's name are welcome; we
>>>> were thinking of: gene_association.goa_uniprot_noiea.gz
>>>>
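For anyone wanting to produce a comparable IEA-free file locally, here is a minimal sketch. It is not the script the software group will use; it only assumes the standard GAF layout, where lines starting with "!" are headers and the evidence code is the seventh tab-separated column. The function names and sample rows are hypothetical.

```python
import gzip
from typing import Iterable, Iterator

EVIDENCE_COLUMN = 6  # zero-based index of the evidence code (GAF column 7)


def drop_iea(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and every annotation line whose evidence is not IEA."""
    for line in lines:
        if line.startswith("!"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > EVIDENCE_COLUMN and cols[EVIDENCE_COLUMN] == "IEA":
            continue
        yield line


def write_noiea(src_path: str, dst_path: str) -> None:
    """Stream a gzipped GAF file to a new gzipped file without IEA rows."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(drop_iea(src))


# Made-up rows, for illustration only.
sample = [
    "!gaf-version: 1.0\n",
    "DB\tID1\tsym1\t\tGO:0005524\tREF:1\tIEA\t\tF\t\t\tgene\ttaxon:1\t20090508\tSRC\n",
    "DB\tID2\tsym2\t\tGO:0006886\tREF:2\tIDA\t\tP\t\t\tgene\ttaxon:1\t20090508\tSRC\n",
]
filtered = list(drop_iea(sample))
print(len(filtered))
```

Streaming line by line keeps memory use flat, which matters for a file of this size.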
>>>> Removing the files from CVS will happen almost immediately; as
>>>> mentioned above, you cannot get them from CVS anyway. Older
>>>> versions of the goa_uniprot file are available from the EBI FTP
>>>> site. The files will still be available via HTTP and FTP at
>>>> www.geneontology.org. This change simply means they will not be
>>>> obtainable via a checkout from CVS. I'll work on creating the new
>>>> noiea file and add it to CVS next week.
>>>>
>>>> -Mike
>>>>
>>>> _______________________________________________
>>>> Go mailing list
>>>> Go at geneontology.org
>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>
>>>
>>>
>>>
>>
>>
>>
>
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054