[go] Re: dictyBase gp2protein file
Mike Cherry
cherry at stanford.edu
Fri Jan 18 16:14:48 PST 2008
Pascale,
** every group should check the list below **
I've figured out this problem. The duplicate entry has nothing to do
with the gp2protein file; the problem was in the filtering script. The
script keeps a list of taxonomy IDs, each assigned to one group, and
excludes annotations for a listed taxon unless they come from that
group. The current list is below so others can check that their
entries are still correct.
The Q54IT9 entry was getting into the database from the goa_uniprot
file because that file gives its taxonomy ID as taxon:352472
(Dictyostelium discoideum AX4). I had only taxon:44689 (Dictyostelium
discoideum) and taxon:5782 (Dictyostelium) tagged for dictyBase, so the
strain-level ID was not excluded.
The script requires every taxonomy ID to be stated explicitly; an entry
for a species or genus does not cover its strains. I've updated the
script and am reprocessing the goa_uniprot file now.
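If every ID has to be listed explicitly, one way to catch a missing strain-level ID before loading is to report any taxon in an incoming file that has no entry in the map. The sketch below assumes the taxon IDs from the file have already been collected into a list. `unlisted_taxa` is a hypothetical helper and is not part of the actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: sorted, de-duplicated taxon IDs with no map entry.
# Matching is by exact string, so taxon:352472 is not covered by the
# taxon:44689 (species) or taxon:5782 (genus) entries.
sub unlisted_taxa {
    my ($map, @seen) = @_;
    my %uniq;
    $uniq{$_} = 1 for grep { !exists $map->{$_} } @seen;
    return sort keys %uniq;
}

my %before = (
    'taxon:44689' => 'dictyBase',
    'taxon:5782'  => 'dictyBase',
);
my %after = (%before, 'taxon:352472' => 'dictyBase');

# Taxon IDs as they might appear in an incoming file (illustrative).
my @seen = ('taxon:352472', 'taxon:44689', 'taxon:352472');

my @missing_before = unlisted_taxa(\%before, @seen);
my @missing_after  = unlisted_taxa(\%after,  @seen);
print "before fix: @missing_before\n";                    # before fix: taxon:352472
print "after fix: ", scalar(@missing_after), " unlisted\n"; # after fix: 0 unlisted
```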
-Mike
'taxon:5476'=>'cgd',
'taxon:352472'=>'dictyBase',
'taxon:44689'=>'dictyBase',
'taxon:5782'=>'dictyBase',
'taxon:7227'=>'fb',
'taxon:5664'=>'GeneDB_Lmajor',
'taxon:5833'=>'GeneDB_Pfalciparum',
'taxon:4896'=>'GeneDB_Spombe',
'taxon:185431'=>'GeneDB_Tbrucei',
'taxon:37546'=>'GeneDB_tsetse',
'taxon:9031'=>'goa_chicken',
'taxon:9913'=>'goa_cow',
'taxon:9606'=>'goa_human',
'taxon:110450'=>'gramene_oryza',
'taxon:110451'=>'gramene_oryza',
'taxon:127571'=>'gramene_oryza',
'taxon:29689'=>'gramene_oryza',
'taxon:29690'=>'gramene_oryza',
'taxon:364099'=>'gramene_oryza',
'taxon:364100'=>'gramene_oryza',
'taxon:39946'=>'gramene_oryza',
'taxon:39947'=>'gramene_oryza',
'taxon:40148'=>'gramene_oryza',
'taxon:40149'=>'gramene_oryza',
'taxon:4528'=>'gramene_oryza',
'taxon:4529'=>'gramene_oryza',
'taxon:4530'=>'gramene_oryza',
'taxon:4532'=>'gramene_oryza',
'taxon:4533'=>'gramene_oryza',
'taxon:4534'=>'gramene_oryza',
'taxon:4535'=>'gramene_oryza',
'taxon:4536'=>'gramene_oryza',
'taxon:4537'=>'gramene_oryza',
'taxon:4538'=>'gramene_oryza',
'taxon:4539'=>'gramene_oryza',
'taxon:52545'=>'gramene_oryza',
'taxon:63629'=>'gramene_oryza',
'taxon:65489'=>'gramene_oryza',
'taxon:65491'=>'gramene_oryza',
'taxon:77588'=>'gramene_oryza',
'taxon:83307'=>'gramene_oryza',
'taxon:83308'=>'gramene_oryza',
'taxon:83309'=>'gramene_oryza',
'taxon:10090'=>'mgi',
'taxon:10116'=>'rgd',
'taxon:285006'=>'sgd',
'taxon:307796'=>'sgd',
'taxon:41870'=>'sgd',
'taxon:4932'=>'sgd',
'taxon:3702'=>'tair',
'taxon:212042'=>'tigr_Aphagocytophilum',
'taxon:198094'=>'tigr_Banthracis',
'taxon:227377'=>'tigr_Cburnetii',
'taxon:246194'=>'tigr_Chydrogenoformans',
'taxon:195099'=>'tigr_Cjejuni',
'taxon:195103'=>'tigr_Cperfringens',
'taxon:167879'=>'tigr_Cpsychrerythraea',
'taxon:243164'=>'tigr_Dethenogenes',
'taxon:205920'=>'tigr_Echaffeensis',
'taxon:243231'=>'tigr_Gsulfurreducens',
'taxon:228405'=>'tigr_Hneptunium',
'taxon:265669'=>'tigr_Lmonocytogenes',
'taxon:243233'=>'tigr_Mcapsulatus',
'taxon:222891'=>'tigr_Nsennetsu',
'taxon:220664'=>'tigr_Pfluorescens',
'taxon:223283'=>'tigr_Psyringae',
'taxon:264730'=>'tigr_Psyringae_phaseolicola',
'taxon:211586'=>'tigr_Soneidensis',
'taxon:246200'=>'tigr_Spomeroyi',
'taxon:5691'=>'tigr_Tbrucei_chr2',
'taxon:686'=>'tigr_Vcholerae',
'taxon:6239'=>'wb',
'taxon:7955'=>'zfin',
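To make the check easier, the map can be inverted so that each group's taxa appear together. This is a minimal Perl sketch that assumes the list is held in a hash as written above. Only six entries are copied here, and `taxa_by_group` is a hypothetical name.

```perl
use strict;
use warnings;

# A few entries copied from the list above.
my %taxon_owner = (
    'taxon:352472' => 'dictyBase',
    'taxon:44689'  => 'dictyBase',
    'taxon:5782'   => 'dictyBase',
    'taxon:285006' => 'sgd',
    'taxon:4932'   => 'sgd',
    'taxon:3702'   => 'tair',
);

# Numeric part of "taxon:NNNN", used for sorting.
sub taxon_number {
    my ($taxon) = @_;
    (my $n = $taxon) =~ s/^taxon://;
    return $n;
}

# Invert taxon => group into group => [taxa sorted by number].
sub taxa_by_group {
    my ($map) = @_;
    my %by_group;
    push @{ $by_group{ $map->{$_} } }, $_ for keys %$map;
    for my $taxa (values %by_group) {
        @$taxa = sort { taxon_number($a) <=> taxon_number($b) } @$taxa;
    }
    return \%by_group;
}

my $by_group = taxa_by_group(\%taxon_owner);
for my $group (sort keys %$by_group) {
    print "$group: @{ $by_group->{$group} }\n";
}
# dictyBase: taxon:5782 taxon:44689 taxon:352472
# sgd: taxon:4932 taxon:285006
# tair: taxon:3702
```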
On Jan 18, 2008, at 10:28 AM, Pascale Gaudet wrote:
> Well, is this what you want? The dataflow is: we send data to
> GenBank, then UniProt integrates it. Does it make sense for you to
> have both versions of the same sequence?
>
> Stan Dong wrote:
>>
>> From the two FASTA headers, one sequence is from dictyBase and the
>> other from UniProt. Is this a problem, or just the result of
>> submissions from two distinct sources?
>>
>> >DICTYBASE|DDB0191090 symbol:sadA species:44689
>> >UNIPROTKB|Q54IT9 symbol:Q54IT9_DICDI species:352472
>>
>> -Stan
>>
>> On Jan 18, 2008, at 9:51 AM, Pascale Gaudet wrote:
>>
>>> Hello,
>>>
>>> I have another question about our gp2protein file. It looks like
>>> our sequences have now been loaded into the GO database :) but some
>>> have been loaded in duplicate. For example:
>>> http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?search_constraint=gp&view=details&session_id=1067b1200677806&gp=DDB0191090
>>> http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?search_constraint=gp&view=details&session_id=1067b1200677806&gp=Q54IT9
>>>
>>> Anything we can do to prevent that?
>>>
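The two FASTA headers quoted above show the mismatch Mike describes: the dictyBase entry uses species 44689, and the UniProtKB entry uses 352472. The following minimal Perl sketch splits such headers into fields and looks up the taxon. It assumes that the `>DB|ID key:value ...` layout seen in these two examples holds in general. `parse_header` is a hypothetical name, and the map reproduces the dictyBase entries as they stood before the fix.

```perl
use strict;
use warnings;

# Split ">DB|ID key:value key:value" into a hash of fields.
sub parse_header {
    my ($line) = @_;
    my ($dbid, @pairs) = split ' ', $line;
    $dbid =~ s/^>//;
    my ($db, $id) = split /\|/, $dbid, 2;
    my %rec = (db => $db, id => $id);
    for my $pair (@pairs) {
        my ($k, $v) = split /:/, $pair, 2;
        $rec{$k} = $v;
    }
    return \%rec;
}

# dictyBase entries before the fix (no taxon:352472).
my %taxon_owner = (
    'taxon:44689' => 'dictyBase',
    'taxon:5782'  => 'dictyBase',
);

for my $h ('>DICTYBASE|DDB0191090 symbol:sadA species:44689',
           '>UNIPROTKB|Q54IT9 symbol:Q54IT9_DICDI species:352472') {
    my $r     = parse_header($h);
    my $taxon = "taxon:$r->{species}";
    my $owner = $taxon_owner{$taxon} // '(unlisted)';
    print "$r->{db}|$r->{id}\t$taxon\t$owner\n";
}
```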