[go] mapping between DB_Object_ID and DB_Object_Symbol
Gavin Sherlock
sherlock at genome.Stanford.EDU
Thu Aug 9 19:38:24 PDT 2007
Hi all,
An issue came up with GO::TermFinder: it chokes on files where the
relationship between DB_Object_ID and DB_Object_Symbol is not 1:1.
A number of files have, for instance, a 1:2 relationship between
these columns, e.g.:
GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35
pseudocap: PA5429 maps to aspA, adhA
RGD: RGD:1359623 maps to Tuba4a, Tuba4
WB: WBGene00000386 maps to cdc-25.1, cdc25.1
My question is: should this be a 1:1 relationship, in which case the
annotation file checking script needs to reject files that deviate
from it (presumably the additional names would become synonyms
instead)? Or is a relationship of 1:2 or more allowed between those
columns, in which case I'll have to modify GO::TermFinder appropriately?
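The 1:1 check described above could be sketched roughly as follows. This is only an illustration in Python, not the actual checking script and not GO::TermFinder code: the function names are made up, and it assumes a tab-delimited annotation file whose first three columns are DB, DB_Object_ID and DB_Object_Symbol, with comment lines starting with "!".

```python
from collections import defaultdict


def symbols_by_object_id(lines):
    """Collect the DB_Object_Symbol values seen for each (DB, DB_Object_ID)."""
    symbols = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # too few columns to be an annotation row
        db, object_id, symbol = cols[0], cols[1], cols[2]
        symbols[(db, object_id)].add(symbol)
    return symbols


def ids_with_multiple_symbols(lines):
    """Return {(DB, ID): sorted symbols} for IDs that break the 1:1 rule."""
    return {
        key: sorted(syms)
        for key, syms in symbols_by_object_id(lines).items()
        if len(syms) > 1
    }


# Rows abbreviated to the first five columns; the "EX" row is made up.
example_rows = [
    "! comment line\n",
    "GeneDB_Spombe\tSPCC777.13\tSPCC777.13\t\tGO:0003674\n",
    "GeneDB_Spombe\tSPCC777.13\tvps35\t\tGO:0005768\n",
    "EX\tEX:0001\tabc1\t\tGO:0005829\n",
]

for (db, object_id), syms in ids_with_multiple_symbols(example_rows).items():
    print(f"{db}: {object_id} maps to {', '.join(syms)}")
```

A checking script could reject a file whenever this returns a non-empty result; a tool that tolerates 1:2 mappings could instead pick one symbol and treat the rest as aliases.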
As an additional data point, the pombe file actually lists both
SPCC777.13 and vps35 as synonyms for the gene too:
whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe
GeneDB_Spombe SPCC777.13 SPCC777.13 GO:0003674 GO_REF:0000015 ND F gene taxon:4896 20070711 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0005768 PMID:16622069 IMP C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20060424 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0030904 PMID:16622069 IMP C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0030904 PMID:16622069 ISS SGD:S000003690 C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0006886 PMID:16622069 IMP P retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0042147 PMID:16622069 IMP P retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20060424 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0030437 PMID:15189449 IMP P retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20040625 GeneDB_Spombe
GeneDB_Spombe SPCC777.13 vps35 GO:0005829 PMID:16823372 IDA C retromer complex subunit Vps35 SPCC777.13|vps35 gene taxon:4896 20060724 GeneDB_Spombe
Is there a rule (I couldn't find one) that says the synonyms should
not repeat the DB_Object_ID and DB_Object_Symbol, or should there
be? Would dropping them reduce the file sizes at all?
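To get a rough feel for the space question, here is a small Python sketch that drops synonyms duplicating the DB_Object_ID or DB_Object_Symbol and reports how many characters that saves on one row. It is an illustration only: the function name is made up, it assumes the synonyms sit in the 11th tab-delimited column separated by "|", and the example row is rebuilt from the grep output above with the empty Qualifier and With columns placed where the standard layout would put them.

```python
def strip_redundant_synonyms(line):
    """Remove synonyms that merely repeat the row's ID or symbol."""
    if line.startswith("!"):
        return line  # leave comment lines untouched
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        return line  # no synonym column present
    object_id, symbol = cols[1], cols[2]
    kept = [s for s in cols[10].split("|") if s and s not in (object_id, symbol)]
    cols[10] = "|".join(kept)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline


# One vps35 row, reconstructed under the assumptions stated above.
row = "\t".join([
    "GeneDB_Spombe", "SPCC777.13", "vps35", "", "GO:0005768",
    "PMID:16622069", "IMP", "", "C", "retromer complex subunit Vps35",
    "SPCC777.13|vps35", "gene", "taxon:4896", "20060424", "GeneDB_Spombe",
]) + "\n"

cleaned = strip_redundant_synonyms(row)
print(len(row) - len(cleaned), "characters saved on this row")
```

Summing that difference over every row of a file would give the total saving; whether it matters depends on how many rows carry such duplicates.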
Cheers,
Gavin
________________________________________________________
Gavin Sherlock
Dept. of Genetics
S201A, Grant Building,
Stanford University Medical School,
Stanford,
CA 94305-5120
Tel: 650 498 6012
Fax: 650 724 3701