[go] mapping between DB_Object_ID and DB_Object_Symbol

Gavin Sherlock sherlock at genome.Stanford.EDU
Thu Aug 9 19:38:24 PDT 2007


Hi all,

An issue came up with GO::TermFinder: it chokes on annotation files
where the relationship between DB_Object_ID and DB_Object_Symbol is
not 1:1. A number of files have, for instance, a 1:2 relationship
between these columns, e.g.:

GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35
pseudocap: PA5429 maps to aspA, adhA
RGD: RGD:1359623 maps to Tuba4a, Tuba4
WB: WBGene00000386 maps to cdc-25.1, cdc25.1

My question is: should this be a 1:1 relationship, in which case the
annotation file checking script needs to reject files that deviate
from it (presumably the additional names would become synonyms
instead)? Or is a 1:2 or greater relationship allowed between those
columns, in which case I'll have to modify GO::TermFinder
appropriately?
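
For concreteness, here is a rough sketch of the kind of check the annotation file checking script could run if the answer is "1:1". It is in Python rather than Perl and is not taken from GO::TermFinder or from the existing checking script; the function names are made up for illustration. It assumes tab-delimited annotation lines with DB in column 1, DB_Object_ID in column 2 and DB_Object_Symbol in column 3, and "!" comment lines.

```python
from collections import defaultdict


def symbols_per_id(lines):
    """Collect the DB_Object_Symbol values seen for each (DB, DB_Object_ID).

    Hypothetical helper: assumes tab-delimited annotation lines and skips
    "!" comment lines and lines with fewer than three columns.
    """
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue
        db, object_id, symbol = cols[0], cols[1], cols[2]
        seen[(db, object_id)].add(symbol)
    return seen


def non_unique_symbols(lines):
    """Return only the IDs that map to more than one symbol."""
    return {
        key: sorted(symbols)
        for key, symbols in symbols_per_id(lines).items()
        if len(symbols) > 1
    }
```

Run over an open file handle, e.g. non_unique_symbols(open("gene_association.GeneDB_Spombe")), this would report SPCC777.13 with both of its symbols; an empty result would mean the file is 1:1.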

As an additional data point, the pombe file also lists both
SPCC777.13 and vps35 as synonyms for the gene:

whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      SPCC777.13              GO:0003674      GO_REF:0000015  ND              F                       gene    taxon:4896      20070711        GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      vps35           GO:0005768      PMID:16622069   IMP             C       retromer complex subunit Vps35  SPCC777.13|vps35        gene    taxon:4896      20060424        GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      vps35           GO:0030904      PMID:16622069   IMP             C       retromer complex subunit Vps35  SPCC777.13|vps35        gene    taxon:4896      20040625        GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      vps35           GO:0030904      PMID:16622069   ISS     SGD:S000003690  C       retromer complex subunit Vps35  SPCC777.13|vps35        gene    taxon:4896      20040625        GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      vps35           GO:0006886      PMID:16622069   IMP             P       retromer complex subunit Vps35  SPCC777.13|vps35        gene    taxon:4896      20040625        GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      vps35           GO:0042147      PMID:16622069   IMP             P       retromer complex subunit Vps35  SPCC777.13|vps35        gene    taxon:4896      20060424        GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      vps35           GO:0030437      PMID:15189449   IMP             P       retromer complex subunit Vps35  SPCC777.13|vps35        gene    taxon:4896      20040625        GeneDB_Spombe
GeneDB_Spombe   SPCC777.13      vps35           GO:0005829      PMID:16823372   IDA             C       retromer complex subunit Vps35  SPCC777.13|vps35        gene    taxon:4896      20060724        GeneDB_Spombe

Is there a rule (I couldn't find one) that says the synonyms should
not repeat the DB_Object_ID and DB_Object_Symbol, or should there
be? Would dropping the repeats save any space in the file sizes?
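
To make the synonym question concrete, a minimal Python sketch of how such repeats could be flagged is below. It is only an illustration, not an existing rule or script: the function name is hypothetical, and it assumes the synonyms sit in column 11 of a tab-delimited line, separated by "|", as in the vps35 lines above.

```python
def redundant_synonyms(line):
    """Return the synonyms that merely repeat the ID or the symbol.

    Hypothetical helper: assumes a tab-delimited annotation line with
    DB_Object_ID in column 2, DB_Object_Symbol in column 3 and a
    "|"-separated synonym list in column 11.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        return []
    object_id, symbol = cols[1], cols[2]
    synonyms = [s for s in cols[10].split("|") if s]
    return [s for s in synonyms if s in (object_id, symbol)]
```

For the vps35 lines this would flag both SPCC777.13 and vps35, i.e. the whole synonym column is redundant there.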

Cheers,
Gavin
________________________________________________________

Gavin Sherlock
Dept. of Genetics
S201A, Grant Building,
Stanford University Medical School,
Stanford,
CA 94305-5120

Tel: 650 498 6012
Fax: 650 724 3701

