[go] mapping between DB_Object_ID and DB_Object_Symbol

Valerie Wood val at sanger.ac.uk
Fri Aug 10 02:59:30 PDT 2007


>
>
> I think it's best not to repeat symbols as synonyms, as you lead  
> people to believe that these will always be present, which may  
> potentially lead to them implementing buggy software (if they are  
> extremely sloppy). Those writing software correctly have to  
> defensively implement some kind of filter, if they want to avoid  
> reporting back (mildly confusing) duplicates to their users.  
> Consistency is always a good thing.
>

Martin will fix the problems with our DB_Object_ID and DB_Object_Symbol 
columns; this should happen with the next update.

But I'd like to clarify about the synonyms.

From the documentation:
Column  Name                          Status    Example
2       DB_Object_ID                  required  S000000296
3       DB_Object_Symbol              required  PHO3
11      DB_Object_Synonym (|Synonym)  optional  YBR092C

BUT if your gene doesn't have a given name, the systematic ID has to go 
in column 3. If your gene does have a given name the systematic ID goes 
in the synonym column (11).

This means that if a user comes with a list of systematic IDs, they 
aren't always in the same field. I think this is why, with GeneDB, we 
put ALL IDs in the synonyms column.
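
To make the lookup problem concrete, here is a minimal sketch, assuming 
tab-separated annotation lines with the column numbering quoted above 
(symbol in column 3, pipe-separated synonyms in column 11). The function 
names and the second sample row (an unnamed gene with a made-up ID) are 
hypothetical and only there for illustration.

```python
def find_rows(lines, wanted_ids):
    """Return rows whose symbol (column 3) or any synonym (column 11)
    matches one of the wanted IDs."""
    wanted = set(wanted_ids)
    hits = []
    for line in lines:
        # Skip blank lines and comment lines (assumed to start with "!").
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        symbol = cols[2] if len(cols) > 2 else ""
        synonyms = cols[10].split("|") if len(cols) > 10 and cols[10] else []
        # A systematic ID may sit in either field, so both must be checked.
        if symbol in wanted or wanted.intersection(synonyms):
            hits.append(cols)
    return hits


def make_row(object_id, symbol, synonyms):
    """Build a hypothetical 11-column line; unused columns are left empty."""
    cols = [""] * 11
    cols[1] = object_id            # column 2: DB_Object_ID
    cols[2] = symbol               # column 3: DB_Object_Symbol
    cols[10] = "|".join(synonyms)  # column 11: DB_Object_Synonym
    return "\t".join(cols)


rows = [
    # Named gene: the systematic ID is in the synonym column.
    make_row("S000000296", "PHO3", ["YBR092C"]),
    # Made-up unnamed gene: the systematic ID is in the symbol column.
    make_row("ID0000001", "YXX000W", []),
]

hits = find_rows(rows, ["YBR092C", "YXX000W"])
```

A search restricted to column 3 alone would find only the second row, 
which is the inconsistency described above.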

I can raise a ticket so that we don't put any duplicates in the synonyms 
file.
But don't we need a single field which contains all the systematic IDs 
for any organism? These are the IDs which researchers usually use for 
handling large datasets. Wouldn't it be better to require that the 
object symbol be the systematic ID, rather than a mixture of systematic 
IDs and primary names, with everything else as a synonym?
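
For the duplicate problem, the kind of defensive filter mentioned in the 
quoted message could look roughly like the sketch below. It assumes the 
synonym field is a pipe-separated string; the function name is 
hypothetical and this is not what any existing tool does.

```python
def clean_synonyms(symbol, synonym_field):
    """Drop synonyms that repeat the symbol or an earlier synonym,
    keeping the remaining entries in their original order."""
    seen = {symbol}
    kept = []
    for syn in synonym_field.split("|"):
        syn = syn.strip()
        if syn and syn not in seen:
            seen.add(syn)
            kept.append(syn)
    return "|".join(kept)


cleaned = clean_synonyms("PHO3", "PHO3|YBR092C|YBR092C")
```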

This is the same identifier issue I mentioned on AWG recently, i.e. why 
it's a problem to use a list of IDs and search on a single field: 
the ID types are necessarily split between different fields.

The other problem with the existing file structure is that IDs swap 
between fields as genes are 'named'.



Val

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE.