[go] mapping between DB_Object_ID and DB_Object_Symbol

Karen Christie kchris at genome.Stanford.EDU
Fri Aug 10 10:06:14 PDT 2007



SGD does exactly the same thing. For features like YKR023W that are 
relatively uncharacterized and have not been given a gene name, e.g. 
ACT1, we must put the systematic name YKR023W in column 3. However, so 
that users can obtain the systematic name from a single consistent 
column, we always put the systematic name as the first synonym in column 
11, even if it was also put into column 3.
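
As a rough illustration of that convention, here is a minimal Python sketch of how a user might pull the systematic name out of one annotation line. It assumes tab-separated columns, "|" between synonyms, and the SGD convention described above (systematic name is always the first synonym in column 11). The function names, the helper that builds sample lines, and the placeholder object ID are made up for the example; they are not part of any SGD or GO tool.

```python
def systematic_name(gaf_line):
    """Return the systematic name for one tab-separated annotation line.

    Assumes the SGD convention from this thread: the systematic name is
    the first synonym in column 11. Falls back to column 3 if column 11
    is missing or empty (an assumption for files from other sources).
    """
    cols = gaf_line.rstrip("\n").split("\t")
    symbol = cols[2]                          # column 3, DB_Object_Symbol
    synonyms = []
    if len(cols) > 10:
        # column 11, DB_Object_Synonym, pipe-separated
        synonyms = [s for s in cols[10].split("|") if s]
    return synonyms[0] if synonyms else symbol


def make_line(object_id, symbol, synonyms):
    """Hypothetical helper: build a 15-column line with only the
    columns discussed here filled in."""
    cols = [""] * 15
    cols[0] = "SGD"
    cols[1] = object_id                       # column 2, DB_Object_ID
    cols[2] = symbol
    cols[10] = "|".join(synonyms)
    return "\t".join(cols)


# Named gene: symbol in column 3, systematic name in column 11.
named = make_line("S000000296", "PHO3", ["YBR092C"])
# Unnamed feature: systematic name in column 3 and repeated in column 11.
# "PLACEHOLDER_ID" stands in for the real object ID, which is not given here.
unnamed = make_line("PLACEHOLDER_ID", "YKR023W", ["YKR023W"])

print(systematic_name(named))
print(systematic_name(unnamed))
```

In both cases the caller reads a single column and never needs to know whether the gene has been named.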

-Karen


On Fri, 10 Aug 2007, Valerie Wood wrote:

>> 
>> 
>> I think it's best not to repeat symbols as synonyms, as you lead people to 
>> believe that these will always be present, which may potentially lead to 
>> them implementing buggy software (if they are extremely sloppy). Those 
>> writing software correctly have to defensively implement some kind of 
>> filter, if they want to avoid reporting back (mildly confusing) duplicates 
>> to their users. Consistency is always a good thing.
>> 
>
> Martin will fix the problems with our DB_Object_ID and DB_Object_Symbol 
> columns; this should happen with the next update.
>
> But I'd like to clarify about the synonyms.
>
> From the documentation:
>
>   Column  Name                          Required?  Example
>   2       DB_Object_ID                  required   S000000296
>   3       DB_Object_Symbol              required   PHO3
>   11      DB_Object_Synonym (|Synonym)  optional   YBR092C
>
> BUT if your gene doesn't have a given name, the systematic ID has to go in 
> column 3. If your gene does have a given name the systematic ID goes in the 
> synonym column (11).
>
> This means if the user comes with a list of
> systematic IDs they aren't always in the same field. I think this is why 
> with GeneDB we put ALL IDs in the synonyms column.
>
> I can raise a ticket so that we don't put any duplicates in the synonyms 
> file.
> But don't we need a single field which contains all the systematic IDs for 
> any organism? These are the IDs which researchers usually use for handling 
> large datasets. Wouldn't it be better to require that the object symbol was 
> the systematic ID rather than a mixture of systematic IDs and primary names, 
> and everything else was a synonym?
>
> This is the same identifier issue I mentioned on AWG recently, i.e. why it's 
> a problem to use a list of IDs and search on a single field, because the ID 
> types are necessarily split between different fields.
>
> The other problem with the existing file structure is that IDs swap between 
> fields as genes are 'named'.
>
>
>
> Val
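
To make the quoted lookup problem concrete: when systematic IDs may sit in either column 3 or column 11, one defensive approach is to index every name from both columns and drop repeats, so a user's list of IDs can be matched in one place. The Python sketch below assumes rows already split into columns and "|" between synonyms; the function names and the placeholder object ID are invented for the example and do not describe how GeneDB or SGD build their files.

```python
def names_for(cols):
    """Collect the symbol (column 3) and synonyms (column 11) of one
    row, in order, skipping empty entries and duplicates."""
    seen = []
    for name in [cols[2]] + cols[10].split("|"):
        if name and name not in seen:
            seen.append(name)
    return seen


def build_index(rows):
    """Map every symbol or synonym to the set of DB_Object_IDs
    (column 2) it appears with."""
    index = {}
    for cols in rows:
        for name in names_for(cols):
            index.setdefault(name, set()).add(cols[1])
    return index


def row(object_id, symbol, synonyms):
    """Hypothetical helper: a 15-column row with only columns 2, 3
    and 11 filled in."""
    cols = [""] * 15
    cols[1] = object_id
    cols[2] = symbol
    cols[10] = "|".join(synonyms)
    return cols


rows = [
    row("S000000296", "PHO3", ["YBR092C"]),
    # Unnamed feature with the systematic ID repeated as a synonym;
    # "PLACEHOLDER_ID" is not a real object ID.
    row("PLACEHOLDER_ID", "YKR023W", ["YKR023W"]),
]
index = build_index(rows)

# Systematic IDs resolve whichever column they came from.
for systematic_id in ["YBR092C", "YKR023W"]:
    print(systematic_id, sorted(index[systematic_id]))
```

This works around the split for one snapshot of the file, but it does not address IDs moving between columns when a gene is named.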


