[go] mapping between DB_Object_ID and DB_Object_Symbol

Valerie Wood val at sanger.ac.uk
Thu Aug 23 06:21:10 PDT 2007


Martin fixed this problem (different symbols used for the same gene 
product) for pombe with today's update.
Sorry about that.


Did anybody have any further thoughts on whether we should continue to 
duplicate the systematic ID in the synonym column, as GeneDB and SGD do 
(for the same reason as described by me and Karen)?
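
To illustrate why this matters for anyone searching with a list of systematic IDs, here is a minimal Python sketch. It assumes a tab-delimited annotation file with the column positions given in the documentation (2 = DB_Object_ID, 3 = DB_Object_Symbol, 11 = DB_Object_Synonym, pipe-separated). The function names are hypothetical, and apart from the PHO3 row taken from the documentation, the example rows and their identifiers are made up. Because a systematic ID can sit in either column 3 or column 11, the lookup has to index both, and it filters out a synonym that merely repeats the symbol.

```python
def parse_annotation_line(line):
    """Pull the three fields discussed here out of one tab-delimited line."""
    cols = line.rstrip("\n").split("\t")
    # The documentation numbers columns from 1, so subtract 1 for list indexes.
    object_id = cols[1]   # column 2: DB_Object_ID
    symbol = cols[2]      # column 3: DB_Object_Symbol
    # Column 11 is optional and pipe-separated; skip empty entries.
    synonyms = [s for s in cols[10].split("|") if s] if len(cols) > 10 else []
    return object_id, symbol, synonyms


def all_names(symbol, synonyms):
    """Return the symbol followed by synonyms, without repeated names."""
    names = [symbol]
    for name in synonyms:
        if name not in names:
            names.append(name)
    return names


def build_name_index(lines):
    """Map every symbol or synonym to the set of DB_Object_IDs that use it."""
    index = {}
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comment/header lines and blank lines
        object_id, symbol, synonyms = parse_annotation_line(line)
        for name in all_names(symbol, synonyms):
            index.setdefault(name, set()).add(object_id)
    return index


def make_line(db, object_id, symbol, synonyms):
    """Build an 11-column example line; columns 4 to 10 are left empty."""
    return "\t".join([db, object_id, symbol] + [""] * 7 + [synonyms])


EXAMPLE_LINES = [
    # Named gene from the documentation: systematic ID is in column 11.
    make_line("SGD", "S000000296", "PHO3", "YBR092C"),
    # Made-up unnamed gene: systematic ID is in column 3, no synonyms.
    make_line("EXAMPLE_DB", "ID0000001", "SYS0001C", ""),
    # Made-up gene whose symbol is also repeated in the synonym column.
    make_line("EXAMPLE_DB", "ID0000002", "abc1", "SYS0002C|abc1"),
]

if __name__ == "__main__":
    index = build_name_index(EXAMPLE_LINES)
    for systematic_id in ["YBR092C", "SYS0001C", "SYS0002C"]:
        print(systematic_id, sorted(index.get(systematic_id, [])))
```

A search restricted to column 3 alone would miss YBR092C in this example, and one restricted to column 11 alone would miss SYS0001C, which is the problem with the current layout.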

Val



Valerie Wood wrote:

>>
>>
>> I think it's best not to repeat symbols as synonyms, as you lead  
>> people to believe that these will always be present, which may  
>> potentially lead to them implementing buggy software (if they are  
>> extremely sloppy). Those writing software correctly have to  
>> defensively implement some kind of filter, if they want to avoid  
>> reporting back (mildly confusing) duplicates to their users.  
>> Consistency is always a good thing.
>>
>
> Martin will fix the problems with our
> DB_Object_ID and DB_Object_Symbol columns which should happen with the 
> next update.
>
> But I'd like to clarify about the synonyms.
>
> From the documentation:
>
>  2  DB_Object_ID                  required  S000000296
>  3  DB_Object_Symbol              required  PHO3
> 11  DB_Object_Synonym (|Synonym)  optional  YBR092C
>
> BUT if your gene doesn't have a given name, the systematic ID has to 
> go in column 3. If your gene does have a given name the systematic ID 
> goes in the synonym column (11).
>
> This means that if a user comes with a list of systematic IDs, they 
> aren't always in the same field. I think this is why with GeneDB we 
> put ALL IDs in the synonyms column.
>
> I can raise a ticket so that we don't put any duplicates in the 
> synonyms file.
> But don't we need a single field which contains all the systematic 
> IDs for any organism? These are the IDs which researchers usually use 
> for handling large datasets. Wouldn't it be better to require that the 
> object symbol is always the systematic ID, rather than a mixture of 
> systematic IDs and primary names, with everything else a synonym?
>
> This is the same identifier issue I mentioned on AWG recently, i.e. why 
> it's a problem to use a list of IDs and search on a single field: 
> the ID types are necessarily split between different fields.
>
> The other problem with the existing file structure is that IDs swap 
> between fields as genes are 'named'.
>
>
>
> Val



-- 
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE.


