[go] mapping between DB_Object_ID and DB_Object_Symbol
Karen Christie
kchris at genome.Stanford.EDU
Fri Aug 10 10:06:14 PDT 2007
SGD does exactly the same thing. For features like YKR023W that are
relatively uncharacterized and have not been given a gene name (such as
ACT1), we must put the systematic name YKR023W in column 3. However, so
that users who want the systematic name can always find it in a single
consistent column, we always put the systematic name as the first synonym
in column 11, even if it was also put into column 3.
-Karen
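A minimal Python sketch of how a consumer might read the systematic name under the SGD convention described above. It assumes a tab-separated GAF line in which column 11 holds pipe-separated synonyms and the systematic name is always the first of them; the function name `sgd_systematic_name` is hypothetical.

```python
from typing import Optional


def sgd_systematic_name(gaf_line: str) -> Optional[str]:
    """Return the systematic name from an SGD-style GAF line.

    Assumes the SGD convention: the systematic name is always the first
    entry in column 11 (DB_Object_Synonym), even when it also appears
    in column 3 (DB_Object_Symbol).
    """
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) < 11:
        return None  # too few columns to contain a synonym field
    synonyms = cols[10]  # column 11 (1-based) is index 10
    if not synonyms:
        return None
    return synonyms.split("|")[0]
```

Reading one fixed position works only because the convention is applied to every row, named or not.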
On Fri, 10 Aug 2007, Valerie Wood wrote:
>>
>>
>> I think it's best not to repeat symbols as synonyms, as you lead people to
>> believe that these will always be present, which may potentially lead to
>> them implementing buggy software (if they are extremely sloppy). Those
>> writing software correctly have to defensively implement some kind of
>> filter, if they want to avoid reporting back (mildly confusing) duplicates
>> to their users. Consistency is always a good thing.
>>
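The defensive filter mentioned above can be as small as the following sketch. It drops any synonym that duplicates the symbol, as well as repeated synonyms, and keeps the original order. The function name is hypothetical, and case-sensitive comparison is an assumption.

```python
from typing import List


def synonyms_without_symbol(symbol: str, synonym_field: str) -> List[str]:
    """Split a pipe-separated DB_Object_Synonym field and drop duplicates.

    Removes entries equal to DB_Object_Symbol and repeated synonyms,
    preserving first-seen order. Comparison is case-sensitive (an assumption).
    """
    seen = {symbol}
    result: List[str] = []
    for syn in synonym_field.split("|"):
        if syn and syn not in seen:
            seen.add(syn)
            result.append(syn)
    return result
```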
>
> Martin will fix the problems with our
> DB_Object_ID and DB_Object_Symbol columns which should happen with the next
> update.
>
> But I'd like to clarify about the synonyms.
>
> From the documentation:
>   Column  Name                          Requirement  Example
>   2       DB_Object_ID                  required     S000000296
>   3       DB_Object_Symbol              required     PHO3
>   11      DB_Object_Synonym (|Synonym)  optional     YBR092C
>
> BUT if your gene doesn't have a given name, the systematic ID has to go in
> column 3. If your gene does have a given name the systematic ID goes in the
> synonym column (11).
>
> This means that if a user comes with a list of
> systematic IDs, those IDs aren't always in the same field. I think this is
> why with GeneDB we put ALL IDs in the synonyms column.
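With the current layout, a consumer holding a list of systematic IDs has to search both column 3 and column 11. A minimal sketch, assuming tab-separated GAF lines and skipping `!` comment lines, builds one lookup from every symbol and synonym to the DB_Object_ID in column 2; the function names are hypothetical. Because both columns are indexed, a query still resolves if its ID has moved from column 3 to column 11 after the gene was named.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_id_index(gaf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each DB_Object_Symbol and DB_Object_Synonym to DB_Object_IDs.

    Columns are 1-based in the GAF documentation: 2 = DB_Object_ID,
    3 = DB_Object_Symbol, 11 = DB_Object_Synonym (pipe-separated).
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # malformed row: too short to hold column 11
        object_id, symbol, synonyms = cols[1], cols[2], cols[10]
        for name in [symbol, *synonyms.split("|")]:
            if name:
                index[name].add(object_id)
    return dict(index)


def lookup(index: Dict[str, Set[str]], ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Resolve each query ID, whichever column it appeared in."""
    return {i: set(index.get(i, set())) for i in ids}
```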
>
> I can raise a ticket so that we don't put any duplicates in the synonyms
> file.
> But don't we need a single field which contains all the systematic IDs for
> any organism? These are the IDs which researchers usually use for handling
> large datasets. Wouldn't it be better to require that the object symbol be
> the systematic ID, rather than a mixture of systematic IDs and primary names,
> and that everything else be a synonym?
>
> This is the same identifier issue I mentioned on AWG recently, i.e. why it's a
> problem to take a list of IDs and search on a single field: the ID types are
> necessarily split between different fields.
>
> The other problem with the existing file structure is that IDs swap between
> fields as genes are 'named'.
>
> Val