[go] mapping between DB_Object_ID and DB_Object_Symbol
Valerie Wood
val at sanger.ac.uk
Fri Aug 10 02:59:30 PDT 2007
>
>
> I think it's best not to repeat symbols as synonyms, as you lead
> people to believe that these will always be present, which may
> potentially lead to them implementing buggy software (if they are
> extremely sloppy). Those writing software correctly have to
> defensively implement some kind of filter, if they want to avoid
> reporting back (mildly confusing) duplicates to their users.
> Consistency is always a good thing.
>
Martin will fix the problems with our DB_Object_ID and
DB_Object_Symbol columns; the fix should appear in the next update.
But I'd like to clarify the point about synonyms.
From the documentation:

  Column  Name                          Requirement  Example
  2       DB_Object_ID                  required     S000000296
  3       DB_Object_Symbol              required     PHO3
  11      DB_Object_Synonym (|Synonym)  optional     YBR092C

BUT if your gene doesn't have a given name, the systematic ID has to go
in column 3. If your gene does have a given name, the systematic ID goes
in the synonym column (11).
This means that if a user comes with a list of systematic IDs, those
IDs aren't always in the same field. I think this is why, at GeneDB, we
put ALL IDs in the synonyms column.
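To make the consequence concrete, here is a minimal Python sketch of
what a consumer has to do with the current layout: index both
DB_Object_Symbol (column 3) and DB_Object_Synonym (column 11) so that a
list of systematic IDs resolves whether or not each gene is named. It
assumes tab-separated GAF rows with pipe-separated synonyms; the helper
names (index_systematic_ids, lookup, _row) and the unnamed gene
S000099999/YXX001W are hypothetical, for illustration only.

```python
# 0-based indexes for the 1-based GAF column numbers quoted above.
OBJECT_ID = 2 - 1
OBJECT_SYMBOL = 3 - 1
OBJECT_SYNONYM = 11 - 1


def index_systematic_ids(gaf_lines):
    """Map every symbol and synonym to its DB_Object_ID.

    A systematic ID may sit in column 3 (unnamed gene) or in column 11
    (named gene), so both columns have to be indexed.
    """
    index = {}
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        object_id = cols[OBJECT_ID]
        names = [cols[OBJECT_SYMBOL]]
        if len(cols) > OBJECT_SYNONYM and cols[OBJECT_SYNONYM]:
            names.extend(cols[OBJECT_SYNONYM].split("|"))
        for name in names:
            index.setdefault(name, object_id)  # first mapping wins
    return index


def lookup(ids, index):
    """Resolve a user's list of systematic IDs; None marks a miss."""
    return {i: index.get(i) for i in ids}


def _row(object_id, symbol, synonyms):
    # Hypothetical minimal 15-column row: only columns 1, 2, 3, 11 filled.
    cols = [""] * 15
    cols[0] = "SGD"
    cols[OBJECT_ID] = object_id
    cols[OBJECT_SYMBOL] = symbol
    cols[OBJECT_SYNONYM] = synonyms
    return "\t".join(cols)


rows = [
    "! header comment",
    _row("S000000296", "PHO3", "YBR092C"),  # named gene: ID in column 11
    _row("S000099999", "YXX001W", ""),      # hypothetical unnamed gene: ID in column 3
]
index = index_systematic_ids(rows)
print(lookup(["YBR092C", "YXX001W", "YZZ999C"], index))
# {'YBR092C': 'S000000296', 'YXX001W': 'S000099999', 'YZZ999C': None}
```

If every systematic ID were guaranteed to live in one column, the index
would need only that column.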
I can raise a ticket so that we don't put any duplicates in the synonyms
file.
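For consumers of existing files, the defensive filter mentioned in the
quoted message could look like this minimal Python sketch: it drops
synonyms that merely repeat DB_Object_Symbol, along with repeated
synonym entries, while preserving order. The function name
clean_synonyms and the exact, case-sensitive comparison are
assumptions, not part of any GO tooling.

```python
def clean_synonyms(symbol, synonym_field):
    """Return synonyms with repeats of the symbol and duplicates removed.

    synonym_field is the raw pipe-separated column 11 value.
    Comparison is exact (case-sensitive), which is an assumption.
    """
    seen = {symbol}  # treat the symbol itself as already reported
    cleaned = []
    parts = synonym_field.split("|") if synonym_field else []
    for syn in parts:
        if syn and syn not in seen:
            seen.add(syn)
            cleaned.append(syn)
    return cleaned


print(clean_synonyms("PHO3", "PHO3|YBR092C|YBR092C"))  # ['YBR092C']
print(clean_synonyms("YXX001W", ""))                   # []
```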
But don't we need a single field that contains all the systematic IDs
for any organism? These are the IDs researchers usually use when
handling large datasets. Wouldn't it be better to require that the
object symbol be the systematic ID, rather than a mixture of systematic
IDs and primary names, with everything else as a synonym?
This is the same identifier issue I mentioned on AWG recently, i.e. why
it's a problem to take a list of IDs and search on a single field: the
ID types are necessarily split between different fields.
The other problem with the existing file structure is that IDs swap
between fields as genes are 'named': once a gene gets a given name, its
systematic ID moves from column 3 to column 11.
Val