[go] mapping between DB_Object_ID and DB_Object_Symbol
David Hill
dph at informatics.jax.org
Fri Aug 10 05:50:23 PDT 2007
O.K. my misunderstanding. Sorry.
Valerie Wood wrote:
>
>
>
> Hi David,
>
> Chris is referring to our (GeneDB) odd practice of repeating the names
> in the synonyms column (so the gene name might be repeated in the
> synonym field for a single gene), rather than between genes (which is
> OK for synonyms).
>
> The reason we did this is explained in a later e-mail.
>
> Val
>
>
>
>
>
> David Hill wrote:
>
>>
>>>
>>>
>>> I think it's best not to repeat symbols as synonyms, as you lead
>>> people to believe that these will always be present, which may
>>> potentially lead to them implementing buggy software (if they are
>>> extremely sloppy).
>>
>> But, synonyms for gene symbols are harvested directly from the
>> literature. Unfortunately, bench scientists don't often consider
>> whether the 'handle' they are using for their gene is unique. This is
>> a huge issue in mouse and often a lot of the work of a curator is to
>> determine which gene an author is actually talking about. However,
>> every official gene symbol should only correspond to one database
>> gene ID. All uses of symbols other than the official symbol for a
>> gene should go in the 'synonyms' field.
>>
>>> Those writing software correctly have to defensively implement some
>>> kind of filter, if they want to avoid reporting back (mildly
>>> confusing) duplicates to their users. Consistency is always a good
>>> thing.
>>>
>>> I think the 1:1 violation is more serious though
>>>
>>> On Aug 9, 2007, at 7:38 PM, Gavin Sherlock wrote:
>>>
>>>> Hi all,
>>>>
>>>> An issue came up with GO::TermFinder, because it chokes on files
>>>> where the relationship between DB_Object_ID and DB_Object_Symbol is
>>>> not 1:1, and there are a number of files that have for instance a
>>>> 1:2 relationship between these columns, e.g.:
>>>>
>>>> GeneDB_Spombe: SPCC777.13 maps to SPCC777.13, vps35
>>>> pseudocap: PA5429 maps to aspA, adhA
>>>> RGD: RGD:1359623 maps to Tuba4a, Tuba4
>>>> WB: WBGene00000386 maps to cdc-25.1, cdc25.1
>>>>
>>>> My question is: should this be a 1:1 relationship, in which case the
>>>> annotation file checking script needs to reject files that deviate
>>>> from that (presumably these additional names would become synonyms
>>>> instead), or is a 1:2 or more relationship allowed between those
>>>> columns, in which case I'll have to modify GO::TermFinder
>>>> appropriately?
>>>>
>>>> As an additional data point, the pombe file actually lists both
>>>> SPCC777.13 and vps35 as synonyms for the gene too:
>>>>
>>>> whitbread 1001 % grep 'SPCC777.13' gene_association.GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 SPCC777.13
>>>> GO:0003674 GO_REF:0000015 ND
>>>> F gene taxon:4896 20070711 GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 vps35 GO:0005768
>>>> PMID:16622069 IMP C retromer complex subunit
>>>> Vps35 SPCC777.13|vps35 gene taxon:4896
>>>> 20060424 GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904
>>>> PMID:16622069 IMP C retromer complex subunit
>>>> Vps35 SPCC777.13|vps35 gene taxon:4896
>>>> 20040625 GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 vps35 GO:0030904
>>>> PMID:16622069 ISS SGD:S000003690 C retromer complex
>>>> subunit Vps35 SPCC777.13|vps35 gene taxon:4896
>>>> 20040625 GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 vps35 GO:0006886
>>>> PMID:16622069 IMP P retromer complex subunit
>>>> Vps35 SPCC777.13|vps35 gene taxon:4896
>>>> 20040625 GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 vps35 GO:0042147
>>>> PMID:16622069 IMP P retromer complex subunit
>>>> Vps35 SPCC777.13|vps35 gene taxon:4896
>>>> 20060424 GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 vps35 GO:0030437
>>>> PMID:15189449 IMP P retromer complex subunit
>>>> Vps35 SPCC777.13|vps35 gene taxon:4896
>>>> 20040625 GeneDB_Spombe
>>>> GeneDB_Spombe SPCC777.13 vps35 GO:0005829
>>>> PMID:16823372 IDA C retromer complex subunit
>>>> Vps35 SPCC777.13|vps35 gene taxon:4896
>>>> 20060724 GeneDB_Spombe
>>>>
>>>> - is there a rule (I couldn't find one) that says the synonyms
>>>> should not repeat the DB_Object_ID and DB_Object_Symbol, or should
>>>> there be? Would it save any space in the file sizes?
>>>>
>>>> Cheers,
>>>> Gavin
>>>> ________________________________________________________
>>>>
>>>> Gavin Sherlock
>>>> Dept. of Genetics
>>>> S201A, Grant Building,
>>>> Stanford University Medical School,
>>>> Stanford,
>>>> CA 94305-5120
>>>>
>>>> Tel: 650 498 6012
>>>> Fax: 650 724 3701
>>>>
>>>>
>>>
>>
>
>
>
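
For readers who want to check a file for the two issues discussed above
(a DB_Object_ID paired with more than one DB_Object_Symbol, and synonyms
that merely repeat the ID or symbol), the following is a minimal Python
sketch. It is not the annotation file checking script and not
GO::TermFinder code. It assumes a tab-delimited gene association file
with DB_Object_ID in column 2, DB_Object_Symbol in column 3 and
pipe-separated synonyms in column 11. The function names and the SAMPLE
rows (rebuilt from two of the pombe lines quoted above, with empty
columns assumed where the wrapped output shows nothing) are illustrative
only.

```python
from collections import defaultdict

# Assumed 0-based column positions in a tab-delimited gene association file.
DB_OBJECT_ID = 1
DB_OBJECT_SYMBOL = 2
DB_OBJECT_SYNONYM = 10


def symbols_per_id(lines):
    """Map each DB_Object_ID to the set of DB_Object_Symbols seen with it."""
    seen = defaultdict(set)
    for line in lines:
        # Skip blank lines and '!' comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= DB_OBJECT_SYMBOL:
            continue  # too short to carry an ID and a symbol
        seen[cols[DB_OBJECT_ID]].add(cols[DB_OBJECT_SYMBOL])
    return seen


def non_one_to_one(lines):
    """Return {id: sorted symbols} for IDs paired with more than one symbol."""
    return {
        object_id: sorted(symbols)
        for object_id, symbols in symbols_per_id(lines).items()
        if len(symbols) > 1
    }


def clean_synonyms(cols):
    """Return the synonyms of one row without the ID, the symbol or repeats."""
    if len(cols) <= DB_OBJECT_SYNONYM:
        return []
    skip = {cols[DB_OBJECT_ID], cols[DB_OBJECT_SYMBOL]}
    kept = []
    for synonym in cols[DB_OBJECT_SYNONYM].split("|"):
        if synonym and synonym not in skip:
            skip.add(synonym)  # also drops later repeats of this synonym
            kept.append(synonym)
    return kept


# Two rows rebuilt from the quoted grep output; empty strings are assumed
# for columns that do not appear in the wrapped text.
SAMPLE = [
    "\t".join(["GeneDB_Spombe", "SPCC777.13", "SPCC777.13", "", "GO:0003674",
               "GO_REF:0000015", "ND", "", "F", "", "", "gene",
               "taxon:4896", "20070711", "GeneDB_Spombe"]),
    "\t".join(["GeneDB_Spombe", "SPCC777.13", "vps35", "", "GO:0005768",
               "PMID:16622069", "IMP", "", "C",
               "retromer complex subunit Vps35", "SPCC777.13|vps35", "gene",
               "taxon:4896", "20060424", "GeneDB_Spombe"]),
]

if __name__ == "__main__":
    for object_id, symbols in non_one_to_one(SAMPLE).items():
        print(object_id, "maps to", ", ".join(symbols))
```

Run on SAMPLE, non_one_to_one reports SPCC777.13 paired with both
SPCC777.13 and vps35, and clean_synonyms returns an empty list for the
second row, since both of its synonyms repeat the ID or the symbol. The
sketch only checks the ID-to-symbol direction; a symbol shared by
several IDs would need a second pass keyed on the symbol.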