[Go] odd characters in term names
Chris Mungall
cjm at berkeleybop.org
Thu Jan 8 15:08:21 PST 2009
I think that even when OE becomes Unicode-ready, GO should continue to
use ASCII only for the ontology until we really need to switch to full
Unicode. I can see a few cases where it might be nice (degree symbols,
for example), but so long as we are English-biased there's no strong
reason to switch.
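
For anyone who wants to check a file against an ASCII-only policy, here is a minimal Python sketch. It assumes term names sit on `name:` lines, as in the OBO flat-file format. The function names and the sample stanzas are made up for illustration, and this is not what any GO pipeline actually runs.

```python
import unicodedata


def find_non_ascii_names(obo_lines):
    """Yield (line_number, name, offending_chars) for each `name:` line
    that contains characters outside 7-bit ASCII.

    Hypothetical helper: only looks at lines starting with `name:`.
    """
    for lineno, line in enumerate(obo_lines, start=1):
        if not line.startswith("name:"):
            continue
        name = line[len("name:"):].strip()
        # Collect each distinct non-ASCII character once, in sorted order.
        bad = sorted({ch for ch in name if ord(ch) > 127})
        if bad:
            yield lineno, name, bad


def describe(ch):
    """Return a readable label such as 'U+00B0 DEGREE SIGN'."""
    # unicodedata.name() raises ValueError for unnamed code points
    # unless a default is supplied.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


if __name__ == "__main__":
    # Made-up stanzas; the EX: ids are placeholders, not real GO ids.
    sample = [
        "[Term]",
        "id: EX:0000001",
        "name: example response at 37\u00b0C",
        "[Term]",
        "id: EX:0000002",
        "name: Clr6 histone deacetylase complex I/I'",
    ]
    for lineno, name, bad in find_non_ascii_names(sample):
        print(lineno, name, [describe(c) for c in bad])
```

Note that the plain apostrophe in names like I/I' is ordinary ASCII, so a check like this flags the degree sign but not the prime. Reporting rather than silently stripping seems safer, since a dropped character can change a chemical name.
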
On Jan 8, 2009, at 12:57 PM, Suzanna Lewis wrote:
> So when OBO-Edit becomes ISO compliant (hopefully some time this
> year), what will the ramifications be for the rest of the software?
>
> On Jan 8, 2009, at 11:33 AM, Chris Mungall wrote:
>
>>
>> FlyBase used to use SGML in gene names, but not any more, I believe.
>>
>> We generally strip errant Latin-1 characters from the obo file
>> prior to loading into the main database. It sounds like it may
>> benefit others if this is done further upstream, when the editors'
>> file is copied to the public area.
>>
>> And whilst I agree that single apostrophes are fine for
>> computational processing, I think for human purposes they should be
>> avoided in term names, except where usage is either standard
>> English usage or extremely well known across species and part of
>> standard chemical nomenclature (e.g. 5', 3').
>>
>> On Jan 8, 2009, at 10:06 AM, Jim Hu wrote:
>>
>>> I've run into a bunch of odd characters before when I was
>>> converting the GO obo file and gene associations to XML for
>>> loading GONUTS. There used to be a bunch of SBML Greek letters in
>>> some of the files. I also used to see curly quotes that were
>>> probably from things pasted in from Word. Primes aren't so bad...
>>> the worst was a right arrow in a sugar chemical name.
>>>
>>> Jim
>>>
>>> On Jan 8, 2009, at 9:58 AM, Benjamin Hitz wrote:
>>>
>>>> objection withdrawn.
>>>>
>>>> On Jan 7, 2009, at 1:15 PM, Benjamin Hitz wrote:
>>>>
>>>>>
>>>>> So, we recently had two terms added under "Protein Complexes"
>>>>> GO: 32221, Clr6 histone deacetylase complex II'
>>>>> and
>>>>> GO: 33698, Clr6 histone deacetylase complex I/I'
>>>>>
>>>>> I understand that the ' is part of the name, but it's sort of an
>>>>> unpleasant character to deal with. These broke the SGD slim
>>>>> mapper. Is there a set of characters that are not allowed in
>>>>> term names? Should there be? Or should I just man up and make
>>>>> sure to escape them?
>>>>>
>>>>> Ben
>>>>>
>>>>> --
>>>>> Ben Hitz
>>>>> Senior Scientific Programmer ** Saccharomyces Genome Database **
>>>>> GO Consortium
>>>>> Stanford University ** hitz at genome.stanford.edu
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Go mailing list
>>>>> Go at geneontology.org
>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>
>>>
>>> =====================================
>>> Jim Hu
>>> Associate Professor
>>> Dept. of Biochemistry and Biophysics
>>> 2128 TAMU
>>> Texas A&M Univ.
>>> College Station, TX 77843-2128
>>> 979-862-4054