[Go] odd characters in term names

Chris Mungall cjm at berkeleybop.org
Thu Jan 8 15:08:21 PST 2009


I think that even when OE becomes Unicode-ready, GO should continue to
use ASCII only for the ontology until we really need to switch to full
Unicode. I can see a few cases where it might be nice (degree symbols,
for example), but so long as we are English-biased there's no strong
reason to switch.
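The ASCII-only policy proposed here is easy to check mechanically. Below is a minimal sketch in Python (the thread names no language). It scans the `name:` lines of an OBO file and reports any characters outside 7-bit ASCII, such as curly quotes pasted from Word or a right arrow. The helper `non_ascii_names` and the `EX:` sample IDs are hypothetical. The sketch deliberately reports rather than strips, because silently deleting a character can change a term name's meaning. It also checks only `name:` tags, not synonyms or definitions.

```python
from typing import Iterator, Tuple


def non_ascii_names(obo_text: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (line number, term name, offending characters) for each
    'name:' line whose value contains a character outside 7-bit ASCII.

    Hypothetical helper: only 'name:' tags are checked, not synonyms or defs.
    """
    for lineno, line in enumerate(obo_text.splitlines(), start=1):
        if not line.startswith("name:"):
            continue
        name = line[len("name:"):].strip()
        # Collect each distinct non-ASCII character once, in code-point order.
        bad = "".join(sorted({ch for ch in name if ord(ch) > 127}))
        if bad:
            yield lineno, name, bad


if __name__ == "__main__":
    # Hypothetical sample: an ASCII prime passes, a Word-style curly quote does not.
    sample = (
        "[Term]\n"
        "id: EX:0000001\n"
        "name: Clr6 histone deacetylase complex II'\n"
        "\n"
        "[Term]\n"
        "id: EX:0000002\n"
        "name: Clr6 histone deacetylase complex II\u2019\n"
    )
    for lineno, name, bad in non_ascii_names(sample):
        print(f"line {lineno}: {name!r} contains {[hex(ord(c)) for c in bad]}")
```

A check like this could run at the point where the editors' file is copied to the public area, so that problems surface before any downstream loader sees them.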

On Jan 8, 2009, at 12:57 PM, Suzanna Lewis wrote:

> So when OBO-Edit becomes ISO compliant (hopefully some time this  
> year), what will the ramifications be for the rest of the software?
>
> On Jan 8, 2009, at 11:33 AM, Chris Mungall wrote:
>
>>
>> FlyBase used to use SGML in gene names, not any more I believe.
>>
>> We generally strip errant Latin-1 characters from the OBO file
>> prior to loading it into the main database. It sounds like it may
>> benefit others if this were done further upstream, when copying
>> the editors' file to the public area.
>>
>> And whilst I agree that single apostrophes are fine for
>> computational processing, I think for human purposes they should be
>> avoided in term names, except where the usage is either standard
>> English or extremely well known across species and part of
>> standard chemical nomenclature (e.g. 5', 3').
>>
>> On Jan 8, 2009, at 10:06 AM, Jim Hu wrote:
>>
>>> I've run into a bunch of odd characters before when I was
>>> converting the GO OBO file and gene associations to XML for
>>> loading into GONUTS. There used to be a bunch of SGML Greek letters
>>> in some of the files. I also used to see curly quotes that were
>>> probably pasted in from Word. Primes aren't so bad...
>>> the worst was a right arrow in a sugar chemical name.
>>>
>>> Jim
>>>
>>> On Jan 8, 2009, at 9:58 AM, Benjamin Hitz wrote:
>>>
>>>> objection withdrawn.
>>>>
>>>> On Jan 7, 2009, at 1:15 PM, Benjamin Hitz wrote:
>>>>
>>>>>
>>>>> So, we recently had two terms added under "Protein Complexes":
>>>>> GO:32221, Clr6 histone deacetylase complex II'
>>>>> and
>>>>> GO:33698, Clr6 histone deacetylase complex I/I'
>>>>>
>>>>> I understand that the ' is part of the name, but it's sort of an
>>>>> unpleasant character to deal with. These broke the SGD slim
>>>>> mapper. Is there a set of characters that are not allowed in
>>>>> term names? Should there be? Or should I just man up and make
>>>>> sure to escape them?
>>>>>
>>>>> Ben
>>>>>
>>>>> --
>>>>> Ben Hitz
>>>>> Senior Scientific Programmer ** Saccharomyces Genome Database **  
>>>>> GO Consortium
>>>>> Stanford University ** hitz at genome.stanford.edu
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Go mailing list
>>>>> Go at geneontology.org
>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>
>>>
>>> =====================================
>>> Jim Hu
>>> Associate Professor
>>> Dept. of Biochemistry and Biophysics
>>> 2128 TAMU
>>> Texas A&M Univ.
>>> College Station, TX 77843-2128
>>> 979-862-4054
>>>
>>>
>>
>>
>
>
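Ben's question of whether to "just make sure to escape them" can be illustrated with a minimal Python sketch. The thread does not say how the SGD slim mapper broke. The sketch therefore assumes that the failure came from building SQL by string concatenation. The table layout, the `EX:` accessions and the helpers `load_terms`, `find_by_name` and `sql_literal` are all hypothetical. Parameter placeholders avoid the problem entirely. Doubling each apostrophe is the standard SQL fallback when placeholders are unavailable.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def load_terms(terms: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Load (accession, name) pairs into a hypothetical in-memory table.

    The '?' placeholders let the driver handle quoting, so names containing
    apostrophes, such as "complex I/I'", need no manual escaping.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term (acc TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO term (acc, name) VALUES (?, ?)", terms)
    return conn


def find_by_name(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Return the accession for an exact name match, or None."""
    row = conn.execute("SELECT acc FROM term WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def sql_literal(text: str) -> str:
    """Fallback when placeholders are unavailable: quote a string as a
    standard SQL literal by doubling every embedded apostrophe."""
    return "'" + text.replace("'", "''") + "'"


if __name__ == "__main__":
    # Hypothetical accessions; the names are the two from Ben's message.
    conn = load_terms([
        ("EX:1", "Clr6 histone deacetylase complex II'"),
        ("EX:2", "Clr6 histone deacetylase complex I/I'"),
    ])
    print(find_by_name(conn, "Clr6 histone deacetylase complex I/I'"))
    # Naive concatenation such as "... WHERE name = '" + name + "'" would
    # produce malformed SQL for these names and raise sqlite3.OperationalError.
```

Placeholders are usually the better choice, because the driver handles quoting for every value. Hand-written escaping has to be repeated correctly at every call site.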


