[Go] odd characters in term names
Suzanna Lewis
suzi at berkeleybop.org
Thu Jan 8 15:10:53 PST 2009
okey-dokey.
On Jan 8, 2009, at 3:08 PM, Chris Mungall wrote:
>
> I think that even when OE becomes Unicode-ready, GO should continue
> to use ASCII only for the ontology until we really need to switch to
> full Unicode. I can see a few cases where it might be nice (degree
> symbols, for example), but so long as we are English-biased there's no
> strong reason to switch.
>
> On Jan 8, 2009, at 12:57 PM, Suzanna Lewis wrote:
>
>> So when OBO-Edit becomes ISO compliant (hopefully some time this
>> year), what will the ramifications be for the rest of the software?
>>
>> On Jan 8, 2009, at 11:33 AM, Chris Mungall wrote:
>>
>>>
>>> FlyBase used to use SGML in gene names, not any more I believe.
>>>
>>> We generally strip errant Latin-1 characters from the OBO file
>>> prior to loading into the main database. It sounds like it may
>>> benefit others if this is done further upstream, when the
>>> editors' file is copied to the public area.
>>>
>>> And whilst I agree that single apostrophes are fine for
>>> computational processing, I think for human purposes they should
>>> be avoided in term names, except where usage is either standard
>>> English usage or extremely well known across species and part of
>>> standard chemical nomenclature (e.g. 5', 3').
>>>
>>> On Jan 8, 2009, at 10:06 AM, Jim Hu wrote:
>>>
>>>> I've run into a bunch of odd characters before when I was
>>>> converting the GO OBO file and gene associations to XML for
>>>> loading GONUTS. There used to be a bunch of SGML Greek letters
>>>> in some of the files. I also used to see curly quotes that were
>>>> probably from things pasted in from Word. Primes aren't so
>>>> bad... the worst was a right arrow in a sugar chemical name.
>>>>
>>>> Jim
>>>>
>>>> On Jan 8, 2009, at 9:58 AM, Benjamin Hitz wrote:
>>>>
>>>>> objection withdrawn.
>>>>>
>>>>> On Jan 7, 2009, at 1:15 PM, Benjamin Hitz wrote:
>>>>>
>>>>>>
>>>>>> So, we recently had two terms added under "Protein Complexes"
>>>>>> GO: 32221, Clr6 histone deacetylase complex II'
>>>>>> and
>>>>>> GO: 33698, Clr6 histone deacetylase complex I/I'
>>>>>>
>>>>>> I understand that the ' is part of the name, but it's sort of
>>>>>> an unpleasant character to deal with. These broke the SGD slim
>>>>>> mapper. Is there a set of characters that are not allowed in
>>>>>> term names? Should there be? Or should I just man up and make
>>>>>> sure to escape them?
>>>>>>
>>>>>> Ben
>>>>>>
>>>>>> --
>>>>>> Ben Hitz
>>>>>> Senior Scientific Programmer ** Saccharomyces Genome Database
>>>>>> ** GO Consortium
>>>>>> Stanford University ** hitz at genome.stanford.edu
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Go mailing list
>>>>>> Go at geneontology.org
>>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>>
>>>>
>>>> =====================================
>>>> Jim Hu
>>>> Associate Professor
>>>> Dept. of Biochemistry and Biophysics
>>>> 2128 TAMU
>>>> Texas A&M Univ.
>>>> College Station, TX 77843-2128
>>>> 979-862-4054
>>>>
>>>>
>>>
>>>
>>
>>
>
>
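For anyone who wants to try the upstream check Chris describes, here is a rough sketch in Python. It is only an illustration, not what GO or OBO-Edit actually runs: the function names, the sample lines, and the choice to look only at "name:" lines are all assumptions made for the example. It treats anything outside printable 7-bit ASCII as an odd character, so the plain apostrophe in the Clr6 names passes, while curly quotes and arrows are reported.

```python
import re

# Assumption: "odd" means anything outside printable 7-bit ASCII.
# Tab, newline and carriage return are allowed so line endings survive.
NON_ASCII = re.compile(r"[^\t\n\r\x20-\x7E]")


def find_odd_characters(obo_lines):
    """Yield (line_number, character) for every non-ASCII character
    found on a `name:` line. Line numbers start at 1."""
    for number, line in enumerate(obo_lines, start=1):
        if not line.startswith("name:"):
            continue
        for match in NON_ASCII.finditer(line):
            yield number, match.group()


def strip_odd_characters(line):
    """Return the line with every non-ASCII character removed.
    This is lossy: a curly quote is dropped, not turned into '."""
    return NON_ASCII.sub("", line)


# Hypothetical sample lines, not taken from the real ontology file.
sample = [
    "[Term]",
    "name: Clr6 histone deacetylase complex I/I'",  # plain ASCII prime
    "[Term]",
    "name: made-up sugar \u2192 name",               # right arrow
]

if __name__ == "__main__":
    for number, char in find_odd_characters(sample):
        print(f"line {number}: U+{ord(char):04X}")
```

Reporting before stripping seems the safer default, since silently dropping a character can change a chemical name; the stripping helper is there only for the load step where loss is acceptable.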