[Ontology-editors] [Chebi-ontology] Inference of GO BP relationships from CHEBI

Chris Mungall cjm at berkeleybop.org
Fri Jun 26 13:03:13 PDT 2009


On Jun 26, 2009, at 5:11 AM, Jane Lomax wrote:

> Hi Chris - I guess I thought that in the situation like the  
> glutamine/amine one you describe we would actually change our  
> classes to reflect ChEBI - i.e. we would obsolete (or merge/rename)  
> 'cellular amine metabolism' rather than trying to model what we  
> already had in a rather awkward way. So we might have:
>
> carboxylic acid metabolism
> ---[i] amino acid metabolism
> ------[i] glutamine metabolism

In fact we already have this as one is_a path in GO. We could  
certainly get rid of the path to 'cellular amine metabolic process' by  
obsoleting the term, which would decrease the tangling in GO. But if we  
applied this across the board we could potentially lose many valuable  
grouping terms and change the results of enrichment analyses etc.

A Google search for "cellular amine metabolism" returns one result,  
and the results for "cellular amine metabolic process" all lead to GO.  
So in this case the term is perhaps suspect. But without the  
"cellular" prefix it's a common term, so I think we have to account for  
it at least as a synonym.

> or whatever classes we chose from ChEBI - a ChEBI slim might be  
> useful here actually.

I had imagined the CHEBI slim would be generated automatically based  
on what terms we selected for our (GO's) logical definitions.

> Not sure what we do about multiple is_a inheritance in ChEBI though.  
> Maybe it would be possible to use a single-inheritance slim?

Multiple inheritance isn't a problem per se. What's important is that  
there is a single asserted is_a hierarchy (which you get by having  
genus-differentia definitions). You can still keep grouping terms,  
with these being entailed by the logical definitions.

Starting with a CHEBI single inheritance slim is a great idea. Of  
course, it would be better to have the fully refactored ontology with  
computable genus-differentia definitions, but that is a lot of work, so  
the slim is a sensible first step.

To get the ball rolling, here is my chemistry-naive first pass.

In the long run CHEBI would have the following asserted is_a hierarchy:

    is_a CHEBI:50860 ! organic molecular entity DIFF: has_part carbon
     is_a CHEBI:**new** ! organic acid DIFF: has_quality acidic
      is_a CHEBI:33575 ! carboxylic acid DIFF: has_part carboxyl_group
       is_a CHEBI:33709 ! amino acid DIFF: has_part amino_group
        is_a CHEBI:28300 ! glutamine

(I am giving the classes my own proposed differentia. The tree  
notation is a bit odd. You should read entries as "amino acid is_a  
carboxylic acid that has_part some amino_group")
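
To make the reading of the tree concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the dictionary layout and the function names (read_entry, inherited_differentia) are hypothetical, the differentia are the proposed ones from the tree above, and the labels stand in for the CHEBI IDs.

```python
from typing import Dict, List, Optional, Tuple

Differentia = Tuple[str, str]  # (relation, filler)

# label -> (asserted genus, proposed differentia); None where the tree gives none
DEFS: Dict[str, Tuple[Optional[str], Optional[Differentia]]] = {
    "organic molecular entity": (None, ("has_part", "carbon")),
    "organic acid": ("organic molecular entity", ("has_quality", "acidic")),
    "carboxylic acid": ("organic acid", ("has_part", "carboxyl_group")),
    "amino acid": ("carboxylic acid", ("has_part", "amino_group")),
    "glutamine": ("amino acid", None),
}


def read_entry(name: str) -> str:
    """Spell out one tree entry as 'X is_a GENUS that REL some FILLER'."""
    genus, diff = DEFS[name]
    text = name
    if genus is not None:
        text += f" is_a {genus}"
    if diff is not None:
        text += f" that {diff[0]} some {diff[1]}"
    return text


def inherited_differentia(name: str) -> List[Differentia]:
    """Collect differentia along the single asserted is_a path to the root."""
    collected: List[Differentia] = []
    current: Optional[str] = name
    while current is not None:
        genus, diff = DEFS[current]
        if diff is not None:
            collected.append(diff)
        current = genus
    return collected


print(read_entry("amino acid"))
print(inherited_differentia("glutamine"))
```

Because each class has exactly one asserted genus, the walk to the root is a simple loop, and everything glutamine inherits can be read off that one path.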

Note that I am not suggesting CHEBI eliminate their other useful  
grouping classes. These would become part of the inferred hierarchy.  
But for the slim approximation, we would simply remove from the slim  
the terms that cause multiple inheritance, such as oxoacid, organic  
amino compound, and a bunch of intermediate terms.

GO would keep the following as the backbone hierarchy:

   is_a GO:0044237 ! cellular metabolic process
     is_a GO:0006082 ! organic acid metabolic process
      is_a GO:0019752 ! carboxylic acid metabolic process
       is_a GO:0006520 ! cellular amino acid metabolic process
        is_a GO:0009064 ! glutamine family amino acid metabolic process
         is_a GO:0006541 ! glutamine metabolic process

With the obvious CHEBI-based logical definitions. I'm ignoring the  
"cellular" qualifier on 6520 for now. Note the need for the new CHEBI  
term.

* The above could be the 'default' path shown when looking at this  
term in GO. This could be computed from the CHEBI hierarchy, which  
would have a single asserted is_a path.
* GO would keep terms such as "amine metabolism". The is_a link could  
be inferred based on a to-be-added relationship in CHEBI, such as  
"amino acid has_part amine" (see my email below for details).

That's a rough sketch of the approach; there are undoubtedly a few  
mistakes.
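
As a toy illustration of how the "amine metabolism" link could be inferred rather than asserted, here is a small Python sketch. Everything in it is an assumption made for the example: the CHEBI fragment is hand-written, the has_part edge is the to-be-added relationship (it is not in CHEBI today), the logical definitions are simplified to "metabolism that has_participant X", and has_participant is taken to propagate over both is_a and has_part.

```python
from typing import Dict, Set

Edges = Dict[str, Set[str]]

# Hand-written toy fragment of the proposed CHEBI backbone (not real CHEBI data).
CHEBI_IS_A: Edges = {
    "glutamine": {"amino acid"},
    "amino acid": {"carboxylic acid"},
    "carboxylic acid": {"organic acid"},
}

# Hypothetical relationship that would need to be added to CHEBI.
PROPOSED_HAS_PART: Edges = {"amino acid": {"amino group"}}

# Simplified logical definitions: GO term -> CHEBI participant.
GO_XP: Dict[str, str] = {
    "glutamine metabolic process": "glutamine",
    "cellular amino acid metabolic process": "amino acid",
    "amine metabolic process": "amino group",
}


def entailed_participants(chem: str, is_a: Edges, has_part: Edges) -> Set[str]:
    """Classes that must participate whenever `chem` participates,
    following is_a and has_part edges (assumed propagation rule)."""
    seen = {chem}
    stack = [chem]
    while stack:
        current = stack.pop()
        for nxt in is_a.get(current, set()) | has_part.get(current, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def infers_is_a(child: str, parent: str, is_a: Edges, has_part: Edges) -> bool:
    """True if `child is_a parent` follows from the logical definitions."""
    if child == parent:
        return False
    return GO_XP[parent] in entailed_participants(GO_XP[child], is_a, has_part)


child, parent = "glutamine metabolic process", "amine metabolic process"
print(infers_is_a(child, parent, CHEBI_IS_A, {}))                 # without has_part
print(infers_is_a(child, parent, CHEBI_IS_A, PROPOSED_HAS_PART))  # with has_part
```

With the has_part edge missing, the link to amine metabolism cannot be recovered; once it is added, the link falls out of the definitions. A real reasoner would do this over the full ontologies, of course.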

To get the first version of the CHEBI slim, I suggest

- doing a very cursory vetting of the existing  
biological_process_xp_chebi logical definitions, taking a small  
manageable subset
- generating a CHEBI slim by extracting only the terms referenced in  
the above subset, then recalculating the is_a links
- editing the final slim in OBO-Edit, removing additional terms until  
we have something close to what would be a good backbone is_a hierarchy
- coordinating this with the ongoing work CHEBI are doing to assign  
textual genus-differentia definitions to terms
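
The second and third steps could be sketched roughly as follows in Python. This is only an illustration, not an existing tool: the is_a graph is a made-up toy fragment (including the placeholder "intermediate term"), and the function names (ancestors, build_slim, multi_parent_terms) are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Edges = Dict[str, Set[str]]

# Made-up toy is_a graph; it does not reproduce the real CHEBI structure.
TOY_IS_A: Edges = {
    "glutamine": {"intermediate term"},
    "intermediate term": {"amino acid"},
    "amino acid": {"carboxylic acid", "organic amino compound"},
    "carboxylic acid": {"organic acid"},
    "organic amino compound": {"organic molecular entity"},
    "organic acid": {"organic molecular entity"},
}


def ancestors(term: str, is_a: Edges) -> Set[str]:
    """All strict is_a ancestors of `term`."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def build_slim(is_a: Edges, keep: Iterable[str]) -> Edges:
    """Keep only the referenced terms and recalculate is_a links so each
    term points at its nearest retained ancestors (no redundant links)."""
    kept = set(keep)
    slim: Edges = {}
    for term in kept:
        kept_anc = ancestors(term, is_a) & kept
        slim[term] = {
            a for a in kept_anc
            if not any(a in ancestors(b, is_a) for b in kept_anc if b != a)
        }
    return slim


def multi_parent_terms(slim: Edges) -> List[str]:
    """Terms that still have more than one parent: candidates for hand editing."""
    return sorted(t for t, parents in slim.items() if len(parents) > 1)


referenced = {"glutamine", "amino acid", "carboxylic acid",
              "organic acid", "organic molecular entity"}
slim = build_slim(TOY_IS_A, referenced)
print(slim["glutamine"], slim["amino acid"], multi_parent_terms(slim))
```

If a term like "organic amino compound" ends up in the referenced set, "amino acid" comes out with two parents, and multi_parent_terms flags it for the manual editing step.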

Sound reasonable?

>
> Jane
>
>
>
>
>
> Chris Mungall wrote:
>> Ideally we could reconstitute many of the existing asserted links  
>> in  GO from CHEBI, and ultimately rely entirely on CHEBI. At the  
>> moment  we're missing some relationships to do this.
>>
>> For example, currently we have:
>>
>> GO:glutamine_metabolism =def GO:metabolism that has_participant   
>> CHEBI:glutamine
>> GO:cellular_amine_metabolism =def GO:metabolism that  
>> has_participant  CHEBI:amine
>>
>> GO also asserts that glutamine_metabolism is_a   
>> cellular_amine_metabolism. We should in principle be able to  
>> remove  this link and re-infer it from the logical definitions and  
>> the  relationships in CHEBI. This doesn't work, because glutamine  
>> is not a  subclass of amine in CHEBI.
>>
>> We can probably do a little better than the logical definitions above
>>
>> * We should use CHEBI:L-glutamine rather than glutamine
>> * We can have a logical definition of cellular amine metabolic  
>> process  that better reflects the text definition:
>>
>> GO:0009308 ! cellular amine metabolic process ***  [DEF: "The  
>> chemical  reactions and pathways involving any organic compound  
>> that is weakly  basic in character and contains an amino or a  
>> substituted amino group,  as carried out by individual cells.  
>> Amines are called primary,  secondary, or tertiary according to  
>> whether one, two, or three carbon  atoms are attached to the  
>> nitrogen atom."]
>>
>> Taking the text definition literally suggests that  
>> CHEBI:amino_group  is the class to use. We could assign a logical  
>> definition to  cellular_amine_metabolism
>> 	GO:metabolism that has_participant (anything that has_part   
>> CHEBI:amino_group)
>>
>> Or we can simply use:
>> 	GO:metabolism that has_participant CHEBI:amino_group
>>
>> With has_participant being transitive over has_part
>>
>> However, this is still insufficient to recapitulate the asserted   
>> relationship in GO, because there is nothing in CHEBI that tells  
>> me  that L-glutamine (or any amino acid) has_part (or any other   
>> relationship to) amine or amino group.
>>
>> I think in this case we need a has_part relationship added to  
>> CHEBI,  we can then recapitulate the relationship in GO.
>>
>> This is just one single example though. There are plenty of  
>> others,  some will be less straightforward. See for example:
>> http://wiki.geneontology.org/index.php/XP:biological_process_xp_chebi#Misalignments_and_reasoner_results
>>
>>
>> 	
>>
>>
>
>
> -- 
> Dr Jane Lomax
> GO Editorial Office
> EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton
> Cambridgeshire, UK
> CB10 1SD
>
> p: +44 1223 492516
> f: +44 1223 494468
>
>


