[Ontology-editors] small molecule metabolism

Chris Mungall cjm at berkeleybop.org
Thu Apr 9 11:19:52 PDT 2009


On Apr 9, 2009, at 9:35 AM, Valerie Wood wrote:

> Tanya Berardini wrote:
>
>>
>>
>> On Thu, Apr 9, 2009 at 8:48 AM, Chris Mungall <cjm at berkeleybop.org>
>> wrote:
>>
>>
>>    On Apr 9, 2009, at 3:35 AM, Valerie Wood wrote:
>>
>>        It seems like there is a gap in the terminology of biology to
>>        describe "everything that is not a macromolecule".
>>        Maybe we should make one up....
>>        Perhaps "small molecule metabolism" would be acceptable if it
>>        is defined as "everything that is not a macromolecule", but
>>        that is not an acceptable way of defining something, is it?
>>
>>
>>    Do we really need a term for it? Why not just ask for non-X
>>    metabolism any time you're interested in metabolism of Ys where Ys
>>    are not Xs?
>>
>>    Granted, tools can't do this yet, but it's not hard given the
>>    correct structures in the ontology, and we should perhaps be
>>    working towards a situation where tools do support this.
>>
>>
>> I am partial to this approach.  Defining 'small molecule  
>> metabolism' as everything that is not 'macromolecule metabolism'  
>> violates the ontology design principle of positivity.  Why not just  
>> combine the annotations from the terms that do cover what is  
>> desired and then analyze those results?
>>
>> Tanya
>>
>
> It's quite difficult to do this during enrichment analysis.
> I have seen a number of times that terms which would classically be
> termed "biochemical pathways" are enriched, because I see the
> annotations in my data individually.
> The enrichment tools don't show this because the number of
> annotations to the individual terms is not large enough. The
> parent term "cellular metabolic process" is not enriched because the
> effect is masked by all of the other 3000 annotations to this term,
> generated mainly by the various types of macromolecule metabolic
> process, i.e. DNA metabolic process, protein metabolic process, etc.
>
> When you are analysing whole genome datasets it isn't really
> practical to add and subtract processes and repeat enrichment (it
> gets way too complicated to process and report the results, as you
> would have to fiddle with the P-values for everything you did
> manually and then reintegrate it into your whole genome analysis).
>
> This isn't really a problem for me now because I worked around it,
> but I could only do this because I knew what the problem was.
>
> I thought that other users might appreciate some sort of grouping term
> here for similar analyses. It just seems that there should be a term
> to group these processes in the same way that macromolecular
> metabolic processes are grouped. The fact is that if you have a
> bunch of genes enriched for low numbers of various small molecule
> metabolism/canonical biochemical pathway terms, this enrichment would
> most likely be overlooked.

I agree with Tanya but understand the practical need for enrichment  
analysis.
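
To make the masking effect concrete, here is a rough sketch using a
one-sided hypergeometric test, which is one common choice for term
enrichment (I am not claiming it is what any particular tool does). Only
the figure of 3000 annotations to the parent term comes from Val's
message; the genome size, study-set size, and all other counts are made
up for illustration.

```python
from math import comb


def enrichment_p(k, N, K, n):
    """One-sided hypergeometric p-value: P(X >= k) when n genes are drawn
    from a population of N genes, K of which are annotated to the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# All counts below are hypothetical, except the 3000 annotations to the parent.
GENOME = 5000  # annotated genes in the genome (assumed)
STUDY = 100    # genes in the study set (assumed)

# One small pathway term: 10 genes genome-wide, 2 of them in the study set.
p_leaf = enrichment_p(2, GENOME, 10, STUDY)

# Parent "cellular metabolic process": 3000 genes genome-wide, 65 in the study set.
p_parent = enrichment_p(65, GENOME, 3000, STUDY)

# A grouping term over the small-molecule pathways: 400 genes, 30 in the study set.
p_group = enrichment_p(30, GENOME, 400, STUDY)

print(f"leaf={p_leaf:.3g} parent={p_parent:.3g} group={p_group:.3g}")
```

With these invented numbers the individual pathway term gets only a weak
nominal p-value that is unlikely to survive multiple-testing correction,
the parent term is not significant at all, and only the intermediate
grouping term stands out clearly.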

I am envisioning a partial solution along the following lines:

Just as we have goslims, we can have gofats. A gofat would live  
outside the ontology and contain statements like

GOFAT:1 small molecule metabolism = metabolism and not has_participant  
chebi:macromolecule

The reasoner would compute is_a parentage to the fat terms and create  
a derived obo file. If the tool accepts OBO files plus GAFs, then just  
give the tool this obo file instead of the regular one.

(Note this only works for tools that allow you to input an obo file.  
Some web-based tools may not give you the flexibility to substitute  
anything other than the regular GO.)

This should work in your particular case: there is no need to keep  
re-analyzing once you've defined your fat. We could even make the  
fat-derived obo files available on the website, as we do for slims.

This solution isn't perfect, as it requires the analyzer to know a  
priori which grouping categories may be useful. Really the tool should  
be able to do this. For example, for a particular dataset, "metabolism  
with a molecule with an X side chain, missing a Y" may be enriched.  
There are strategies for dealing with this: rule mining, or  
pre-computing every possible cross-product. This will require a little  
more know-how from tool developers.
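
A toy sketch of the pre-computation idea, restricted to groupings of the
form "has feature X, lacks feature Y". The terms and structural features
are invented; in practice they would come from the chemical ontology.

```python
from itertools import permutations

# Hypothetical: process term -> structural features of its participant molecule.
FEATURES = {
    "P:a": {"hydroxyl", "phosphate"},
    "P:b": {"hydroxyl"},
    "P:c": {"phosphate"},
    "P:d": {"hydroxyl", "amino"},
}


def candidate_groupings(features):
    """Enumerate every ordered pair (has, lacks) of distinct features and
    collect the terms whose molecule has the first and lacks the second.
    Pairs that match no term are dropped."""
    all_feats = sorted(set().union(*features.values()))
    groups = {}
    for has, lacks in permutations(all_feats, 2):
        members = frozenset(
            term for term, feats in features.items()
            if has in feats and lacks not in feats
        )
        if members:
            groups[(has, lacks)] = members
    return groups


groups = candidate_groupings(FEATURES)
for (has, lacks), members in sorted(groups.items()):
    print(f"has {has}, lacks {lacks}: {sorted(members)}")
```

Each non-empty grouping could then be tested for enrichment like any
other term. The number of candidates grows quickly with the number of
features, so the multiple-testing burden grows with it.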


>
> val