CC to MF links (was Re: [go] contributes_to question)

Ben Hitz hitz at genome.Stanford.EDU
Mon Aug 20 09:51:28 PDT 2007


Chris -

I agree this is how it should work.  But there are some "gotchas"  
from the software/database side that need to be addressed (not  
necessarily at this instant).

Say I want a list of all genes "directly involved" in histone
deacetylase activity. Whether or not this should include SIR2
might be a matter of debate, but let's say that I at least want all
members of the complex.  The _software_ has to infer backwards that
when I say "show me these genes" I also want an exhaustive list of
members of the complex.

Maybe this is obvious, but I think much software exists which doesn't  
make any inferences.
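
To make that concrete, here is a rough Python sketch of the kind of
backwards inference I mean. It uses only the toy data from Chris's
example below. The table names (IS_A, HAS_FUNCTION, ANNOTATIONS) and the
genes_for_function helper are made up for illustration; they are not
part of AmiGO or the GO database schema. It returns complex members in a
separate list, so they are never counted as having the activity
themselves.

```python
from collections import defaultdict

# Toy data from the example in this thread (assumed, not loaded from GO).
IS_A = {  # child term -> parent terms
    "GO:0033558": ["GO:0019213"],
    "GO:0004407": ["GO:0033558"],
    "GO:0000508": ["GO:0000118"],
    "GO:0000509": ["GO:0000118"],
    "GO:0032221": ["GO:0000118"],
}
# Hypothetical cross-ontology links: complex (CC) has_function activity (MF).
HAS_FUNCTION = {"GO:0000118": ["GO:0004407"]}
ANNOTATIONS = [  # (gene product, directly annotated term)
    ("RPD3", "GO:0004407"),
    ("RPD3", "GO:0000508"),
    ("SIF2", "GO:0000118"),
    ("SPCC1235.09", "GO:0000118"),
]


def descendants(term):
    """Return the term plus everything below it via is_a."""
    children = defaultdict(list)
    for child, parents in IS_A.items():
        for parent in parents:
            children[parent].append(child)
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def genes_for_function(mf_term):
    """Return (genes annotated to the function, other complex members).

    The second list holds genes that are only known to be part of a
    complex with the function; nothing is inferred about their own
    activity.
    """
    mf_terms = descendants(mf_term)
    direct = {gene for gene, term in ANNOTATIONS if term in mf_terms}
    complexes = set()
    for cc_term, functions in HAS_FUNCTION.items():
        if mf_terms.intersection(functions):
            complexes |= descendants(cc_term)
    members = {gene for gene, term in ANNOTATIONS if term in complexes}
    return sorted(direct), sorted(members - direct)


direct, complex_only = genes_for_function("GO:0004407")
print(direct)        # ['RPD3']
print(complex_only)  # ['SIF2', 'SPCC1235.09']
```

Keeping the two lists apart is deliberate: a display can show both, but
counts, slims and enrichment would only ever use the first.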

Ben

On Aug 17, 2007, at 5:01 PM, Chris Mungall wrote:

> Related to the contributes_to question and the relations between  
> proteins, protein complexes and molecular functions:
>
> Currently in GO there is no explicitly asserted link between:
>
> CC - GO:0000118 histone deacetylase complex
> MF - GO:0004407 histone deacetylase activity
> BP - GO:0016575 histone deacetylation
>
> Clearly the function, process and components denoted by these terms
> are inter-related: the CC executes the MF, the MF catalyses the BP
>
> The parts of a whole do not necessarily inherit the function of the
> whole; the whole does not inherit the function of the parts; and the
> sibling parts of a whole do not necessarily share the same
> function. These kinds of rules can be stated formally so that there is
> less room for confusion (just like the true path rule).
>
> I suspect that one reason annotators may be tempted to make the
> erroneous transitive inference and transfer the function of the whole
> (complex) to the part (gene product) is because there is a perceived
> loss of information in *not* doing so.
>
> For example, if correct curation protocol is followed, then SIF2
> should not be annotated to HD Activity (MF), only to HD complex
> (CC). Searches for the MF "HD Activity" will exclude SIF2. This is
> correct behavior. However, it may be useful to have some intuitive way
> of navigating from a search on "HD activity" to SIF2, by means of the
> complex, so long as it is obvious that SIF2 does not inherit the
> function of the complex.
>
> Using the latest results from Obol, we can now link terms across GO
> ontologies. For links between CC and MF, the relation would be labeled
> something like 'executes' or simply 'has function'. In a tree-type
> display we might show:
>
>    [i] GO:0019213 deacetylase activity
>      [i] GO:0033558 protein deacetylase activity
>       [i] GO:0004407 histone deacetylase activity    [RPD3]
>        [X] GO:0000118 histone deacetylase complex    [SIF2, SPCC1235.09]
>         [i] GO:0000508 Rpd3L complex                 [RPD3]
>         [i] GO:0000509 Rpd3S complex
>         [i] GO:0032221 Clr6 histone deacetylase complex
>         ...
>
> This display correctly represents the biology, but the danger here is
> that over the years we have built up an expectation in our users that
> the relation label can be ignored and gene products can be propagated
> up the DAG, willy-nilly. The correct way to read the DAG above is:
>
>   SIF2 is localized_to HD complex,
>   HD complex has_function HD activity
>
> And we can infer
>
>   SIF2 is localized_to some complex that has_function deacetylase
>   activity
>
> But we *cannot* infer anything about the activity of SIF2 without
> further evidence. We would not propagate SIF2 up in slimmers, term
> enrichment, gene product count summaries or any other graph based
> operation (a curator *may* apply their expertise and decide to make
> contributes_to annotations based on these CC to MF links, but this
> would not be automatic).
>
> This means we have to be careful about how we release these (valuable)
> cross-ontology links to the public, and ensure they are not abused. From
> a software perspective we are almost ready to load these kinds of
> links and start showing them in AmiGO, but we should proceed carefully
> to make sure these kinds of relations are better understood both
> within GO and outside.
>
> This seems to be related to the contributes_to issue. Is this worth
> discussing in the same slot at the GO meeting?
>
> The (unvetted) CC to MF links are in cvs:
>
>   go/scratch/obol_results/cellular_component_links_to_molecular_function.obo
>
> Cheers
> Chris
>
> On Aug 16, 2007, at 5:19 AM, Valerie Wood wrote:
>
>> It seems we have all used it slightly differently anyway.
>>
>> But here are two examples of why it is bad.
>>
>> 1.
>> I had annotated the ortholog of S. cerevisiae SIF2 (histone
>> deacetylase complex subunit) to
>> histone deacetylase activity, contributes_to ISS.
>> It is a WD repeat protein (which doesn't have HD activity, so it
>> seems odd to attribute this function); the original SGD annotation
>> is IPI.
>> I am now removing the pombe annotation.
>>
>> 2.
>> FET3/YMR058W
>> is a copper oxidase involved in iron assimilation by reduction and
>> transport. It isn't a transporter but it is part of the
>> transporter complex.
>> This has an iron transporter activity (with contributes_to) in
>> SGD, and has been ISS'd to this activity (without contributes_to)
>> by two Drosophila genes (FBgn0032116 and FBgn0039387).
>>
>> I see many, many examples of this (too many to give feedback on).
>>
>> Can this go on the agenda for the September meeting?
>>
>> Val
>>
>>
>>
>>
>>
>> Pascale Gaudet wrote:
>>
>>> I did mean unessential role; ie, the complex might have the  
>>> activity without the protein you're annotating, but adding it  
>>> enhances the activity (but not a regulator-- that would be  
>>> 'positive regulation of...'). But if adding it does nothing, I  
>>> would annotate to unknown.
>>>
>>> Pascale
>>>
>>> Valerie Wood wrote:
>>>
>>>> It seems so to me too; these are equivalent to process annotations.
>>>> But did you mean essential role in the activity? This is how I
>>>> would use it.
>>>>
>>>> Val
>>>>
>>>>
>>>> Pascale Gaudet <pgaudet at northwestern.edu> wrote:
>>>>> Val,
>>>>>
>>>>> My understanding was that the subunit had to have at least an
>>>>> unessential role in the activity, although the documentation is
>>>>> very ambiguous. But what you are describing is really capturing
>>>>> component information with a function annotation. That seems wrong.
>>>>>
>>>>> Pascale
>>>>>
>>>>> Valerie Wood wrote:
>>>>>
>>>>>> I'm really asking the question: why arbitrarily add these function
>>>>>> annotations to the 'unknown' subunits of complexes in the first
>>>>>> place, when they are clearly not the subunit that possesses the
>>>>>> catalytic activity, or when they clearly have another activity?
>>>>>>
>>>>>> Some of these complexes have
>>>>>> ATPase activity,
>>>>>> ubiquitin ligase activity,
>>>>>> acetyltransferase activity,
>>>>>> etc.
>>>>>>
>>>>>> So if this type of annotation was valid (or useful) then we would
>>>>>> (presumably) add all these annotations to all subunits for
>>>>>> completion?
>>>>>>
>>>>>> Wouldn't users rather see which subunits had known function and
>>>>>> which had 'unknown function'?
>>>>>> It just seems that the qualifier is being used much more liberally
>>>>>> than was originally intended (i.e. as a filler to avoid adding an
>>>>>> 'unknown' annotation)
>>>>>>
>>>>>> and it skews functional predictions/genome comparisons.
>>>>>>
>>>>>> val
>>>>>>
>>>>>> Chris Mungall <cjm at fruitfly.org> wrote:
>>>>>>
>>>>>>> ..which like many such recommendations will be ignored by the
>>>>>>> majority of implementations (in this case it is forgivable if we
>>>>>>> issue the recommendation at this late stage..)
>>>>>>>
>>>>>>> Perhaps any association qualified in any way should be omitted
>>>>>>> from the default annotations we provide. We would of course also
>>>>>>> provide the full annotation set but it would be made obvious that
>>>>>>> this 'advanced' set came with certain caveats
>>>>>>>
>>>>>>> On Aug 14, 2007, at 8:00 AM, Midori Harris wrote:
>>>>>>>
>>>>>>>> Whatever we decide, I would recommend that computational
>>>>>>>> analyses omit 'contributes_to' annotations.
>>>>>>>>
>>>>>>>> m
>>>>>>>>
>>>>>>>> On Mon, 13 Aug 2007, Valerie Wood wrote:
>>>>>>>>
>>>>>>>>> Recently I've been wondering why we have 2 meanings for
>>>>>>>>> contributes_to:
>>>>>>>>>
>>>>>>>>> When the qualifier was initially implemented, it was so
>>>>>>>>> function terms could be added to complexes like DNA polymerase
>>>>>>>>> and the F1 Fo ATPase where the function cannot be attributed to
>>>>>>>>> a single subunit. This seems fine.
>>>>>>>>>
>>>>>>>>> Increasingly I see annotations to complexes which are described
>>>>>>>>> as (for example) a histone acetyltransferase complex, and all of
>>>>>>>>> the subunits are given histone de/acetyltransferase or
>>>>>>>>> methyltransferase activity with contributes_to, even though the
>>>>>>>>> other subunits clearly have other functions (I see ATPases,
>>>>>>>>> ubiquitin ligases, actin-like proteins etc., which are commonly
>>>>>>>>> associated with histone acetyltransferases and
>>>>>>>>> methyltransferases).
>>>>>>>>>
>>>>>>>>> This seems odd, for a number of reasons.
>>>>>>>>> Often these subunits are not required for the activity, but
>>>>>>>>> their deletion (sometimes, but not always) affects the rate of
>>>>>>>>> the activity.
>>>>>>>>>
>>>>>>>>> Primarily I don't understand what this type of 'contributes_to'
>>>>>>>>> annotation provides to GO users above a process annotation to
>>>>>>>>> histone acetylation (if this has been shown), a complex
>>>>>>>>> annotation, and a function term to unknown/root node. Isn't it
>>>>>>>>> more useful to know that there is some information about the
>>>>>>>>> process, but the molecular function is not known?
>>>>>>>>>
>>>>>>>>> 1) Another problem is that these particular chromatin-associated
>>>>>>>>> complexes often have shared subunits so the function annotations
>>>>>>>>> aren't so clear-cut (i.e. some of these subunits may be members
>>>>>>>>> of other complexes which do not have this activity).
>>>>>>>>>
>>>>>>>>> 2) Also, computational analyses using RCA infer these
>>>>>>>>> 'functions' for similar proteins which, from their domain
>>>>>>>>> composition, are unlikely to possess this activity.
>>>>>>>>>
>>>>>>>>> 3) It makes cross-species comparisons difficult because you get
>>>>>>>>> different numbers of functions to what you would expect when
>>>>>>>>> comparing annotations between species. For example it is known
>>>>>>>>> how many histone acetyltransferases/methyltransferases etc.
>>>>>>>>> pombe has, compared to S. cerevisiae, but when I compare the 2
>>>>>>>>> the numbers are skewed.
>>>>>>>>>
>>>>>>>>> The documentation clearly allows this (although there is not an
>>>>>>>>> example of this type of annotation in the documentation, so I
>>>>>>>>> wonder if this is what we meant?):
>>>>>>>>>
>>>>>>>>> From the documentation:
>>>>>>>>>
>>>>>>>>> Annotating individual gene products according to attributes of
>>>>>>>>> a complex is especially useful for molecular function
>>>>>>>>> annotations in cases where a complex has an activity, but not
>>>>>>>>> all of the individual subunits do. (For example, there may be a
>>>>>>>>> known catalytic subunit and one or more additional subunits, or
>>>>>>>>> the activity may only be present when the complex is
>>>>>>>>> assembled.) Molecular function annotations of complex subunits
>>>>>>>>> that are not known to possess the activity of the complex must
>>>>>>>>> include the entry contributes_to in the Qualifier column.
>>>>>>>>>
>>>>>>>>> Note that contributes_to is not needed to annotate a catalytic
>>>>>>>>> subunit. Furthermore, contributes_to may be used for any
>>>>>>>>> non-catalytic subunit, whether the subunit is essential for the
>>>>>>>>> activity of the complex or not.
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>>
>>

--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO  
Consortium
Stanford University ** hitz at genome.stanford.edu





