CC to MF links (was Re: [go] contributes_to question)

Chris Mungall cjm at fruitfly.org
Fri Aug 17 17:01:15 PDT 2007


Related to the contributes_to question and the relations between  
proteins, protein complexes and molecular functions:

Currently in GO there is no explicitly asserted link between:

CC - GO:0000118 histone deacetylase complex
MF - GO:0004407 histone deacetylase activity
BP - GO:0016575 histone deacetylation

Clearly the function, process and components denoted by these terms
are inter-related: the CC executes the MF, and the MF catalyses the BP.

The parts of a whole do not necessarily inherit the function of the
whole; the whole does not inherit the function of the parts; and the
sibling parts of a whole do not necessarily share the same
function. These kinds of rules can be stated formally so that there is
less room for confusion (just like the true path rule).

I suspect that one reason annotators may be tempted to make the
erroneous transitive inference and transfer the function of the whole
(complex) to the part (gene product) is because there is a perceived
loss of information in *not* doing so.

For example, if correct curation protocol is followed, then SIF2
should not be annotated to HD Activity (MF), only to HD complex
(CC). Searches for the MF "HD Activity" will exclude SIF2. This is
correct behavior. However, it may be useful to have some intuitive way
of navigating from a search on "HD activity" to SIF2, by means of the
complex, so long as it is obvious that SIF2 does not inherit the
function of the complex.
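
A minimal sketch of what such a search could return, assuming a single
hypothetical has_function link from the complex to the activity. The
function name and data structures are invented for illustration and are
not AmiGO code; the point is only that complex members come back in a
separate, clearly labeled bucket instead of being merged into the MF hits.

```python
# Illustrative data only: gene products and IDs are taken from the example in
# this message; the has_function link is an unvetted, hypothetical CC-to-MF link.
DIRECT_ANNOTATIONS = {
    "GO:0004407": ["RPD3"],                 # histone deacetylase activity (MF)
    "GO:0000118": ["SIF2", "SPCC1235.09"],  # histone deacetylase complex (CC)
}
HAS_FUNCTION = {"GO:0000118": "GO:0004407"}  # complex -> function it executes


def search_function(mf_id):
    """Return direct annotations and, separately, members of linked complexes."""
    via_complex = {
        cc_id: list(DIRECT_ANNOTATIONS.get(cc_id, []))
        for cc_id, linked_mf in HAS_FUNCTION.items()
        if linked_mf == mf_id
    }
    return {
        "annotated_to_function": list(DIRECT_ANNOTATIONS.get(mf_id, [])),
        "in_complex_with_function": via_complex,
    }


hits = search_function("GO:0004407")
print(hits["annotated_to_function"])
print(hits["in_complex_with_function"])
```

SIF2 is reachable from the search, but only under the complex it belongs
to, so a reader is not led to think it has the activity itself.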

Using the latest results from Obol, we can now link terms across GO
ontologies. For links between CC and MF, the relation would be labeled
something like 'executes' or simply 'has function'. In a tree-type
display we might show:

    [i] GO:0019213 deacetylase activity
      [i] GO:0033558 protein deacetylase activity
       [i] GO:0004407 histone deacetylase activity    [RPD3]
        [X] GO:0000118 histone deacetylase complex    [SIF2, SPCC1235.09]
         [i] GO:0000508 Rpd3L complex                 [RPD3]
         [i] GO:0000509 Rpd3S complex
         [i] GO:0032221 Clr6 histone deacetylase complex
         ...

This display correctly represents the biology, but the danger here is
that over the years we have built up an expectation in our users that
the relation label can be ignored and gene products can be propagated
up the DAG, willy-nilly. The correct way to read the DAG above is:

   SIF2 is localized_to HD complex,
   HD complex has_function HD activity

And we can infer

   SIF2 is localized_to some complex that has_function deacetylase
   activity

But we *cannot* infer anything about the activity of SIF2 without
further evidence. We would not propagate SIF2 up in slimmers, term
enrichment, gene product count summaries or any other graph based
operation (a curator *may* apply their expertise and decide to make
contributes_to annotations based on these CC to MF links, but this
would not be automatic).
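
As a rough sketch of that rule, assuming a toy edge list built from the
tree above: gene products are propagated only over relations named in a
whitelist (is_a and part_of here, which is my assumption about what a
slimmer would follow), and the has_function link is never traversed.
All names in the code are hypothetical.

```python
from collections import defaultdict

# (child, relation, parent) edges; the has_function edge is the unvetted,
# illustrative CC-to-MF link discussed in this message.
EDGES = [
    ("GO:0000508", "is_a", "GO:0000118"),          # Rpd3L complex -> HD complex
    ("GO:0000118", "has_function", "GO:0004407"),  # HD complex -> HD activity
    ("GO:0004407", "is_a", "GO:0033558"),          # HD activity -> protein deacetylase activity
    ("GO:0033558", "is_a", "GO:0019213"),          # -> deacetylase activity
]
ANNOTATIONS = {
    "RPD3": ["GO:0004407", "GO:0000508"],
    "SIF2": ["GO:0000118"],
}
PROPAGATING = {"is_a", "part_of"}  # assumption: relations safe for propagation


def ancestors(term, edges, relations):
    """Collect every term reachable from `term` over the allowed relations."""
    parents = defaultdict(set)
    for child, rel, parent in edges:
        if rel in relations:
            parents[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, edges, relations=PROPAGATING):
    """Map each term to the gene products annotated to it or to a descendant."""
    result = defaultdict(set)
    for gene_product, terms in annotations.items():
        for term in terms:
            result[term].add(gene_product)
            for ancestor in ancestors(term, edges, relations):
                result[ancestor].add(gene_product)
    return result


careful = propagate(ANNOTATIONS, EDGES)
naive = propagate(ANNOTATIONS, EDGES, relations={"is_a", "part_of", "has_function"})
print(sorted(careful["GO:0004407"]))
print(sorted(naive["GO:0004407"]))
```

The `naive` call shows the failure mode: once has_function is treated like
any other edge, SIF2 is counted under histone deacetylase activity and
every term above it.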

This means we have to be careful about how we release these (valuable)
cross-ontology links to the public, and ensure they are not abused. From
a software perspective we are almost ready to load these kinds of
links and start showing them in AmiGO, but we should proceed carefully
to make sure these kinds of relations are better understood both
within GO and outside.

This seems to be related to the contributes_to issue. Is this worth
discussing in the same slot at the GO meeting?

The (unvetted) CC to MF links are in cvs:

   go/scratch/obol_results/cellular_component_links_to_molecular_function.obo

Cheers
Chris

On Aug 16, 2007, at 5:19 AM, Valerie Wood wrote:

> It seems we have all used it slightly differently anyway.
>
> But here are two examples of why it is bad.
>
> 1.
> I had annotated the ortholog of S. cerevisiae SIF2 (histone
> deacetylase complex subunit) to
> histone deacetylase activity, contributes_to ISS.
> It is a WD repeat protein (which doesn't have HD activity, so it
> seems odd to attribute this function); the original SGD annotation
> is IPI.
> I am now removing the pombe annotation.
>
> 2.
> FET3/YMR058W
> is a copper oxidase involved in iron assimilation by reduction and
> transport. It isn't a transporter but it is part of the transporter
> complex.
> This has an iron transporter activity (with contributes_to) in SGD,
> and has been ISS'd to this activity (without contributes_to) by two
> drosophila genes (FBgn0032116 and FBgn0039387)
>
> I see many examples of this (too many to give feedback on)
>
> Can this go on the agenda for September meeting?
>
> Val
>
>
>
>
>
> Pascale Gaudet wrote:
>
>> I did mean unessential role; ie, the complex might have the  
>> activity without the protein you're annotating, but adding it  
>> enhances the activity (but not a regulator-- that would be  
>> 'positive regulation of...'). But if adding it does nothing, I  
>> would annotate to unknown.
>>
>> Pascale
>>
>> Valerie Wood wrote:
>>
>>> It seems so to me too; these are equivalent to process annotations.
>>> But did you mean essential role in the activity? This is how I
>>> would use it.
>>>
>>> Val
>>>
>>>
>>> Pascale Gaudet <pgaudet at northwestern.edu> wrote:
>>>> Val,
>>>> My understanding was that the subunit had to have at least an
>>>> unessential role in the activity, although the documentation is
>>>> very ambiguous. But what you are describing is really capturing
>>>> component information with a function annotation. That seems wrong.
>>>>
>>>> Pascale
>>>>
>>>> Valerie Wood wrote:
>>>>> I'm really asking the question why arbitrarily add these function
>>>>> annotations to the 'unknown' subunits of complexes in the first
>>>>> place, when they are clearly not the subunit that possesses the
>>>>> catalytic activity, or when they clearly have another activity.
>>>>>
>>>>> Some of these complexes have
>>>>> ATPase activity,
>>>>> ubiquitin ligase activity
>>>>> acetyltransferase activity
>>>>> etc.
>>>>>
>>>>> so if this type of annotation was valid (or useful) then we would
>>>>> (presumably) add all these annotations to all subunits for
>>>>> completion?
>>>>>
>>>>> Wouldn't users rather see which subunits had known function and
>>>>> which had 'unknown function'?
>>>>> It just seems that the qualifier is being used much more liberally
>>>>> than was originally intended (i.e. as a filler to avoid adding an
>>>>> 'unknown' annotation)
>>>>>
>>>>> and it skews functional predictions/genome comparisons.
>>>>>
>>>>> val
>>>>>
>>>>> Chris Mungall <cjm at fruitfly.org> wrote:
>>>>>> ..which like many such recommendations will be ignored by the
>>>>>> majority of implementations (in this case it is forgivable if we
>>>>>> issue the recommendation at this late stage..)
>>>>>>
>>>>>> Perhaps any association qualified in any way should be omitted
>>>>>> from the default annotations we provide. We would of course also
>>>>>> provide the full annotation set but it would be made obvious that
>>>>>> this 'advanced' set came with certain caveats
>>>>>>
>>>>>> On Aug 14, 2007, at 8:00 AM, Midori Harris wrote:
>>>>>>> Whatever we decide, I would recommend that computational
>>>>>>> analyses omit 'contributes_to' annotations.
>>>>>>>
>>>>>>> m
>>>>>>>
>>>>>>> On Mon, 13 Aug 2007, Valerie Wood wrote:
>>>>>>>> Recently I've been wondering why we have 2 meanings for
>>>>>>>> contributes_to:
>>>>>>>>
>>>>>>>> When the qualifier was initially implemented, it was so
>>>>>>>> function terms could be added to complexes like DNA polymerase
>>>>>>>> and the F1 Fo ATPase where the function cannot be attributed to
>>>>>>>> a single subunit. This seems fine.
>>>>>>>>
>>>>>>>> Increasingly I see annotations to complexes which are described
>>>>>>>> as (for example) a histone acetyltransferase complex, and all
>>>>>>>> of the subunits are given histone de/acetyltransferase or
>>>>>>>> methyltransferase activity with contributes_to, even though
>>>>>>>> the other subunits clearly have other functions (I see
>>>>>>>> ATPases, ubiquitin ligases, actin-like proteins etc., which are
>>>>>>>> commonly associated with histone acetyltransferases and
>>>>>>>> methyltransferases).
>>>>>>>>
>>>>>>>> This seems odd, for a number of reasons.
>>>>>>>> Often these subunits are not required for the activity, but
>>>>>>>> their deletion (sometimes, but not always) affects the rate of
>>>>>>>> the activity.
>>>>>>>>
>>>>>>>> Primarily I don't understand what this type of 'contributes_to'
>>>>>>>> annotation provides to GO users above a process annotation to
>>>>>>>> the histone acetylation (if this has been shown), a complex
>>>>>>>> annotation, and a function term to unknown/root node. Isn't it
>>>>>>>> more useful to know that there is some information about the
>>>>>>>> process, but the molecular function is not known?
>>>>>>>>
>>>>>>>> 1) Another problem is that these particular chromatin
>>>>>>>> associated complexes often have shared subunits so the function
>>>>>>>> annotations aren't so clear-cut (i.e. some of these subunits may
>>>>>>>> be members of other complexes which do not have this activity)
>>>>>>>>
>>>>>>>> 2) Also, computational analyses using RCA infer these
>>>>>>>> 'functions' for similar proteins which, from their domain
>>>>>>>> composition, are unlikely to possess this activity.
>>>>>>>>
>>>>>>>> 3) It makes cross species comparisons difficult because you get
>>>>>>>> different numbers of functions to what you would expect when
>>>>>>>> comparing annotations between species. For example it is known
>>>>>>>> how many histone acetyltransferases/methyltransferases etc.
>>>>>>>> pombe has, compared to S. cerevisiae, but when I compare the 2
>>>>>>>> the numbers are skewed.
>>>>>>>>
>>>>>>>> The documentation clearly allows this (although there is not an
>>>>>>>> example of this type of annotation in the documentation, so I
>>>>>>>> wonder if this is what we meant?):
>>>>>>>>
>>>>>>>>> From the documentation:
>>>>>>>>
>>>>>>>> Annotating individual gene products according to attributes of
>>>>>>>> a complex is especially useful for molecular function
>>>>>>>> annotations in cases where a complex has an activity, but not
>>>>>>>> all of the individual subunits do. (For example, there may be a
>>>>>>>> known catalytic subunit and one or more additional subunits, or
>>>>>>>> the activity may only be present when the complex is
>>>>>>>> assembled.) Molecular function annotations of complex subunits
>>>>>>>> that are not known to possess the activity of the complex must
>>>>>>>> include the entry contributes_to in the Qualifier column.
>>>>>>>>
>>>>>>>> Note that contributes_to is not needed to annotate a catalytic
>>>>>>>> subunit. Furthermore, contributes_to may be used for any
>>>>>>>> non-catalytic subunit, whether the subunit is essential for the
>>>>>>>> activity of the complex or not.
>>>>
>>>>
>>>
>>>
>
>
>
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research  
> Limited, a charity registered in England with number 1021457 and a  
> company registered in England with number 2742969, whose registered  
> office is 215 Euston Road, London, NW1 2BE.
>



