CC to MF links (was Re: [go] contributes_to question)
Ben Hitz
hitz at genome.Stanford.EDU
Mon Aug 20 09:51:28 PDT 2007
Chris -
I agree this is how it should work. But there are some "gotchas"
from the software/database side that need to be addressed (not
necessarily at this instant).
Say I want a list of all genes "directly involved" in the histone
deacetylase activity. Now, whether or not this should include SIR2
might be a matter of debate - but let's say that I at least want all
members of the complex. The _software_ has to infer backwards that
when I say "show me these genes" I also want an exhaustive list of
members of the complex.
Maybe this is obvious, but I think much software exists which doesn't
make any inferences.
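
To make the inference concrete, here is a minimal sketch of what such a query might look like. Everything in it is an assumption for illustration: the dictionaries are toy data loosely based on the example in this thread, and `genes_for_function` is a hypothetical helper, not part of any existing GO tool. The point is only that members found through the complex are returned separately from genes annotated directly to the function.

```python
# Toy data, assumed for illustration only.
# Direct annotations: gene product -> set of GO term IDs.
ANNOTATIONS = {
    "RPD3": {"GO:0004407", "GO:0000118"},  # HD activity (MF) and HD complex (CC)
    "SIF2": {"GO:0000118"},                # HD complex (CC) only
}

# Hypothetical cross-ontology links: complex (CC) -> functions (MF) it executes.
HAS_FUNCTION = {"GO:0000118": {"GO:0004407"}}


def genes_for_function(mf_id):
    """Return (direct, via_complex) gene sets for a molecular function term.

    direct      : genes annotated to the function itself
    via_complex : genes that are only members of a complex with that function
    """
    direct = {g for g, terms in ANNOTATIONS.items() if mf_id in terms}
    complexes = {cc for cc, mfs in HAS_FUNCTION.items() if mf_id in mfs}
    members = {g for g, terms in ANNOTATIONS.items() if terms & complexes}
    return direct, members - direct


direct, via_complex = genes_for_function("GO:0004407")
print("annotated to the function:", sorted(direct))
print("members of a complex with the function:", sorted(via_complex))
```

Keeping the two sets apart lets an interface show the complex members without implying that each of them has the activity.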
Ben
On Aug 17, 2007, at 5:01 PM, Chris Mungall wrote:
> Related to the contributes_to question and the relations between
> proteins, protein complexes and molecular functions:
>
> Currently in GO there is no explicitly asserted link between:
>
> CC - GO:0000118 histone deacetylase complex
> MF - GO:0004407 histone deacetylase activity
> BP - GO:0016575 histone deacetylation
>
> Clearly the function, process and components denoted by these terms
> are inter-related: the CC executes the MF, the MF catalyses the BP
>
> The parts of a whole do not necessarily inherit the function of the
> whole; the whole does not inherit the function of the parts; and the
> sibling parts of a whole do not necessarily share the same
> function. These kinds of rules can be stated formally so that there is
> less room for confusion (just like the true path rule).
>
> I suspect that one reason annotators may be tempted to make the
> erroneous transitive inference and transfer the function of the whole
> (complex) to the part (gene product) is because there is a perceived
> loss of information in *not* doing so.
>
> For example, if correct curation protocol is followed, then SIF2
> should not be annotated to HD Activity (MF), only to HD complex
> (CC). Searches for the MF "HD Activity" will exclude SIF2. This is
> correct behavior. However, it may be useful to have some intuitive way
> of navigating from a search on "HD activity" to SIF2, by means of the
> complex, so long as it is obvious that SIF2 does not inherit the
> function of the complex.
>
> Using the latest results from Obol, we can now link terms across GO
> ontologies. For links between CC and MF, the relation would be labeled
> something like 'executes' or simply 'has function'. In a tree-type
> display we might show:
>
> [i] GO:0019213 deacetylase activity
>  [i] GO:0033558 protein deacetylase activity
>   [i] GO:0004407 histone deacetylase activity [RPD3]
>    [X] GO:0000118 histone deacetylase complex [SIF2, SPCC1235.09]
>     [i] GO:0000508 Rpd3L complex [RPD3]
>     [i] GO:0000509 Rpd3S complex
>     [i] GO:0032221 Clr6 histone deacetylase complex
>     ...
>
> This display correctly represents the biology, but the danger here is
> that over the years we have built up an expectation in our users that
> the relation label can be ignored and gene products can be propagated
> up the DAG, willy-nilly. The correct way to read the DAG above is:
>
> SIF2 is localized_to HD complex,
> HD complex has_function HD activity
>
> And we can infer
>
> SIF2 is localized_to some complex that has_function deacetylase
> activity
>
> But we *cannot* infer anything about the activity of SIF2 without
> further evidence. We would not propagate SIF2 up in slimmers, term
> enrichment, gene product count summaries or any other graph based
> operation (a curator *may* apply their expertise and decide to make
> contributes_to annotations based on these CC to MF links, but this
> would not be automatic).
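
A minimal sketch of relation-aware propagation, for anyone implementing this. The edge list is a toy copy of the tree above, the `has_function` label is the one suggested in this message, and the function names and the choice of `is_a` as the only propagating relation are assumptions made for the example (a real implementation would presumably also handle `part_of`).

```python
# Toy edges (child, relation, parent), assumed from the tree display above.
EDGES = [
    ("GO:0004407", "is_a", "GO:0033558"),
    ("GO:0033558", "is_a", "GO:0019213"),
    ("GO:0000118", "has_function", "GO:0004407"),  # cross-ontology CC -> MF link
    ("GO:0000508", "is_a", "GO:0000118"),
]

# Relations over which gene products may be propagated (an assumption here).
PROPAGATING = frozenset({"is_a"})


def ancestors(term, relations=PROPAGATING):
    """Collect all terms reachable from `term` using only the given relations."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, relation, parent in EDGES:
            if child == current and relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagated_terms(direct_terms):
    """Return the direct terms plus everything they propagate up to."""
    result = set(direct_terms)
    for term in direct_terms:
        result |= ancestors(term)
    return result


# SIF2 is annotated to the HD complex only; it stays there.
print(sorted(propagated_terms({"GO:0000118"})))
# Ignoring the relation label would wrongly carry it up to deacetylase activity.
print(sorted(ancestors("GO:0000118", {"is_a", "has_function"})))
```

Slimmers, term enrichment and gene product counts would call something like `propagated_terms`, so the `has_function` edge is never followed unless a caller asks for it explicitly.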
>
> This means we have to be careful about how we release these (valuable)
> cross-ontology links to the public, and ensure they are not abused.
> From a software perspective we are almost ready to load these kinds of
> links and start showing them in AmiGO, but we should proceed carefully
> to make sure these kinds of relations are better understood both
> within GO and outside.
>
> This seems to be related to the contributes_to issue. Is this worth
> discussing in the same slot at the GO meeting?
>
> The (unvetted) CC to MF links are in cvs:
>
> go/scratch/obol_results/cellular_component_links_to_molecular_function.obo
>
> Cheers
> Chris
>
> On Aug 16, 2007, at 5:19 AM, Valerie Wood wrote:
>
>> It seems we have all used it slightly differently anyway.
>>
>> But here are two examples of why it is bad.
>>
>> 1.
>> I had annotated the ortholog of S. cerevisiae SIF2 (histone
>> deacetylase complex subunit) to
>> histone deacetylase activity, contributes_to ISS.
>> It is a WD repeat protein (which doesn't have HD activity, so it
>> seems odd to attribute this function). The original SGD annotation
>> is IPI.
>> I am now removing the pombe annotation.
>>
>> 2.
>> FET3/YMR058W
>> is a copper oxidase involved in iron assimilation by reduction and
>> transport. It isn't a transporter but it is part of the
>> transporter complex.
>> It has an iron transporter activity annotation (with contributes_to)
>> in SGD, and two Drosophila genes (FBgn0032116 and FBgn0039387) have
>> been annotated to this activity by ISS from it (without contributes_to).
>>
>> I see many, many examples of this (too many to give feedback on)
>>
>> Can this go on the agenda for September meeting?
>>
>> Val
>>
>>
>>
>>
>>
>> Pascale Gaudet wrote:
>>
>>> I did mean unessential role; i.e., the complex might have the
>>> activity without the protein you're annotating, but adding it
>>> enhances the activity (but not a regulator-- that would be
>>> 'positive regulation of...'). But if adding it does nothing, I
>>> would annotate to unknown.
>>>
>>> Pascale
>>>
>>> Valerie Wood wrote:
>>>
>>>> It seems so to me too; these are equivalent to process annotations.
>>>> But did you mean essential role in the activity? This is how I
>>>> would use it.
>>>>
>>>> Val
>>>>
>>>>
>>>> Pascale Gaudet <pgaudet at northwestern.edu> wrote:
>>>>> Val,
>>>>> My understanding was that the subunit had to have at least an
>>>>> unessential role in the activity, although the documentation is
>>>>> very ambiguous. But what you are describing is really capturing
>>>>> component information with a function annotation. That seems wrong.
>>>>>
>>>>> Pascale
>>>>>
>>>>> Valerie Wood wrote:
>>>>>
>>>>>> I'm really asking the question: why arbitrarily add these
>>>>>> function annotations to the 'unknown' subunits of complexes in
>>>>>> the first place, when they are clearly not the subunit that
>>>>>> possesses the catalytic activity, or when they clearly have
>>>>>> another activity?
>>>>>>
>>>>>> Some of these complexes have
>>>>>> ATPase activity,
>>>>>> ubiquitin ligase activity,
>>>>>> acetyltransferase activity,
>>>>>> etc.
>>>>>>
>>>>>> So if this type of annotation was valid (or useful) then we
>>>>>> would (presumably) add all these annotations to all subunits
>>>>>> for completion?
>>>>>>
>>>>>> Wouldn't users rather see which subunits had known function
>>>>>> and which had 'unknown function'?
>>>>>> It just seems that the qualifier is being used much more
>>>>>> liberally than was originally intended (i.e. as a filler to
>>>>>> avoid adding an 'unknown' annotation),
>>>>>>
>>>>>> and it skews functional predictions/genome comparisons.
>>>>>>
>>>>>> val
>>>>>>
>>>>>> Chris Mungall <cjm at fruitfly.org> wrote:
>>>>>>
>>>>>>> ..which like many such recommendations will be ignored by the
>>>>>>> majority of implementations (in this case it is forgivable if
>>>>>>> we issue the recommendation at this late stage..)
>>>>>>>
>>>>>>> Perhaps any association qualified in any way should be omitted
>>>>>>> from the default annotations we provide. We would of course
>>>>>>> also provide the full annotation set but it would be made
>>>>>>> obvious that this 'advanced' set came with certain caveats
>>>>>>>
>>>>>>> On Aug 14, 2007, at 8:00 AM, Midori Harris wrote:
>>>>>>>
>>>>>>>> Whatever we decide, I would recommend that computational
>>>>>>>> analyses omit 'contributes_to' annotations.
>>>>>>>>
>>>>>>>> m
>>>>>>>>
>>>>>>>> On Mon, 13 Aug 2007, Valerie Wood wrote:
>>>>>>>>
>>>>>>>>> Recently I've been wondering why we have 2 meanings for
>>>>>>>>> contributes_to:
>>>>>>>>>
>>>>>>>>> When the qualifier was initially implemented, it was so
>>>>>>>>> function terms could be added to complexes like DNA
>>>>>>>>> polymerase and the F1 Fo ATPase where the function cannot be
>>>>>>>>> attributed to a single subunit. This seems fine.
>>>>>>>>>
>>>>>>>>> Increasingly I see annotations to complexes which are
>>>>>>>>> described as (for example) a histone acetyltransferase
>>>>>>>>> complex, and all of the subunits are given histone
>>>>>>>>> de/acetyltransferase or methyltransferase activity with
>>>>>>>>> contributes_to, even though the other subunits clearly have
>>>>>>>>> other functions (I see ATPases, ubiquitin ligases, actin-like
>>>>>>>>> proteins etc., which are commonly associated with histone
>>>>>>>>> acetyltransferases and methyltransferases).
>>>>>>>>>
>>>>>>>>> This seems odd, for a number of reasons.
>>>>>>>>> Often these subunits are not required for the activity, but
>>>>>>>>> their deletion (sometimes, but not always) affects the rate
>>>>>>>>> of the activity.
>>>>>>>>>
>>>>>>>>> Primarily I don't understand what this type of
>>>>>>>>> 'contributes_to' annotation provides to GO users above a
>>>>>>>>> process annotation to histone acetylation (if this has been
>>>>>>>>> shown), a complex annotation, and a function annotation to
>>>>>>>>> the unknown/root node. Isn't it more useful to know that
>>>>>>>>> there is some information about the process, but the
>>>>>>>>> molecular function is not known?
>>>>>>>>>
>>>>>>>>> 1) Another problem is that these particular chromatin
>>>>>>>>> associated complexes often have shared subunits, so the
>>>>>>>>> function annotations aren't so clear-cut (i.e. some of these
>>>>>>>>> subunits may be members of other complexes which do not have
>>>>>>>>> this activity).
>>>>>>>>>
>>>>>>>>> 2) Also, computational analyses using RCA infer these
>>>>>>>>> 'functions' for similar proteins which, from their domain
>>>>>>>>> composition, are unlikely to possess this activity.
>>>>>>>>>
>>>>>>>>> 3) It makes cross species comparisons difficult because you
>>>>>>>>> get different numbers of functions to what you would expect
>>>>>>>>> when comparing annotations between species. For example it is
>>>>>>>>> known how many histone acetyltransferases/methyltransferases
>>>>>>>>> etc. pombe has, compared to S. cerevisiae, but when I compare
>>>>>>>>> the 2 the numbers are skewed.
>>>>>>>>>
>>>>>>>>> The documentation clearly allows this (although there is not
>>>>>>>>> an example of this type of annotation in the documentation,
>>>>>>>>> so I wonder if this is what we meant?):
>>>>>>>>>
>>>>>>>>>> From the documentation:
>>>>>>>>>
>>>>>>>>> Annotating individual gene products according to attributes
>>>>>>>>> of a complex is especially useful for molecular function
>>>>>>>>> annotations in cases where a complex has an activity, but not
>>>>>>>>> all of the individual subunits do. (For example, there may be
>>>>>>>>> a known catalytic subunit and one or more additional
>>>>>>>>> subunits, or the activity may only be present when the
>>>>>>>>> complex is assembled.) Molecular function annotations of
>>>>>>>>> complex subunits that are not known to possess the activity
>>>>>>>>> of the complex must include the entry contributes_to in the
>>>>>>>>> Qualifier column.
>>>>>>>>>
>>>>>>>>> Note that contributes_to is not needed to annotate a
>>>>>>>>> catalytic subunit. Furthermore, contributes_to may be used
>>>>>>>>> for any non-catalytic subunit, whether the subunit is
>>>>>>>>> essential for the activity of the complex or not.
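
For analyses that follow the recommendation above to omit 'contributes_to' annotations, the filter can be done on the Qualifier column of the gene association file. The sketch below assumes the standard tab-delimited layout in which Qualifier is the fourth column, multiple qualifiers are separated by `|`, and comment lines start with `!`; the function name `drop_contributes_to` and the sample rows are made up for the example.

```python
def drop_contributes_to(gaf_lines):
    """Yield gene association lines that are not qualified with contributes_to.

    Comment/header lines (starting with '!') are passed through unchanged.
    """
    for line in gaf_lines:
        if line.startswith("!"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # Column 4 (index 3) is Qualifier; it may be empty or hold several
        # values separated by '|', e.g. "NOT|contributes_to".
        qualifiers = columns[3].split("|") if len(columns) > 3 else []
        if "contributes_to" not in qualifiers:
            yield line


# Hypothetical, truncated rows (real files have more columns).
sample = [
    "!gaf-version: 1.0\n",
    "DB\tID1\tGENE1\tcontributes_to\tGO:0004407\n",
    "DB\tID2\tGENE2\t\tGO:0004407\n",
]
for kept in drop_contributes_to(sample):
    print(kept, end="")
```

Because it is a generator over lines, it can be applied directly to an open file handle without loading the whole file.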
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>>
--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO Consortium
Stanford University ** hitz at genome.stanford.edu