CC to MF links (was Re: [go] contributes_to question)
Benjamin Hitz
hitz at genome.Stanford.EDU
Mon Aug 20 13:15:05 PDT 2007
>
> Ben, did you consciously shift the example from SIF2 to SIR2?
No, typo.
Ben
>
> Just to recap:
>
> SIR2 is annotated as having HD activity, and is localised to the
> RENT complex (amongst other things).
> SIF2 is annotated as contributing to HD activity, but should not
> be, according to Val.
>
> Let's assume the latter is rectified and SIF2 is annotated to HD
> complex but not HD activity (neither contributes_to nor direct).
>
> If you want to know the known members of a specific complex in a
> particular species, this is just standard par-for-the-course GO
> queries.
>
> If you want to know the genes involved in HD activity you do a
> normal GO DAG query, but do not traverse any CC-MF links. If you
> specifically want "directly involved in HD activity", it is the
> same query but omitting any annotations with the contributes_to
> qualifier.
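
A minimal sketch of the query described above, assuming a toy in-memory graph rather than any real GO loader. The term IDs and gene names come from this thread; the `IS_A` and `ANNOTATIONS` structures and the `genes_for` helper are hypothetical, and SIF2 is shown with its current contributes_to annotation purely for illustration. Only is_a edges within MF are traversed, so no CC-MF link is ever followed.

```python
from collections import defaultdict

# Toy is_a edges (child -> parents) for MF terms quoted in this thread.
IS_A = {
    "GO:0004407": ["GO:0033558"],  # histone deacetylase activity
    "GO:0033558": ["GO:0019213"],  # protein deacetylase activity
}

# Toy annotations: (gene, term, qualifier). Illustrative, not a real GAF file.
ANNOTATIONS = [
    ("RPD3", "GO:0004407", ""),
    ("SIR2", "GO:0004407", ""),
    ("SIF2", "GO:0004407", "contributes_to"),
]


def descendants(term):
    """Return the term plus every term below it via is_a."""
    children = defaultdict(list)
    for child, parents in IS_A.items():
        for parent in parents:
            children[parent].append(child)
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def genes_for(term, direct_only=False):
    """Genes annotated to term or its descendants.

    With direct_only=True, annotations carrying contributes_to are skipped.
    """
    terms = descendants(term)
    return sorted({
        gene
        for gene, annotated_term, qualifier in ANNOTATIONS
        if annotated_term in terms
        and not (direct_only and qualifier == "contributes_to")
    })


print(genes_for("GO:0019213"))
print(genes_for("GO:0019213", direct_only=True))
```

The second call is the "directly involved in" variant: the same traversal, with qualified annotations filtered out.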
>
> I think the tricky question is whether it is a good idea to allow
> queries of the form "show me genes involved in X activity or
> localised to complexes that have X activity", and if so how these
> queries should be presented to a user in a non-confusing way.
>
> I don't think there should be any debate involved on a case-by-case
> basis - we should have rules about how information is propagated.
> I'm not quite following your example about inferring backwards.
>
>> Maybe this is obvious, but I think much software exists which
>> doesn't make any inferences.
>
> I place the external software that allows GO queries or GO based
> analyses into 3 categories:
>
> [1] makes no inferences - ie no DAG traversal whatsoever
> [2] uses the DAG, but ignores the relation, and assumes information
> can be propagated up the DAG regardless
> [3] uses the DAG and the relations in the DAG
>
> There is a scarily high number of tools and interfaces in [1],
> which is something we have to work on as part of our outreach, but
> can be considered separately from the CC to MF links.
>
> The majority falls into [2], which means the CC-to-MF links should
> be an optional extension to the main GO files. This will ensure [2]
> will continue to work correctly without erroneous inferences, and
> the more advanced providers can consciously use the additional
> links to provide more advanced capabilities.
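
To illustrate why category [2] tools would go wrong if the CC-to-MF links were shipped in the main files, here is a small sketch. It assumes a toy edge list; the `has_function` label follows the naming floated later in this thread, and `EDGES`, `ANNOTATED` and `reachable` are hypothetical names, not part of any GO software.

```python
# Toy edges: (child, relation, parent). Term IDs are those quoted in this thread.
EDGES = [
    ("GO:0000118", "has_function", "GO:0004407"),  # HD complex -> HD activity
    ("GO:0004407", "is_a", "GO:0033558"),
    ("GO:0033558", "is_a", "GO:0019213"),
]

# SIF2 annotated to the complex only (the rectified state assumed above).
ANNOTATED = {"SIF2": "GO:0000118"}


def reachable(term, allowed=None):
    """Terms reachable upward from term.

    allowed=None mimics a category [2] tool that ignores relation labels;
    passing a set of relation names mimics a category [3] tool.
    """
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for child, relation, parent in EDGES:
            if child == current and (allowed is None or relation in allowed):
                stack.append(parent)
    return seen


blind = reachable(ANNOTATED["SIF2"])
aware = reachable(ANNOTATED["SIF2"], allowed={"is_a", "part_of"})

print(sorted(blind))  # includes MF terms: the erroneous inference
print(sorted(aware))  # stays within the annotated CC term
```

Keeping the cross-ontology edges in a separate, optional file means a relation-blind tool never sees the `has_function` edge in the first place.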
>
>>
>> Ben
>>
>> On Aug 17, 2007, at 5:01 PM, Chris Mungall wrote:
>>
>>> Related to the contributes_to question and the relations between
>>> proteins, protein complexes and molecular functions:
>>>
>>> Currently in GO there is no explicitly asserted link between:
>>>
>>> CC - GO:0000118 histone deacetylase complex
>>> MF - GO:0004407 histone deacetylase activity
>>> BP - GO:0016575 histone deacetylation
>>>
>>> Clearly the function, process and components denoted by these terms
>>> are inter-related: the CC executes the MF, the MF catalyses the BP
>>>
>>> The parts of a whole do not necessarily inherit the function of the
>>> whole; the whole does not inherit the function of the parts; and the
>>> sibling parts of a whole do not necessarily share the same
>>> function. These kinds of rules can be stated formally so that
>>> there is less room for confusion (just like the true path rule).
>>>
>>> I suspect that one reason annotators may be tempted to make the
>>> erroneous transitive inference and transfer the function of the
>>> whole (complex) to the part (gene product) is because there is a
>>> perceived loss of information in *not* doing so.
>>>
>>> For example, if correct curation protocol is followed, then SIF2
>>> should not be annotated to HD Activity (MF), only to HD complex
>>> (CC). Searches for the MF "HD Activity" will exclude SIF2. This is
>>> correct behavior. However, it may be useful to have some
>>> intuitive way of navigating from a search on "HD activity" to
>>> SIF2, by means of the complex, so long as it is obvious that SIF2
>>> does not inherit the function of the complex.
>>>
>>> Using the latest results from Obol, we can now link terms across GO
>>> ontologies. For links between CC and MF, the relation would be
>>> labeled something like 'executes' or simply 'has function'. In a
>>> tree-type display we might show:
>>>
>>> [i] GO:0019213 deacetylase activity
>>>   [i] GO:0033558 protein deacetylase activity
>>>     [i] GO:0004407 histone deacetylase activity [RPD3]
>>>       [X] GO:0000118 histone deacetylase complex [SIF2, SPCC1235.09]
>>>         [i] GO:0000508 Rpd3L complex [RPD3]
>>>         [i] GO:0000509 Rpd3S complex
>>>         [i] GO:0032221 Clr6 histone deacetylase complex
>>>         ...
>>>
>>> This display correctly represents the biology, but the danger
>>> here is that over the years we have built up an expectation in our
>>> users that the relation label can be ignored and gene products can
>>> be propagated up the DAG, willy-nilly. The correct way to read the
>>> DAG above is:
>>>
>>> SIF2 is localized_to HD complex,
>>> HD complex has_function HD activity
>>>
>>> And we can infer
>>>
>>> SIF2 is localized_to some complex that has_function deacetylase
>>> activity
>>>
>>> But we *cannot* infer anything about the activity of SIF2 without
>>> further evidence. We would not propagate SIF2 up in slimmers, term
>>> enrichment, gene product count summaries or any other graph based
>>> operation (a curator *may* apply their expertise and decide to make
>>> contributes_to annotations based on these CC to MF links, but this
>>> would not be automatic).
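
One way to picture the rule above: a sketch, under toy assumptions, in which the CC-to-MF link yields a qualified statement about the complex while leaving the gene's own propagated functions untouched. All structure and function names here are hypothetical; the relation names are the ones used in this thread.

```python
# Toy data built from the terms and genes quoted in this thread.
MF_IS_A = {"GO:0004407": "GO:0033558", "GO:0033558": "GO:0019213"}
COMPLEX_HAS_FUNCTION = {"GO:0000118": "GO:0004407"}  # unvetted CC-to-MF link
LOCALIZED_TO = {"SIF2": ["GO:0000118"]}
MF_ANNOTATIONS = {"RPD3": ["GO:0004407"]}


def with_ancestors(term):
    """The MF term followed by its is_a ancestors (single-parent toy chain)."""
    chain = [term]
    while term in MF_IS_A:
        term = MF_IS_A[term]
        chain.append(term)
    return chain


def propagated_functions(gene):
    """Functions a slimmer or enrichment tool may count for this gene."""
    return {
        ancestor
        for term in MF_ANNOTATIONS.get(gene, [])
        for ancestor in with_ancestors(term)
    }


def complex_statements(gene):
    """Qualified statements of the form
    (gene, localized_to, complex, has_function, activity).
    These describe the complex, not the gene product itself."""
    statements = []
    for complex_term in LOCALIZED_TO.get(gene, []):
        function = COMPLEX_HAS_FUNCTION.get(complex_term)
        if function is None:
            continue
        for ancestor in with_ancestors(function):
            statements.append(
                (gene, "localized_to", complex_term, "has_function", ancestor)
            )
    return statements


print(propagated_functions("SIF2"))
for statement in complex_statements("SIF2"):
    print(statement)
```

Keeping the two results in separate functions mirrors the point that graph-based counts should draw only on the first, while a browsing interface could display the second with its relation labels intact.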
>>>
>>> This means we have to be careful about how we release these
>>> (valuable) cross-ontology links to the public, and ensure they are
>>> not abused. From a software perspective we are almost ready to load
>>> these kinds of links and start showing them in AmiGO, but we should
>>> proceed carefully to make sure these kinds of relations are better
>>> understood both within GO and outside.
>>>
>>> This seems to be related to the contributes_to issue. Is this worth
>>> discussing in the same slot at the GO meeting?
>>>
>>> The (unvetted) CC to MF links are in cvs:
>>>
>>> go/scratch/obol_results/cellular_component_links_to_molecular_function.obo
>>>
>>> Cheers
>>> Chris
>>>
>>> On Aug 16, 2007, at 5:19 AM, Valerie Wood wrote:
>>>
>>>> It seems we have all used it slightly differently anyway.
>>>>
>>>> But here are two examples of why it is bad.
>>>>
>>>> 1.
>>>> I had annotated the ortholog of S. cerevisiae SIF2 (histone
>>>> deacetylase complex subunit) to
>>>> histone deacetylase activity, contributes_to ISS.
>>>> It is a WD repeat protein (which doesn't have HD activity, so
>>>> it seems odd to attribute this function); the original SGD
>>>> annotation is IPI.
>>>> I am now removing the pombe annotation.
>>>>
>>>> 2.
>>>> FET3/YMR058W
>>>> is a copper oxidase involved in iron assimilation by reduction
>>>> and transport. It isn't a transporter but it is part of the
>>>> transporter complex.
>>>> This has an iron transporter activity (with contributes_to) in
>>>> SGD, and has been ISS'd to this activity (without
>>>> contributes_to) by two Drosophila genes (FBgn0032116 and
>>>> FBgn0039387).
>>>>
>>>> I see many examples of this (too many to give feedback on).
>>>>
>>>> Can this go on the agenda for September meeting?
>>>>
>>>> Val
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Pascale Gaudet wrote:
>>>>
>>>>> I did mean unessential role; ie, the complex might have the
>>>>> activity without the protein you're annotating, but adding it
>>>>> enhances the activity (but not a regulator-- that would be
>>>>> 'positive regulation of...'). But if adding it does nothing, I
>>>>> would annotate to unknown.
>>>>>
>>>>> Pascale
>>>>>
>>>>> Valerie Wood wrote:
>>>>>
>>>>>> It seems so to me too; these are equivalent to process
>>>>>> annotations.
>>>>>> But did you mean essential role in the activity? This is how
>>>>>> I would use it.
>>>>>>
>>>>>> Val
>>>>>>
>>>>>>
>>>>>> Pascale Gaudet <pgaudet at northwestern.edu> wrote:
>>>>>>> Val,
>>>>>>> My understanding was that the subunit had to have at least an
>>>>>>> unessential role in the activity, although the documentation
>>>>>>> is very ambiguous. But what you are describing is really
>>>>>>> capturing component information with a function annotation.
>>>>>>> That seems wrong.
>>>>>>>
>>>>>>> Pascale
>>>>>>>
>>>>>>> Valerie Wood wrote:
>>>>>>>> I'm really asking the question why arbitrarily add these
>>>>>>>> function annotations to the 'unknown' subunits of complexes
>>>>>>>> in the first place, when they are clearly not the subunit
>>>>>>>> that possesses the catalytic activity, or when they clearly
>>>>>>>> have another activity.
>>>>>>>>
>>>>>>>> Some of these complexes have
>>>>>>>> ATPase activity,
>>>>>>>> ubiquitin ligase activity,
>>>>>>>> acetyltransferase activity
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> So if this type of annotation was valid (or useful) then we
>>>>>>>> would (presumably) add all these annotations to all subunits
>>>>>>>> for completion?
>>>>>>>>
>>>>>>>> Wouldn't users rather see which subunits had known function
>>>>>>>> and which had 'unknown function'?
>>>>>>>> It just seems that the qualifier is being used much more
>>>>>>>> liberally than was originally intended (i.e. as a filler to
>>>>>>>> avoid adding an 'unknown' annotation),
>>>>>>>> and it skews functional predictions/genome comparisons.
>>>>>>>>
>>>>>>>> val
>>>>>>>>
>>>>>>>> Chris Mungall <cjm at fruitfly.org> wrote:
>>>>>>>>> ..which like many such recommendations will be ignored by
>>>>>>>>> the majority of implementations (in this case it is
>>>>>>>>> forgivable if we issue the recommendation at this late
>>>>>>>>> stage..)
>>>>>>>>>
>>>>>>>>> Perhaps any association qualified in any way should be
>>>>>>>>> omitted from the default annotations we provide. We would of
>>>>>>>>> course also provide the full annotation set but it would be
>>>>>>>>> made obvious that this 'advanced' set came with certain
>>>>>>>>> caveats.
>>>>>>>>>
>>>>>>>>> On Aug 14, 2007, at 8:00 AM, Midori Harris wrote:
>>>>>>>>>> Whatever we decide, I would recommend that computational
>>>>>>>>>> analyses omit 'contributes_to' annotations.
>>>>>>>>>>
>>>>>>>>>> m
>>>>>>>>>>
>>>>>>>>>> On Mon, 13 Aug 2007, Valerie Wood wrote:
>>>>>>>>>>> Recently I've been wondering why we have 2 meanings for
>>>>>>>>>>> contributes_to:
>>>>>>>>>>>
>>>>>>>>>>> When the qualifier was initially implemented, it was so
>>>>>>>>>>> function terms could be added to complexes like DNA
>>>>>>>>>>> polymerase and the F1 Fo ATPase where the function cannot
>>>>>>>>>>> be attributed to a single subunit. This seems fine.
>>>>>>>>>>>
>>>>>>>>>>> Increasingly I see annotations to complexes which are
>>>>>>>>>>> described as (for example) a histone acetyltransferase
>>>>>>>>>>> complex, and all of the subunits are given histone
>>>>>>>>>>> de/acetyltransferase or methyltransferase activity with
>>>>>>>>>>> contributes_to, even though the other subunits clearly
>>>>>>>>>>> have other functions (I see ATPases, ubiquitin ligases,
>>>>>>>>>>> actin-like proteins etc., which are commonly associated
>>>>>>>>>>> with histone acetyltransferases and methyltransferases).
>>>>>>>>>>>
>>>>>>>>>>> This seems odd, for a number of reasons.
>>>>>>>>>>> Often these subunits are not required for the activity,
>>>>>>>>>>> but their deletion (sometimes, but not always) affects the
>>>>>>>>>>> rate of the activity.
>>>>>>>>>>>
>>>>>>>>>>> Primarily I don't understand what this type of
>>>>>>>>>>> 'contributes_to' annotation provides to GO users above a
>>>>>>>>>>> process annotation to the histone acetylation (if this has
>>>>>>>>>>> been shown), a complex annotation, and a function term to
>>>>>>>>>>> unknown/root node. Isn't it more useful to know that there
>>>>>>>>>>> is some information about the process, but the molecular
>>>>>>>>>>> function is not known?
>>>>>>>>>>>
>>>>>>>>>>> 1) Another problem is that these particular chromatin
>>>>>>>>>>> associated complexes often have shared subunits so the
>>>>>>>>>>> function annotations aren't so clear-cut (i.e. some of
>>>>>>>>>>> these subunits may be members of other complexes which do
>>>>>>>>>>> not have this activity).
>>>>>>>>>>>
>>>>>>>>>>> 2) Also, computational analyses using RCA infer these
>>>>>>>>>>> 'functions' for similar proteins which, from their domain
>>>>>>>>>>> composition, are unlikely to possess this activity.
>>>>>>>>>>>
>>>>>>>>>>> 3) It makes cross species comparisons difficult because
>>>>>>>>>>> you get different numbers of functions to what you would
>>>>>>>>>>> expect when comparing annotations between species. For
>>>>>>>>>>> example it is known how many histone acetyltransferases /
>>>>>>>>>>> methyltransferases etc. pombe has, compared to S.
>>>>>>>>>>> cerevisiae, but when I compare the 2 the numbers are
>>>>>>>>>>> skewed.
>>>>>>>>>>>
>>>>>>>>>>> The documentation clearly allows this (although there is
>>>>>>>>>>> not an example of this type of annotation in the
>>>>>>>>>>> documentation, so I wonder if this is what we meant?):
>>>>>>>>>>>
>>>>>>>>>>> From the documentation:
>>>>>>>>>>>
>>>>>>>>>>> Annotating individual gene products according to
>>>>>>>>>>> attributes of a complex is especially useful for molecular
>>>>>>>>>>> function annotations in cases where a complex has an
>>>>>>>>>>> activity, but not all of the individual subunits do. (For
>>>>>>>>>>> example, there may be a known catalytic subunit and one or
>>>>>>>>>>> more additional subunits, or the activity may only be
>>>>>>>>>>> present when the complex is assembled.) Molecular function
>>>>>>>>>>> annotations of complex subunits that are not known to
>>>>>>>>>>> possess the activity of the complex must include the entry
>>>>>>>>>>> contributes_to in the Qualifier column.
>>>>>>>>>>>
>>>>>>>>>>> Note that contributes_to is not needed to annotate a
>>>>>>>>>>> catalytic subunit. Furthermore, contributes_to may be used
>>>>>>>>>>> for any non-catalytic subunit, whether the subunit is
>>>>>>>>>>> essential for the activity of the complex or not.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> The Wellcome Trust Sanger Institute is operated by Genome
>>>> Research Limited, a charity registered in England with number
>>>> 1021457 and a company registered in England with number 2742969,
>>>> whose registered office is 215 Euston Road, London, NW1 2BE.
>>>>
>>
>> --
>> Ben Hitz
>> Senior Scientific Programmer ** Saccharomyces Genome Database **
>> GO Consortium
>> Stanford University ** hitz at genome.stanford.edu
>>
>>
>>
>>
--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO
Consortium
Stanford University ** hitz at genome.stanford.edu