CC to MF links (was Re: [go] contributes_to question)

Benjamin Hitz hitz at genome.Stanford.EDU
Mon Aug 20 13:15:05 PDT 2007


>
> Ben, did you consciously shift the example from SIF2 to SIR2?

No, typo.

Ben
>
> Just to recap:
>
> SIR2 is annotated as having HD activity, and is localised to the  
> RENT complex
> 	(amongst other things)
> SIF2 is annotated as contributing to HD activity, but should not  
> be, according to Val
>
> Let's assume the latter is rectified and SIF2 is annotated to HD  
> complex but not HD activity (neither contributes_to nor direct)
>
> If you want to know the known members of a specific complex in a  
> particular species, that is just a standard, par-for-the-course GO
> query.
>
> If you want to know the genes involved in HD activity you do a  
> normal GO DAG query, but do not traverse any CC-MF links. If you  
> specifically want "directly involved in HD activity", it is the  
> same query but omitting any annotations with the contributes_to  
> qualifier.
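
A minimal sketch of those two queries, assuming a toy in-memory DAG and annotation list. The term IDs come from this thread; GENE_X, the data layout and the function names are hypothetical. Only is_a edges are followed, so no CC-MF link is ever traversed.

```python
from collections import namedtuple

# Toy is_a edges (child -> parents); term IDs are taken from this thread.
IS_A = {
    "GO:0004407": ["GO:0033558"],  # histone deacetylase activity
    "GO:0033558": ["GO:0019213"],  # protein deacetylase activity
    "GO:0019213": [],              # deacetylase activity
}

Annotation = namedtuple("Annotation", "gene term qualifier")

# Simplified, assumed annotation rows; GENE_X is a hypothetical gene.
ANNOTATIONS = [
    Annotation("SIR2", "GO:0004407", None),
    Annotation("GENE_X", "GO:0004407", "contributes_to"),
    Annotation("SIF2", "GO:0000118", None),  # complex (CC) annotation only
]


def ancestors(term):
    """Return term plus everything reachable from it via is_a edges."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(IS_A.get(t, []))
    return seen


def genes_for(term, direct_only=False):
    """Genes annotated to term or to any of its is_a descendants."""
    hits = set()
    for a in ANNOTATIONS:
        if direct_only and a.qualifier == "contributes_to":
            continue  # "directly involved": drop contributes_to rows
        if term in ancestors(a.term):
            hits.add(a.gene)
    return hits


print(sorted(genes_for("GO:0019213")))                    # ['GENE_X', 'SIR2']
print(sorted(genes_for("GO:0019213", direct_only=True)))  # ['SIR2']
```

SIF2 never appears in either result, because its only annotation is to the complex and nothing links the complex to the activity.
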
>
> I think the tricky question is whether it is a good idea to allow  
> queries of the form "show me genes involved in X activity or  
> localised to complexes that have X activity", and if so how these  
> queries should be presented to a user in a non-confusing way.
>
> I don't think there should be any debate involved on a case-by-case  
> basis - we should have rules about how information is propagated.  
> I'm not quite following your example about inferring backwards.
>
>> Maybe this is obvious, but I think much software exists which  
>> doesn't make any inferences.
>
> I place the external software that allows GO queries or GO based  
> analyses into 3 categories:
>
> [1] makes no inferences - ie no DAG traversal whatsoever
> [2] uses the DAG, but ignores the relation, and assumes information  
> can be propagated up the DAG regardless
> [3] uses the DAG and the relations in the DAG
>
> There is a scarily high number of tools and interfaces in [1],
> which is something we have to work on as part of our outreach, but  
> can be considered separately from the CC to MF links.
>
> The majority falls into [2], which means the CC-to-MF links should  
> be an optional extension to the main GO files. This will ensure [2]  
> will continue to work correctly without erroneous inferences, and  
> the more advanced providers can consciously use the additional  
> links to provide more advanced capabilities.
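
To illustrate why the links are safer as an optional extension, here is a rough sketch of what tools in each of the three categories would report if a CC-to-MF edge sat in the main graph. The has_function label, the set of relations treated as safe for propagation, and all function names are assumptions; the annotations mirror the tree display quoted further down.

```python
# Toy graph as (child, relation, parent) triples. The has_function edge
# stands in for one of the proposed CC-to-MF links (label is an assumption).
EDGES = [
    ("GO:0004407", "is_a", "GO:0033558"),
    ("GO:0033558", "is_a", "GO:0019213"),
    ("GO:0000118", "has_function", "GO:0004407"),
]

# Direct annotations: gene -> set of terms.
DIRECT = {"RPD3": {"GO:0004407"}, "SIF2": {"GO:0000118"}}


def reachable(term, allowed=None):
    """Terms reachable from term, following only relations in `allowed`.

    allowed=None means every edge is followed regardless of its label.
    """
    seen, stack = {term}, [term]
    while stack:
        t = stack.pop()
        for child, rel, parent in EDGES:
            if child != t or parent in seen:
                continue
            if allowed is None or rel in allowed:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(term, category):
    """Terms a tool of the given category treats an annotation as implying."""
    if category == 1:   # [1] no DAG traversal at all
        return {term}
    if category == 2:   # [2] traverses the DAG, ignoring relation labels
        return reachable(term)
    # [3] propagates only over relations assumed safe for annotations
    return reachable(term, allowed={"is_a", "part_of"})


def genes_at(term, category):
    """Genes reported for `term` by a tool of the given category."""
    return {
        gene
        for gene, terms in DIRECT.items()
        if any(term in expand(t, category) for t in terms)
    }


for category in (1, 2, 3):
    print(category, sorted(genes_at("GO:0019213", category)))
# 1 []
# 2 ['RPD3', 'SIF2']
# 3 ['RPD3']
```

Category [2] wrongly credits SIF2 with deacetylase activity as soon as the edge is present, which is the erroneous inference that keeping the links in a separate file avoids.
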
>
>>
>> Ben
>>
>> On Aug 17, 2007, at 5:01 PM, Chris Mungall wrote:
>>
>>> Related to the contributes_to question and the relations between  
>>> proteins, protein complexes and molecular functions:
>>>
>>> Currently in GO there is no explicitly asserted link between:
>>>
>>> CC - GO:0000118 histone deacetylase complex
>>> MF - GO:0004407 histone deacetylase activity
>>> BP - GO:0016575 histone deacetylation
>>>
>>> Clearly the function, process and components denoted by these terms
>>> are inter-related: the CC executes the MF, the MF catalyses the BP
>>>
>>> The parts of a whole do not necessarily inherit the function of the
>>> whole; the whole does not inherit the function of the parts; and the
>>> sibling parts of a whole do not necessarily share the same
>>> function. These kinds of rules can be stated formally so that there
>>> is less room for confusion (just like the true path rule).
>>>
>>> I suspect that one reason annotators may be tempted to make the
>>> erroneous transitive inference and transfer the function of the
>>> whole (complex) to the part (gene product) is because there is a
>>> perceived loss of information in *not* doing so.
>>>
>>> For example, if correct curation protocol is followed, then SIF2
>>> should not be annotated to HD Activity (MF), only to HD complex
>>> (CC). Searches for the MF "HD Activity" will exclude SIF2. This is
>>> correct behavior. However, it may be useful to have some intuitive
>>> way of navigating from a search on "HD activity" to SIF2, by means
>>> of the complex, so long as it is obvious that SIF2 does not inherit
>>> the function of the complex.
>>>
>>> Using the latest results from Obol, we can now link terms across GO
>>> ontologies. For links between CC and MF, the relation would be labeled
>>> something like 'executes' or simply 'has function'. In a tree-type
>>> display we might show:
>>>
>>>    [i] GO:0019213 deacetylase activity
>>>      [i] GO:0033558 protein deacetylase activity
>>>       [i] GO:0004407 histone deacetylase activity    [RPD3]
>>>        [X] GO:0000118 histone deacetylase complex    [SIF2, SPCC1235.09]
>>>         [i] GO:0000508 Rpd3L complex                 [RPD3]
>>>         [i] GO:0000509 Rpd3S complex
>>>         [i] GO:0032221 Clr6 histone deacetylase complex
>>>         ...
>>>
>>> This display correctly represents the biology, but the danger here
>>> is that over the years we have built up an expectation in our users
>>> that the relation label can be ignored and gene products can be
>>> propagated up the DAG, willy-nilly. The correct way to read the DAG
>>> above is:
>>>
>>>   SIF2 is localized_to HD complex,
>>>   HD complex has_function HD activity
>>>
>>> And we can infer
>>>
>>>   SIF2 is localized_to some complex that has_function deacetylase
>>>   activity
>>>
>>> But we *cannot* infer anything about the activity of SIF2 without
>>> further evidence. We would not propagate SIF2 up in slimmers, term
>>> enrichment, gene product count summaries or any other graph based
>>> operation (a curator *may* apply their expertise and decide to make
>>> contributes_to annotations based on these CC to MF links, but this
>>> would not be automatic).
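
One way to picture this reading of the DAG is a search that returns two separate result sets: genes annotated to the activity, and genes merely localized to a complex that has the activity. This is only a sketch under assumptions: the dictionaries, the has_function representation and the function names are hypothetical, and the annotations are the ones in the tree display above.

```python
# is_a edges within MF (child -> parent), as in the tree display above.
IS_A = {
    "GO:0004407": "GO:0033558",
    "GO:0033558": "GO:0019213",
}

# Hypothetical encoding of a CC-to-MF link: complex -> function it executes.
HAS_FUNCTION = {"GO:0000118": "GO:0004407"}

# Direct annotations: gene -> set of terms.
ANNOTATED = {
    "RPD3": {"GO:0004407"},
    "SIF2": {"GO:0000118"},
    "SPCC1235.09": {"GO:0000118"},
}


def is_a_closure(term):
    """Return term plus all of its is_a ancestors."""
    out = {term}
    while term in IS_A:
        term = IS_A[term]
        out.add(term)
    return out


def search(mf_term):
    """Return (has_activity, via_complex) for a molecular function term.

    has_activity: genes annotated to the term or one of its descendants.
    via_complex:  genes localized to a complex that has_function the term;
                  nothing is inferred about their own activity.
    """
    has_activity, via_complex = set(), set()
    for gene, terms in ANNOTATED.items():
        for t in terms:
            if mf_term in is_a_closure(t):
                has_activity.add(gene)
            elif t in HAS_FUNCTION and mf_term in is_a_closure(HAS_FUNCTION[t]):
                via_complex.add(gene)
    return has_activity, via_complex - has_activity


has_activity, via_complex = search("GO:0019213")
print(sorted(has_activity))  # ['RPD3']
print(sorted(via_complex))   # ['SIF2', 'SPCC1235.09']
```

Keeping the second set apart means slimmers, enrichment tools and gene product counts can use the first set only, while a browser can still offer the route from the activity to SIF2 by way of the complex.
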
>>>
>>> This means we have to be careful about how we release these
>>> (valuable) cross-ontology links to the public, and ensure they are
>>> not abused. From a software perspective we are almost ready to load
>>> these kinds of links and start showing them in AmiGO, but we should
>>> proceed carefully to make sure these kinds of relations are better
>>> understood both within GO and outside.
>>>
>>> This seems to be related to the contributes_to issue. Is this worth
>>> discussing in the same slot at the GO meeting?
>>>
>>> The (unvetted) CC to MF links are in cvs:
>>>
>>>   go/scratch/obol_results/cellular_component_links_to_molecular_function.obo
>>>
>>> Cheers
>>> Chris
>>>
>>> On Aug 16, 2007, at 5:19 AM, Valerie Wood wrote:
>>>
>>>> It seems we have all used it slightly differently anyway.
>>>>
>>>> But here are two examples of why it is bad.
>>>>
>>>> 1.
>>>> I had annotated the ortholog of S. cerevisiae SIF2 (histone
>>>> deacetylase complex subunit) to histone deacetylase activity,
>>>> contributes_to ISS.
>>>> It is a WD repeat protein (which doesn't have HD activity, so it
>>>> seems odd to attribute this function); the original SGD annotation
>>>> is IPI.
>>>> I am now removing the pombe annotation.
>>>>
>>>> 2.
>>>> FET3/YMR058W
>>>> is a copper oxidase involved in iron assimilation by reduction
>>>> and transport. It isn't a transporter but it is part of the
>>>> transporter complex.
>>>> This has an iron transporter activity annotation (with
>>>> contributes_to) in SGD, and two Drosophila genes (FBgn0032116 and
>>>> FBgn0039387) have been annotated to this activity by ISS from it
>>>> (without contributes_to).
>>>>
>>>> I see many, many examples of this (too many to give feedback on)
>>>>
>>>> Can this go on the agenda for September meeting?
>>>>
>>>> Val
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Pascale Gaudet wrote:
>>>>
>>>>> I did mean unessential role; ie, the complex might have the  
>>>>> activity without the protein you're annotating, but adding it  
>>>>> enhances the activity (but not a regulator-- that would be  
>>>>> 'positive regulation of...'). But if adding it does nothing, I  
>>>>> would annotate to unknown.
>>>>>
>>>>> Pascale
>>>>>
>>>>> Valerie Wood wrote:
>>>>>
>>>>>> It seems so to me too; these are equivalent to process
>>>>>> annotations.
>>>>>> But did you mean essential role in the activity? This is how
>>>>>> I would use it.
>>>>>>
>>>>>> Val
>>>>>>
>>>>>>
>>>>>> Pascale Gaudet <pgaudet at northwestern.edu> wrote:
>>>>>>> Val,
>>>>>>> My understanding was that the subunit had to have at least an
>>>>>>> unessential role in the activity, although the documentation is
>>>>>>> very ambiguous. But what you are describing is really capturing
>>>>>>> component information with a function annotation. That seems
>>>>>>> wrong.
>>>>>>>
>>>>>>> Pascale
>>>>>>>
>>>>>>> Valerie Wood wrote:
>>>>>>>> I'm really asking the question why arbitrarily add these
>>>>>>>> function annotations to the 'unknown' subunits of complexes in
>>>>>>>> the first place, when they are clearly not the subunit that
>>>>>>>> possesses the catalytic activity, or when they clearly have
>>>>>>>> another activity.
>>>>>>>>
>>>>>>>> Some of these complexes have
>>>>>>>> ATPase activity,
>>>>>>>> ubiquitin ligase activity,
>>>>>>>> acetyltransferase activity
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> so if this type of annotation was valid (or useful) then we
>>>>>>>> would (presumably) add all these annotations to all subunits
>>>>>>>> for completion?
>>>>>>>>
>>>>>>>> Wouldn't users rather see which subunits had known function and
>>>>>>>> which had 'unknown function'?
>>>>>>>> It just seems that the qualifier is being used much more
>>>>>>>> liberally than was originally intended (i.e. as a filler to
>>>>>>>> avoid adding an 'unknown' annotation)
>>>>>>>>
>>>>>>>> and it skews functional predictions/genome comparisons.
>>>>>>>>
>>>>>>>> val
>>>>>>>>
>>>>>>>> Chris Mungall <cjm at fruitfly.org> wrote:
>>>>>>>>> ..which like many such recommendations will be ignored by the
>>>>>>>>> majority of implementations (in this case it is forgivable if
>>>>>>>>> we issue the recommendation at this late stage..)
>>>>>>>>>
>>>>>>>>> Perhaps any association qualified in any way should be omitted
>>>>>>>>> from the default annotations we provide. We would of course
>>>>>>>>> also provide the full annotation set but it would be made
>>>>>>>>> obvious that this 'advanced' set came with certain caveats
>>>>>>>>>
>>>>>>>>> On Aug 14, 2007, at 8:00 AM, Midori Harris wrote:
>>>>>>>>>
>>>>>>>>>> Whatever we decide, I would recommend that computational
>>>>>>>>>> analyses omit 'contributes_to' annotations.
>>>>>>>>>>
>>>>>>>>>> m
>>>>>>>>>>
>>>>>>>>>> On Mon, 13 Aug 2007, Valerie Wood wrote:
>>>>>>>>>>
>>>>>>>>>>> Recently I've been wondering why we have 2 meanings for
>>>>>>>>>>> contributes_to:
>>>>>>>>>>>
>>>>>>>>>>> When the qualifier was initially implemented, it was so
>>>>>>>>>>> function terms could be added to complexes like DNA
>>>>>>>>>>> polymerase and the F1 Fo ATPase where the function cannot
>>>>>>>>>>> be attributed to a single subunit. This seems fine.
>>>>>>>>>>>
>>>>>>>>>>> Increasingly I see annotations to complexes which are
>>>>>>>>>>> described as (for example) a histone acetyltransferase
>>>>>>>>>>> complex, and all of the subunits are given histone
>>>>>>>>>>> de/acetyltransferase or methyltransferase activity with
>>>>>>>>>>> contributes_to, even though the other subunits clearly have
>>>>>>>>>>> other functions (I see ATPases, ubiquitin ligases,
>>>>>>>>>>> actin-like proteins etc., which are commonly associated
>>>>>>>>>>> with histone acetyltransferases and methyltransferases).
>>>>>>>>>>>
>>>>>>>>>>> This seems odd, for a number of reasons.
>>>>>>>>>>> Often these subunits are not required for the activity, but
>>>>>>>>>>> their deletion (sometimes, but not always) affects the rate
>>>>>>>>>>> of the activity.
>>>>>>>>>>>
>>>>>>>>>>> Primarily I don't understand what this type of
>>>>>>>>>>> 'contributes_to' annotation provides to GO users above a
>>>>>>>>>>> process annotation to histone acetylation (if this has been
>>>>>>>>>>> shown), a complex annotation, and a function term to
>>>>>>>>>>> unknown/root node. Isn't it more useful to know that there
>>>>>>>>>>> is some information about the process, but the molecular
>>>>>>>>>>> function is not known?
>>>>>>>>>>>
>>>>>>>>>>> 1) Another problem is that these particular chromatin
>>>>>>>>>>> associated complexes often have shared subunits so the
>>>>>>>>>>> function annotations aren't so clear-cut (i.e. some of
>>>>>>>>>>> these subunits may be members of other complexes which do
>>>>>>>>>>> not have this activity)
>>>>>>>>>>>
>>>>>>>>>>> 2) Also, computational analyses using RCA infer these
>>>>>>>>>>> 'functions' for similar proteins which, from their domain
>>>>>>>>>>> composition, are unlikely to possess this activity.
>>>>>>>>>>>
>>>>>>>>>>> 3) It makes cross species comparisons difficult because you
>>>>>>>>>>> get different numbers of functions to what you would expect
>>>>>>>>>>> when comparing annotations between species. For example it
>>>>>>>>>>> is known how many histone acetyltransferases /
>>>>>>>>>>> methyltransferases etc. pombe has, compared to S.
>>>>>>>>>>> cerevisiae, but when I compare the 2 the numbers are skewed.
>>>>>>>>>>>
>>>>>>>>>>> The documentation clearly allows this (although there is
>>>>>>>>>>> not an example of this type of annotation in the
>>>>>>>>>>> documentation, so I wonder if this is what we meant?):
>>>>>>>>>>>
>>>>>>>>>>>> From the documentation:
>>>>>>>>>>>
>>>>>>>>>>> Annotating individual gene products according to attributes
>>>>>>>>>>> of a complex is especially useful for molecular function
>>>>>>>>>>> annotations in cases where a complex has an activity, but
>>>>>>>>>>> not all of the individual subunits do. (For example, there
>>>>>>>>>>> may be a known catalytic subunit and one or more additional
>>>>>>>>>>> subunits, or the activity may only be present when the
>>>>>>>>>>> complex is assembled.) Molecular function annotations of
>>>>>>>>>>> complex subunits that are not known to possess the activity
>>>>>>>>>>> of the complex must include the entry contributes_to in the
>>>>>>>>>>> Qualifier column.
>>>>>>>>>>>
>>>>>>>>>>> Note that contributes_to is not needed to annotate a
>>>>>>>>>>> catalytic subunit. Furthermore, contributes_to may be used
>>>>>>>>>>> for any non-catalytic subunit, whether the subunit is
>>>>>>>>>>> essential for the activity of the complex or not.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>> --
>> Ben Hitz
>> Senior Scientific Programmer ** Saccharomyces Genome Database **  
>> GO Consortium
>> Stanford University ** hitz at genome.stanford.edu
>>
>>
>>
>>

--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO  
Consortium
Stanford University ** hitz at genome.stanford.edu





