[go] Protein domain GO annotation
E Dimmer
edimmer at ebi.ac.uk
Fri Nov 2 09:37:16 PDT 2007
Hi,
I realize I'm getting nowhere fast with this suggestion(!), but I
thought it would be important to clarify how InterPro2GO works, as it
produces a large number of GO annotations and there appears to be some
misunderstanding about how these are created.
1. Taking an InterPro domain, an InterPro curator looks at all the
proteins in UniProtKB that contain this domain, and will only continue
if there is an adequate number of Swiss-Prot entries in this group of
proteins with useful manual annotation.
2. The curator looks at the existing annotations to these proteins and
then asks: can all these proteins be safely annotated to the same GO
term? In particular, given what is known of the uncurated UniProtKB
entries included in this set, is the mapping likely to be correct for
them as well? InterPro curators then proceed conservatively with the
mapping (which is why high-level GO terms are often used).
So, InterPro curators annotate at the level of the whole protein. While
InterPro2GO mappings may often hint at what kind of contribution a
domain might make, it is not correct to assume this. For example, if two
domains always appear together in a protein, both may be mapped to
the same set of GO terms.
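As a rough illustration of the difference between region-level and whole-protein annotation, here is a minimal Python sketch of a region annotation of the kind proposed in the quoted thread for the IUP curators (an SO feature term, residue coordinates, a GO term, a reference and an evidence code), and of how a GOC group might relate it up to the whole gene product. The record layouts, field names and the roll_up helper are hypothetical and not part of any GO or InterPro format; the 'extension' field merely stands in for the proposed column 16, and the IDA evidence code in the LEF-1 example is assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionAnnotation:
    """A GO term attached to a sequence region (hypothetical record layout)."""
    protein: str      # UniProtKB accession
    start: int        # first residue of the region (1-based, inclusive)
    end: int          # last residue of the region (inclusive)
    so_term: str      # SO feature label, e.g. 'disordered_region'
    go_id: str        # GO term ID
    reference: str    # supporting reference
    evidence: str     # GO evidence code


@dataclass(frozen=True)
class ProteinAnnotation:
    """A whole-protein GO annotation (hypothetical record layout)."""
    protein: str
    go_id: str
    reference: str
    evidence: str
    extension: str    # hypothetical stand-in for column 16; holds only the SO term


def roll_up(region: RegionAnnotation) -> ProteinAnnotation:
    """Relate a region-level annotation up to the whole gene product.

    The SO term is kept, but the residue coordinates are deliberately dropped.
    """
    if not 1 <= region.start <= region.end:
        raise ValueError("invalid residue range")
    return ProteinAnnotation(
        protein=region.protein,
        go_id=region.go_id,
        reference=region.reference,
        evidence=region.evidence,
        extension=region.so_term,
    )


# LEF-1 example from the thread; the evidence code is assumed for illustration.
lef1_region = RegionAnnotation(
    protein="Q9QXN1", start=296, end=397, so_term="disordered_region",
    go_id="GO:0008301", reference="PMID:7651541", evidence="IDA",
)
lef1_protein = roll_up(lef1_region)
print(lef1_protein)
```

Dropping the coordinates in roll_up mirrors the point about InterPro2GO: once an annotation is held at the level of the whole protein, it no longer records which part of the sequence is responsible for the function.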
Emily
Ben Hitz wrote:
>
> The difference is whether or not the (sub) domain imparts the
> "quality" (i.e., MF) to the whole protein or not.
> An ATP binding domain imparts the function ATP binding to the protein
> that contains it.
> "Linker Region" does not impart anything to the whole.
> "Entropic Bristle" might, I suppose, depending on the definition.
> "Entropic Spring" really sounds like a function of the whole protein.
>
> So really there are two separate issues here: 1) whether or not we
> should annotate regions of protein ("domains") to distinct terms; 2)
> whether or not we should add certain MF terms that are related to a
> protein domain's "disorder".
>
> InterPro, in principle, already does the former. I can see a possible
> argument for the latter, but a curator would have to determine
> biological significance. (Assuming experimental evidence, yadda yadda)
>
> Ben
> On Nov 2, 2007, at 2:28 AM, E Dimmer wrote:
>
>> However there are quite a number of GO function terms which occur on
>> discrete portions of a protein sequence, for instance many of the
>> child terms of 'binding' (GO:0005488) (protein, ATP, lipid,
>> co-factor etc) and simple catalytic domains.
>>
>> I feel that there could be a middle way: there are GO terms that
>> can be annotated to a specific region of a sequence where it is also
>> appropriate for the function to be 'inherited' by the whole protein.
>> But there are also domain terms which are not appropriate for a
>> whole protein; these should go into another ontology, which
>> could be a composite of GO terms and domain-specific terms.
>>
>> So while protein domain function annotators and gene-product
>> annotators will need to work from different term sets and add
>> different parameters to their annotations, where a domain has been
>> annotated to a GO term (e.g. 'DNA binding', IDA), we could
>> consider including these annotations in GO.
>>
>> Would this suggestion be more acceptable to GO folk?
>>
>> Emily
>>
>>
>> Michael Ashburner wrote:
>>> I agree with Ben, this is not for the GO.
>>> Michael
>>>
>>> On 1 Nov 2007, at 14:49, Benjamin Hitz wrote:
>>>
>>>>
>>>> As resident protein structure expert, no.
>>>> Not that what they are doing is wrong, or not important - but it's
>>>> not a biological process/function/component of the gene (product)
>>>> in question.
>>>>
>>>> What's next? We annotate alpha helices?
>>>>
>>>> Ben
>>>>
>>>> On Nov 1, 2007, at 9:17 AM, E Dimmer wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Could I please ask people's opinion on the functional annotation
>>>>> of protein domains/regions to the GO?
>>>>>
>>>>> I have been contacted by a group who would like to annotate GO
>>>>> functions to identified disordered regions in proteins.
>>>>>
>>>>> The thought so far is that they would annotate to a
>>>>> 'disordered_region' SO term, along with sequence co-ordinates, and
>>>>> then also attach a GO term with a reference and evidence code.
>>>>> (I have spoken with Gabby Reeves from BioSapiens, who would be
>>>>> happy to add 'disordered_region' terms to the BioSapiens protein
>>>>> feature ontology section of SO).
>>>>>
>>>>> For an annotation example: protein LEF-1 (Q9QXN1) has a disordered
>>>>> region corresponding to residues 296-397. This domain has been
>>>>> found to act to bend DNA, as reported in an experiment in
>>>>> PMID:7651541.
>>>>> In the normal course of GO annotation I would of course happily
>>>>> annotate the whole protein (Q9QXN1) to the DNA bending term (DNA
>>>>> bending activity, GO:0008301), and while I might read about the
>>>>> discrete region in the protein that is responsible for this
>>>>> function, I would not capture this data.
>>>>> However, the IUP (Intrinsically Unstructured Protein) curators would
>>>>> include the aa residue information in their annotations, and want
>>>>> to describe the individual functions that a protein's multiple
>>>>> domains might have.
>>>>>
>>>>> So I assume that for these kinds of annotations, where an
>>>>> equivalent GO term exists, a GOC annotation group could integrate
>>>>> this group's annotations and relate them up to the whole
>>>>> protein/gene product (possibly keeping the SO term in the new
>>>>> cross-reference column 16, but not the aa residue location?).
>>>>>
>>>>> While the majority of the function terms that the IUP community
>>>>> are interested in applying to their domains map quite
>>>>> straightforwardly to GO terms, there are some new ones which
>>>>> would need to be requested. Some of these new terms seem to
>>>>> describe more domain-specific, intra-protein functions. For
>>>>> example, consider some of the function terms used in the DisProt database:
>>>>>
>>>>> flexible linker/spacer
>>>>> Provides separation and permits movement between adjacent domains
>>>>>
>>>>> entropic bristle
>>>>> A disordered region that creates a zone of exclusion by its
>>>>> entropic movement
>>>>>
>>>>> entropic spring
>>>>> Provides a restoring force resulting from randomization of bond
>>>>> torsion angles that become restricted upon stretching.
>>>>>
>>>>> (see: http://www.disprot.org/view_function_subclass.php)
>>>>>
>>>>> So, would GO be willing to add these types of terms? And how much
>>>>> of the IUP community's annotation data would GOC groups be happy
>>>>> to incorporate into their own annotation sets?
>>>>>
>>>>> Thanks,
>>>>> Emily
>>>>>
>>>>>
>>>>> --
>>>>> ************************************
>>>>> Emily Dimmer
>>>>> GOA Coordinator
>>>>> EMBL-EBI
>>>>> Wellcome Trust Genome Campus
>>>>> Hinxton
>>>>> Cambridge CB10 1SD, U.K.
>>>>> Tel: +44 1223 494654
>>>>> Fax: +44 1223 494468
>>>>> email: edimmer at ebi.ac.uk
>>>>> ************************************
>>>>
>>>> --
>>>> Ben Hitz
>>>> Senior Scientific Programmer ** Saccharomyces Genome Database ** GO
>>>> Consortium
>>>> Stanford University ** hitz at genome.stanford.edu
>>>>
>>>>
>>>>
>>
>>
>
>
>