[Annotation] filtering IEA translations?
Emily Dimmer
edimmer at ebi.ac.uk
Wed Mar 18 02:43:26 PDT 2009
Hi,
All of the UniProt external2GO mappings (UniProtKB/Swiss-Prot2GO,
HAMAP2GO, UniProtKB/Subcellular2GO and InterPro2GO) are still developed
and maintained, so if an incorrect annotation is created from one
of these mappings, the mapping is removed or edited. We do _aim_ for
100% correctness, just as David stated, even if this results in the
removal of a mapping that is correct for the majority of proteins or
the assignment of quite high-level/broad terms.
Therefore, if anyone does see an incorrect IEA annotation being
generated from one of these external2GO mappings, please let the GOA or
InterPro group know (via the Annotation issues tracker on SourceForge)
so that we can correct it and thus remove any erroneous annotations that
may be being generated for proteins in other species.
For InterPro, matches to UniProtKB proteins by member database methods,
including PROSITE patterns and profiles, are all considered to be TRUE
if the score is above the individual threshold(s) given by the member
database and are thus flagged as T; any proteins not matching an
InterPro domain are flagged by InterPro as unknown ('?').
InterPro then additionally takes the manually curated status reports
from Swiss-Prot/UniProtKB for PROSITE patterns and profiles; this
manually curated status overrides the calculated status for matches. Any
false-positive match causes the InterPro domain to be suppressed in
the UniProtKB entry and to be displayed in the InterPro browser as a band
of a fainter colour (for further details please see:
http://www.ebi.ac.uk/interpro/user_manual.html).
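The status rules described above can be sketched roughly as follows.
This is only an illustration of the logic, not InterPro's actual code:
the function names are hypothetical, the 'F' code for a curated false
positive is an assumption (only 'T' and '?' are named in this message),
and treating a score equal to the threshold as passing is also an
assumption.

```python
from typing import Optional


def calculated_status(score: float, threshold: float) -> str:
    """Return 'T' when the score clears the member database threshold, else '?'.

    Assumption: a score exactly equal to the threshold counts as a match.
    """
    return "T" if score >= threshold else "?"


def effective_status(calculated: str, curated: Optional[str]) -> str:
    """A manually curated status, when one exists, overrides the calculated one."""
    return curated if curated is not None else calculated


def shown_in_uniprot_entry(status: str) -> bool:
    """Only true matches are shown; anything else (e.g. a false positive) is suppressed."""
    return status == "T"


# A match that scores above threshold but was curated as a false positive
# ('F' is a hypothetical code used here for illustration).
status = effective_status(calculated_status(12.0, 10.0), curated="F")
print(status, shown_in_uniprot_entry(status))
```

The point of the sketch is the ordering: the threshold decides the
calculated status first, and the curated status is applied afterwards,
so a curated false positive always wins over a high score.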
You can download a file called match_complete.xml from InterPro
(ftp://ftp.ebi.ac.uk/pub/databases/interpro/match_complete.xml.gz),
which contains all UniProtKB proteins and their match status to InterPro
domains, including those not matching any InterPro signatures.
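For groups that want to read the match status directly from this file,
a streaming parse along the following lines may be a starting point.
The element and attribute names used here (protein, match, id, status)
are assumptions about the file layout and should be checked against the
real file; the identifiers in the sample are made up, and 'F' as a
false-positive code is likewise an assumption.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def iter_match_status(stream: BinaryIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (protein_id, signature_id, status) from a match-style XML stream.

    Assumes <protein id="..."> elements containing <match id="..." status="...">
    children; these names are assumptions, not a documented schema.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "protein":
            protein_id = elem.get("id", "")
            for match in elem.iter("match"):
                yield protein_id, match.get("id", ""), match.get("status", "?")
            elem.clear()  # drop the children we have already processed


def iter_match_status_from_gz(path: str) -> Iterator[Tuple[str, str, str]]:
    """Same as above, reading a gzip-compressed file from disk."""
    with gzip.open(path, "rb") as fh:
        yield from iter_match_status(fh)


# Tiny made-up sample standing in for the real file.
SAMPLE = b"""<interpromatch>
  <protein id="PROT1">
    <match id="SIG1" status="T"/>
    <match id="SIG2" status="F"/>
  </protein>
  <protein id="PROT2"/>
</interpromatch>"""

rows = list(iter_match_status(io.BytesIO(SAMPLE)))
print(rows)
```

With the downloaded file, iter_match_status_from_gz("match_complete.xml.gz")
would be the entry point. Parsing incrementally matters here because the
file covers all UniProtKB proteins and is unlikely to fit comfortably in
memory as a single tree.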
Harold, could you possibly let us know the InterPro identifier for the
S6 kinase domain you have concerns about? We checked 'InterPro:
IPR016238 Ribosomal protein S6 kinase'
(http://www.ebi.ac.uk/interpro/IEntry?ac=IPR016238), and this maps to:
'GO:0006468 protein amino acid phosphorylation, GO:0007165 signal
transduction, GO:0004674 protein serine/threonine kinase activity'. As
far as we're aware, it's never been mapped to the ribosome term.
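A check of this kind can be scripted against a local copy of the
interpro2go file. The sketch below assumes the usual external2go line
layout ("InterPro:IPRnnnnnn name > GO:term name ; GO:nnnnnnn", with
comment lines starting with "!"). The sample lines are reconstructed
from the mapping quoted above rather than copied from the file, and
GO:0005840 is used as the identifier for the ribosome term.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# One mapping per line: InterPro accession on the left, GO ID after the last ';'.
LINE_RE = re.compile(r"^InterPro:(IPR\d{6})\b.*?>\s*GO:.*;\s*(GO:\d{7})\s*$")


def parse_interpro2go(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the set of GO IDs mapped to each InterPro accession."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("!"):  # header/comment line
            continue
        m = LINE_RE.match(line.strip())
        if m:
            mapping[m.group(1)].add(m.group(2))
    return dict(mapping)


# Sample lines reconstructed from the mapping quoted in this message.
SAMPLE_LINES = [
    "!example header line",
    "InterPro:IPR016238 Ribosomal protein S6 kinase > GO:protein amino acid phosphorylation ; GO:0006468",
    "InterPro:IPR016238 Ribosomal protein S6 kinase > GO:signal transduction ; GO:0007165",
    "InterPro:IPR016238 Ribosomal protein S6 kinase > GO:protein serine/threonine kinase activity ; GO:0004674",
]

mapping = parse_interpro2go(SAMPLE_LINES)
RIBOSOME = "GO:0005840"
print(RIBOSOME in mapping["IPR016238"])
```

Run against the real file, the same lookup with a different accession
would show quickly whether any given InterPro entry translates to the
ribosome term.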
I've also included David Lonsdale, the InterPro curation coordinator, on
this thread.
Cheers,
Emily
Harold Drabkin wrote:
> We do load and display all of the domains from both types of records,
> but we only translate the ones from the SwissProt records; in the
> example I have, the translation is correct but the possession of the
> domain is usually not. We have in the past both requested translation
> removal and questioned domains that we think are dubious.
> hd
>
>
> Doug wrote:
>> We start with both SwissProt and Trembl records for domain
>> assignments...so we may pick up some dubious stuff I suppose. It
>> seems there are several places such an issue could be corrected:
>>
>> 1. Don't download domain info from Trembl records.
>> 2. Improve the protein domain model so it produces fewer false
>> positives.
>> 3. Remove the translation from the translation file if warranted.
>>
>> Doug Howe, Ph.D.
>> Scientific Curator
>> Zebrafish Nomenclature Coordinator
>> Zebrafish Information Network
>> 541-346-0120
>> dhowe at cs.uoregon.edu
>>
>>
>>
>> On Mar 12, 2009, at 3:10 PM, Harold Drabkin wrote:
>>
>>> We get our IEAs based on ip2go from domains contained in SwissProt
>>> records. However, we make sure we only use the curated records and
>>> not Trembl records. These appear to often contain domains that are
>>> kind of weird. For example, there is an S6 kinase domain that often
>>> appears, and then this results in getting ip2GO for ribosome, etc.
>>> for proteins that aren't ribosomal.
>>> How are you getting the domain assignments?
>>>
>>> Harold
>>>
>>> Doug wrote:
>>>> I vaguely recall that interpro can mark domains as false positive
>>>> hits. The problem with that system is that I believe the domain
>>>> hit remains but only interpro knows it is marked as a false
>>>> positive...ie we don't get that false positive information when we
>>>> sync our data with UniProt (our source for protein domains).
>>>> Perhaps I am mistaken there though?
>>>>
>>>> Doug Howe, Ph.D.
>>>> Scientific Curator
>>>> Zebrafish Nomenclature Coordinator
>>>> Zebrafish Information Network
>>>> 541-346-0120
>>>> dhowe at cs.uoregon.edu
>>>>
>>>>
>>>>
>>>> On Mar 12, 2009, at 2:32 PM, David Hill wrote:
>>>>
>>>>> I think if the problem is with the mis-assignment of the domain,
>>>>> then the translation should be kept and the interpro domain
>>>>> assignment should be corrected.
>>>>>
>>>>>
>>>>>
>>>>> Doug wrote:
>>>>>> So in my example, I would speculate wildly that the zebrafish
>>>>>> gene does in fact have a domain that looks very much like
>>>>>> something that would cause beta-catenin binding, but is perhaps
>>>>>> different enough as to not promote such binding...so the
>>>>>> translation should be stricken from the translation file until
>>>>>> the domain model itself can be improved so it can distinguish
>>>>>> between domains that do and don't bind beta-catenin?
>>>>>>
>>>>>>
>>>>>> Doug Howe, Ph.D.
>>>>>> Scientific Curator
>>>>>> Zebrafish Nomenclature Coordinator
>>>>>> Zebrafish Information Network
>>>>>> 541-346-0120
>>>>>> dhowe at cs.uoregon.edu
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mar 12, 2009, at 12:41 PM, David Hill wrote:
>>>>>>
>>>>>>> I think we should remove the translation. When we originally
>>>>>>> made these translation tables, we used the very conservative rule
>>>>>>> that if we could find an exception to the translation being
>>>>>>> correct, we would remove it. Otherwise, erroneous data may be
>>>>>>> generated for organisms that don't have experimental support.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> Doug wrote:
>>>>>>>> For groups that apply interpro2go, spkw2go, or ec2go
>>>>>>>> translation files:
>>>>>>>>
>>>>>>>> If a translation from interpro2go for example takes you to a GO
>>>>>>>> term which is directly contradictory to an experimentally
>>>>>>>> supported annotation in your database, do you apply the IEA
>>>>>>>> annotation or do you filter it out?
>>>>>>>>
>>>>>>>> Example:
>>>>>>>> We have an IPI annotation to NOT beta-catenin binding and an
>>>>>>>> IEA annotation (translation of InterPro:IPR009428) to
>>>>>>>> 'beta-catenin binding' on our lzic gene. Should such an IEA
>>>>>>>> annotation be made when it conflicts with experimental
>>>>>>>> annotations?
>>>>>>>>
>>>>>>>> I see no problem as long as the evidence code is taken into
>>>>>>>> account...what do others think?
>>>>>>>>
>>>>>>>> -Doug
>>>>>>>>
>>>>>>>> Doug Howe, Ph.D.
>>>>>>>> Scientific Curator
>>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>> Zebrafish Information Network
>>>>>>>> 541-346-0120
>>>>>>>> dhowe at cs.uoregon.edu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Annotation mailing list
>>>>>>>> Annotation at geneontology.org
>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>
>>>>>>> --
>>>>>>> David P. Hill, Ph.D.
>>>>>>> Bioinformatics Scientist: Ontology Development
>>>>>>> Gene Ontology Consortium
>>>>>>> The Jackson Laboratory
>>>>>>> www.geneontology.org
>>>>>>> www.informatics.jax.org
>>>>>>> tel:207-288-6430
>>>>>>
>>>>>
>>>>> --
>>>>> David P. Hill, Ph.D.
>>>>> Bioinformatics Scientist: Ontology Development
>>>>> Gene Ontology Consortium
>>>>> The Jackson Laboratory
>>>>> www.geneontology.org
>>>>> www.informatics.jax.org
>>>>> tel:207-288-6430
>>>>
>>
>
--
Do you need any additional GO annotation resources?
Which proteins would you like annotated with GO?
Let us know in the GOA User Survey, available at: http://www.ebi.ac.uk/GOA/contactus.html
------------------------------------------------------------------
Emily Dimmer Ph.D.
GOA Coordinator
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD, U.K.
Tel: +44 1223 494654
Fax: +44 1223 494468
email: edimmer at ebi.ac.uk
URL: http://www.ebi.ac.uk/goa