[Annotation] filtering IEA translations?

Emily Dimmer edimmer at ebi.ac.uk
Wed Mar 18 02:43:26 PDT 2009


Hi,

All of the UniProt external2GO mappings (UniProtKB/Swiss-Prot2GO, 
HAMAP2GO, UniProtKB/Subcellular2GO and InterPro2GO) are still actively 
developed and maintained, so if one of these mappings produces an 
incorrect annotation, the mapping is removed or edited. We do _aim_ for 
100% correctness, just as David stated, even if this means removing a 
mapping that is correct for the majority of proteins, or assigning 
quite high-level/broad terms.

Therefore, if anyone does see an incorrect IEA annotation being 
generated from one of these external2GO mappings, please let the GOA or 
InterPro group know (via the Annotation issues tracker on SourceForge) 
so that we can correct it, and thus remove any erroneous annotations 
that may be being generated for proteins in other species.

For InterPro, matches to UniProtKB proteins by member database methods, 
including PROSITE patterns and profiles, are all considered to be TRUE 
if the score is above the individual threshold(s) given by the member 
database and are thus flagged as T; any proteins not matching an 
InterPro domain are flagged by InterPro as unknown ('?').
InterPro then additionally takes the manually curated status reports 
from Swiss-Prot/UniProtKB for PROSITE patterns and profiles; this 
manually curated status overrides the calculated status for matches. Any 
false positive match causes the InterPro domain to be suppressed in 
the UniProtKB entry, and the match is displayed in the InterPro browser 
as a band of a fainter colour (for further details please see: 
http://www.ebi.ac.uk/interpro/user_manual.html).
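
As a rough illustration of the status logic described above, here is a 
minimal Python sketch. It is not InterPro's implementation: the function 
name, its parameters and the "F" code for a curated false positive are 
all hypothetical, chosen only to make the override rule concrete. Only 
the 'T' and '?' flags come from the description above.

```python
from typing import Optional


def resolve_match_status(
    score: Optional[float],
    threshold: float,
    curated_status: Optional[str] = None,
) -> str:
    """Return a status flag for one protein/signature pair.

    "T" and "?" follow the description in this message. The curated
    override value (e.g. "F" for a false positive) is an assumption
    made for this sketch.
    """
    # A manually curated status always overrides the calculated one.
    if curated_status is not None:
        return curated_status
    # No hit at all: status is unknown.
    if score is None:
        return "?"
    # A hit counts as TRUE only if it scores above the member
    # database's own threshold.
    return "T" if score > threshold else "?"


def is_suppressed(status: str) -> bool:
    """A curated false positive (hypothetical code "F") is suppressed."""
    return status == "F"
```

For example, a hit scoring 12.0 against a threshold of 10.0 resolves to 
"T" unless a curated status is supplied, in which case the curated value 
wins.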

You can download a file called match_complete.xml from InterPro 
(ftp://ftp.ebi.ac.uk/pub/databases/interpro/match_complete.xml.gz), 
which contains all UniProtKB proteins and their match status to InterPro 
domains, including those not matching any InterPro signatures.
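
Because the file is large, it is probably best read as a stream. The 
sketch below shows one way to do that in Python. It assumes the XML 
uses <protein> elements (accession in an "id" attribute) containing 
<match> elements with "id" and "status" attributes and a nested <ipr> 
element; these names are assumptions and should be checked against the 
downloaded file. The helper names are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator, List, Tuple


def open_match_file(path: str) -> BinaryIO:
    """Open a gzip-compressed match file for binary reading."""
    return gzip.open(path, "rb")


def iter_protein_matches(
    stream: BinaryIO,
) -> Iterator[Tuple[str, List[Dict[str, str]]]]:
    """Yield (protein accession, matches) one protein at a time.

    Element and attribute names are assumed, not taken from a schema.
    Proteins without any match yield an empty list.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "protein":
            continue
        matches = []
        for match in elem.iter("match"):
            ipr = match.find("ipr")
            matches.append(
                {
                    "signature": match.get("id", ""),
                    "status": match.get("status", "?"),
                    "interpro": ipr.get("id", "") if ipr is not None else "",
                }
            )
        yield elem.get("id", ""), matches
        # Drop the children of the finished element to limit memory use.
        elem.clear()


# Usage (path is wherever the download was saved):
# with open_match_file("match_complete.xml.gz") as fh:
#     for accession, matches in iter_protein_matches(fh):
#         true_hits = [m for m in matches if m["status"] == "T"]
```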

Harold, could you possibly let us know the InterPro identifier for the 
S6 kinase domain you have concerns about? We checked 'InterPro: 
IPR016238 Ribosomal protein S6 kinase' 
(http://www.ebi.ac.uk/interpro/IEntry?ac=IPR016238), and this maps to: 
'GO:0006468 protein amino acid phosphorylation, GO:0007165 signal 
transduction, GO:0004674 protein serine/threonine kinase activity'. As 
far as we're aware, it's never been mapped to the ribosome term.
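
A check like this can be repeated locally against a copy of the 
interpro2go file. The Python sketch below assumes the usual external2go 
line layout ("InterPro:IPRxxxxxx name > GO:term name ; GO:nnnnnnn", with 
comment lines starting with "!"); the function name is hypothetical, and 
the sample lines in the usage note are written out from the mapping 
quoted above rather than copied from the file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_interpro2go(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each InterPro accession to the set of GO IDs it translates to."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and "!" comment/header lines.
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(" > ")
        if not sep or not left.startswith("InterPro:"):
            continue
        # Left side: "InterPro:IPR016238 <entry name>"
        accession = left.split()[0].split(":", 1)[1]
        # Right side: "GO:<term name> ; GO:<id>"; the ID follows the last ";"
        _name, semi, go_id = right.rpartition(";")
        if semi and go_id.strip().startswith("GO:"):
            mapping[accession].add(go_id.strip())
    return dict(mapping)
```

Given the three IPR016238 lines, `parse_interpro2go` returns a single 
entry holding GO:0006468, GO:0007165 and GO:0004674, so a test such as 
`"GO:0005840" in mapping["IPR016238"]` (GO:0005840 being the ribosome 
term) comes back false.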

I've also included David Lonsdale, the InterPro curation coordinator, on 
this thread.

Cheers,
Emily

Harold Drabkin wrote:
> We do load and display all of the domains from both types of records, 
> but we only translate the ones from the SwissProt records; in the 
> example I have, the translation is correct but the possession of the 
> domain is usually not. We have in the past both requested translation 
> removal and questioned the domains we think dubious.
> hd
>
>
> Doug wrote:
>> We start with both SwissProt and Trembl records for domain 
>> assignments...so we may pick up some dubious stuff I suppose.   It 
>> seems there are several places such an issue could be corrected:
>>
>> 1.  Don't download domain info from Trembl records.
>> 2.  Improve the protein domain model so it produces fewer false 
>> positives.
>> 3.  Remove the translation from the translation file if warranted.
>>
>> Doug Howe, Ph.D.
>> Scientific Curator
>> Zebrafish Nomenclature Coordinator
>> Zebrafish Information Network
>> 541-346-0120
>> dhowe at cs.uoregon.edu
>>
>>
>>
>> On Mar 12, 2009, at 3:10 PM, Harold Drabkin wrote:
>>
>>> We get our IEAs based on ip2go from domains contained in SwissProt 
>>> records. However, we make sure we only use the curated records and 
>>> not Trembl records. These appear to often contain domains that are 
>>> kind of weird. For example, there is an S6 kinase domain that often 
>>> appears, and then this results in getting ip2GO for ribosome, etc. 
>>> for proteins that aren't ribosomal.
>>> How are you getting the domain assignments?
>>>
>>> Harold
>>>
>>> Doug wrote:
>>>> I vaguely recall that interpro can mark domains as false positive 
>>>> hits.  The problem with that system is that I believe the domain 
>>>> hit remains but only interpro knows it is marked as a false 
>>>> positive...ie we don't get that false positive information when we 
>>>> sync our data with UniProt (our source for protein domains).  
>>>> Perhaps I am mistaken there though?
>>>>
>>>> Doug Howe, Ph.D.
>>>> Scientific Curator
>>>> Zebrafish Nomenclature Coordinator
>>>> Zebrafish Information Network
>>>> 541-346-0120
>>>> dhowe at cs.uoregon.edu
>>>>
>>>>
>>>>
>>>> On Mar 12, 2009, at 2:32 PM, David Hill wrote:
>>>>
>>>>> I think if the problem is with the mis-assignment of the domain, 
>>>>> then the translation should be kept and the interpro domain 
>>>>> assignment should be corrected.
>>>>>
>>>>>
>>>>>
>>>>> Doug wrote:
>>>>>> So in my example, I would speculate wildly that the zebrafish 
>>>>>> gene does in fact have a domain that looks very much like 
>>>>>> something that would cause beta-catenin binding, but is perhaps 
>>>>>> different enough as to not promote such binding...so the 
>>>>>> translation should be stricken from the translation file until 
>>>>>> the domain model itself can be improved so it can distinguish 
>>>>>> between domains that do and don't bind beta-catenin?
>>>>>>
>>>>>>
>>>>>> Doug Howe, Ph.D.
>>>>>> Scientific Curator
>>>>>> Zebrafish Nomenclature Coordinator
>>>>>> Zebrafish Information Network
>>>>>> 541-346-0120
>>>>>> dhowe at cs.uoregon.edu
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mar 12, 2009, at 12:41 PM, David Hill wrote:
>>>>>>
>>>>>>> I think we should remove the translation. When we originally 
>>>>>>> made these translation tables, we used the very conservative rule 
>>>>>>> that if we could find an exception to the translation being 
>>>>>>> correct, we would remove it. Otherwise, erroneous data may be 
>>>>>>> generated for organisms that don't have experimental support.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> Doug wrote:
>>>>>>>> For groups that apply interpro2go, spkw2go, or ec2go 
>>>>>>>> translation files:
>>>>>>>>
>>>>>>>> If a translation from interpro2go for example takes you to a GO 
>>>>>>>> term which is directly contradictory to an experimentally 
>>>>>>>> supported annotation in your database, do you apply the IEA 
>>>>>>>> annotation or do you filter it out?
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>  We have an IPI annotation to NOT beta-catenin binding and an 
>>>>>>>> IEA annotation (translation of InterPro:IPR009428) to 
>>>>>>>> 'beta-catenin binding' on our lzic gene.  Should such an IEA 
>>>>>>>> annotation be made when it conflicts with experimental 
>>>>>>>> annotations?
>>>>>>>>
>>>>>>>> I see no problem as long as the evidence code is taken into 
>>>>>>>> account...what do others think?
>>>>>>>>
>>>>>>>> -Doug
>>>>>>>>
>>>>>>>> Doug Howe, Ph.D.
>>>>>>>> Scientific Curator
>>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>> Zebrafish Information Network
>>>>>>>> 541-346-0120
>>>>>>>> dhowe at cs.uoregon.edu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Annotation mailing list
>>>>>>>> Annotation at geneontology.org
>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>
>>>>>>> -- 
>>>>>>> David P. Hill, Ph.D.
>>>>>>> Bioinformatics Scientist: Ontology Development
>>>>>>> Gene Ontology Consortium
>>>>>>> The Jackson Laboratory
>>>>>>> www.geneontology.org
>>>>>>> www.informatics.jax.org
>>>>>>> tel:207-288-6430
>>>>>>
>>>>>
>>>>> -- 
>>>>> David P. Hill, Ph.D.
>>>>> Bioinformatics Scientist: Ontology Development
>>>>> Gene Ontology Consortium
>>>>> The Jackson Laboratory
>>>>> www.geneontology.org
>>>>> www.informatics.jax.org
>>>>> tel:207-288-6430
>>>>
>>>> _______________________________________________
>>>> Annotation mailing list
>>>> Annotation at geneontology.org
>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>
>
> _______________________________________________
> Annotation mailing list
> Annotation at geneontology.org
> http://fafner.stanford.edu/mailman/listinfo/annotation
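
For groups that do decide to filter locally, the question that opened 
the thread (dropping an IEA annotation that directly contradicts an 
experimentally supported NOT annotation) could be sketched in Python as 
below. This is only one possible policy, not something the thread 
settled on. The Annotation class, its fields and the function name are 
hypothetical, the list of experimental evidence codes is an assumption, 
and terms are compared by exact match only, without walking the 
ontology.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Evidence codes treated as experimental in this sketch (an assumption).
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    evidence: str
    negated: bool = False  # True for a NOT annotation


def filter_conflicting_iea(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Drop positive IEA annotations refuted by an experimental NOT.

    Only exact (gene, term) conflicts are detected; parent and child
    terms are not considered.
    """
    annotations = list(annotations)
    refuted = {
        (a.gene, a.term)
        for a in annotations
        if a.negated and a.evidence in EXPERIMENTAL_CODES
    }
    return [
        a
        for a in annotations
        if not (
            a.evidence == "IEA"
            and not a.negated
            and (a.gene, a.term) in refuted
        )
    ]
```

With the lzic example from the thread, an IPI NOT annotation to 
'beta-catenin binding' causes the IEA annotation to the same term to be 
dropped, while the IPI annotation itself and any unrelated IEA 
annotations are kept.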


-- 


Do you need any additional GO annotation resources?
Which proteins would you like annotated with GO?

Let us know in the GOA User Survey, available at: http://www.ebi.ac.uk/GOA/contactus.html

------------------------------------------------------------------ 

    Emily Dimmer Ph.D.
    GOA Coordinator
    EMBL-EBI
    Wellcome Trust Genome Campus
    Hinxton
    Cambridge CB10 1SD, U.K.
    Tel:     +44 1223 494654
    Fax:    +44 1223 494468
    email:  edimmer at ebi.ac.uk
    URL:    http://www.ebi.ac.uk/goa
    


