[go] Do you upgrade InterPro2GO IEA annotations to ISS?

Valerie Wood val at sanger.ac.uk
Wed Oct 10 10:01:32 PDT 2007


Hi Emily,

I don't make a conscious effort to 'convert these to ISS', but I have 
filtering in place so that if a manual annotation exists, the IEA 
annotations are suppressed.

Of 28,666 possible mappings (SPKW and InterPro), 4,454 (about 15.5%) are 
retained after filtering. So most InterPro annotations are already 
represented by an existing annotation (either experimental or an ISS 
to an ortholog with experimental evidence).
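For anyone setting up something similar, here is a minimal sketch of this kind of filter. It is not the actual S. pombe pipeline. It assumes that "already represented" means a manual (non-IEA) annotation for the same gene to the same GO term or to a more specific term, and the term names and the toy is_a graph are hypothetical placeholders rather than real GO IDs.

```python
# Hypothetical sketch of suppressing IEA annotations already covered by a
# manual annotation. ASSUMPTION: "covered" means the same gene has a non-IEA
# annotation to the same term or to a descendant (more specific) term.
from typing import Dict, List, Set, Tuple

Annotation = Tuple[str, str, str]  # (gene_id, term_id, evidence_code)


def ancestors(term_id: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return term_id plus all of its ancestors in a toy is_a graph."""
    seen: Set[str] = set()
    stack = [term_id]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def filter_iea(annotations: List[Annotation],
               parents: Dict[str, List[str]]) -> List[Annotation]:
    """Drop IEA annotations whose (gene, term) is implied by a manual one."""
    covered: Set[Tuple[str, str]] = set()
    for gene, term, evidence in annotations:
        if evidence != "IEA":
            # A manual annotation also implies every ancestor term.
            for t in ancestors(term, parents):
                covered.add((gene, t))
    return [a for a in annotations
            if a[2] != "IEA" or (a[0], a[1]) not in covered]


# Toy example with hypothetical term names.
PARENTS = {"ser_thr_kinase": ["protein_kinase"], "protein_kinase": ["kinase"]}
ANNOTATIONS = [
    ("geneA", "protein_kinase", "IEA"),  # implied by the IDA below: suppressed
    ("geneA", "ser_thr_kinase", "IDA"),
    ("geneB", "protein_kinase", "IEA"),  # no manual annotation: retained
]

if __name__ == "__main__":
    for row in filter_iea(ANNOTATIONS, PARENTS):
        print(row)
```

Only the IEA annotations that survive this step would then need a curator's eye.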
 
Some terms that get through the filter are clearly not appropriate 
mappings, and these are submitted to the SF annotation tracker
https://sourceforge.net/tracker/?group_id=36855&atid=605890
to be removed. This is usually because the mapping does not apply to 
the entire family (i.e. it is obviously too broad).

Occasionally, annotations are made manually to a domain or family with 
ISS: InterPro (174) or Pfam (1240). This is usually because the 
ortholog cannot be identified unambiguously, but all family 
members are considered to have this process or function, 
e.g. protein kinase for PF00069 or transcription factor for PF00172.

Sometimes I will make these annotations when I have built a family for 
Pfam before it has been given any mappings by the InterPro database, 
so they are not directly InterPro mappings as such (I am just using the 
alignment as the object in the "with" column, because it is better 
evidence than a single homolog). I use GOC:unpublished in the reference 
column when I do this.

I guess in answer to "What data sources do you look at to confirm 
whether these electronic predictions are correct": most of the errors 
arise when a mapping has been made to a family that contains several 
subfamilies, and only the members of one subfamily have been shown to 
have a particular process. It needs to be reasonably conclusive that all 
members of the family are likely to have the mapped term, so family 
size is usually a consideration. A very general rule of thumb is 
that a large enzyme family is unlikely to be accurately mapped to a 
very specific activity or process, whereas a family with uniform 
conservation can often have more granular mappings.

This process seems to work really well for identifying mapping errors.
Looking at a few of the examples on the tracker would probably make 
this clearer.


Val




E Dimmer wrote:

> Hi,
>
> A question for those GO annotation groups that manually assess 
> InterPro2GO 'IEA' predictions, and when appropriate, convert them into 
> manual ISS annotations.
>
> I would be very grateful if you could let me know what criteria your 
> group uses to evaluate InterPro2GO annotations. What data sources do 
> you look at to confirm whether these electronic predictions are 
> correct and what information does your final annotation include? (i.e. 
> I assume that the InterPro ID would go into the 'with' column, but 
> what would be cited in the reference column - do you have a GO 
> reference?), also how long does this process take? Am I right in 
> thinking that the S. pombe and dictyBase groups carry out these kinds of 
> annotations?
>
> I have just been asked this by a group who are considering whether 
> they could carry out this kind of assessment while annotating their 
> new genome.
>
> Thanks,
> Emily
>



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


