Updating Cow and Chicken entries to CDS in EMBL/DDBJ/GenBank and UniProtKB

Evelyn Camon camon at ebi.ac.uk
Thu Jul 13 03:39:29 PDT 2006


Dear Farm Animal Interest Group,

As you may already know, the Gene Ontology Annotation (GOA) database at 
the EBI is the agreed supplier of GO annotation association files to the 
GO Consortium for the Chicken and Bovine species. As such, it is our 
responsibility to ensure that we supply and integrate high quality, 
experimentally verified manual GO annotation from external groups 
(AgBase, Roslin, others) and create as complete an annotation 
association file as possible.

The completed bovine and chicken genome sequences are still in a 
preliminary state in the Whole Genome Shotgun (WGS) section of the 
EMBL/GenBank/DDBJ nucleotide databases, and therefore not all protein 
coding regions have been annotated. As a result, there is no way for 
UniProt to automatically create new entries using normal procedures, so 
the sequences get archived in UniParc instead. This creates a 
problem, since only UniProtKB identifiers could be annotated in GOA in 
the past.

We have the following proposal for the farm animal communities:

In order to get the data upgraded into UniProtKB we need

(a) individuals to submit a third party annotation (TPA) to either EMBL 
or GenBank in which they upgrade the annotations to CDS (coding 
sequence). The data will then automatically get integrated into 
UniProtKB/TrEMBL...

AND/OR

(b) individuals or the ChickGO Consortium to make requests to the 
sequencing centres to update the annotation of these sequences from EST 
to CDS. Once this is done, the data will enter UniProtKB/TrEMBL by the 
normal pipeline procedures and will also be available for the UniProtKB 
curators to annotate and promote into UniProtKB/Swiss-Prot. The TrEMBL 
sequences would then also automatically inherit good quality electronic 
GO annotation from the GOA group via InterPro, Swiss-Prot keywords and 
Enzyme to GO mappings (and other future planned GO mappings to pathways).

Although the above is our preferred route, we acknowledge that it might 
take some time to implement, so we also propose (c) to temporarily allow 
the annotation of UniParc identifiers in the protein2GO annotation tool 
at GOA by Roslin, EBI and AgBase staff. We will also consider the 
integration of GO annotation to UniParc identifiers from external groups 
if it is in keeping with GO Consortium guidelines. This is possible as 
UniParc identifiers are now STABLE and we can upgrade to UniProtKB 
accessions automatically later.
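
To illustrate the later upgrade step in (c), here is a minimal sketch in 
Python. It assumes annotations are held as simple (identifier, GO term, 
evidence code) tuples and that a UniParc-to-UniProtKB lookup table is 
available; the function name, the data layout and the identifiers are 
all made-up placeholders, not the actual GOA pipeline.

```python
def upgrade_identifiers(annotations, uniparc_to_uniprotkb):
    """Swap UniParc identifiers for UniProtKB accessions where a mapping exists.

    Annotations whose identifier has no mapping yet are kept unchanged,
    so they can be upgraded in a later run.
    """
    upgraded = []
    for protein_id, go_id, evidence in annotations:
        new_id = uniparc_to_uniprotkb.get(protein_id, protein_id)
        upgraded.append((new_id, go_id, evidence))
    return upgraded


# Placeholder identifiers (hypothetical, for illustration only).
annotations = [
    ("UPI_EXAMPLE_1", "GO:0005515", "IPI"),
    ("UPI_EXAMPLE_2", "GO:0005634", "IDA"),
]
mapping = {"UPI_EXAMPLE_1": "ACC_EXAMPLE_1"}

upgraded = upgrade_identifiers(annotations, mapping)
for row in upgraded:
    print(row)
```

Because the UniParc identifier is stable, the second annotation simply 
keeps it until a UniProtKB accession becomes available.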

Also, to aid the farm animal proteome GO annotations, we are proposing (d) 
to automatically transfer any experimentally verified GO annotation 
between species using Ensembl Compara (which predicts orthologs). This 
data will be evidence coded as IEA (inferred from electronic annotation) 
and NOT ISS (inferred from sequence similarity) in the GOA database.
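
As a rough sketch of what (d) would mean in practice, the Python below 
keeps only annotations with an experimental evidence code and copies 
them to predicted orthologs with the evidence code set to IEA. The set 
of codes treated as experimental, the ortholog table and all protein 
names are assumptions for illustration; real ortholog predictions would 
come from Ensembl Compara, which is not queried here.

```python
# Evidence codes treated as "experimentally verified" in this sketch
# (an assumption; the actual GOA selection may differ).
EXPERIMENTAL_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def transfer_by_orthology(annotations, orthologs):
    """Copy experimental annotations to orthologs, recoding them as IEA."""
    transferred = []
    for protein_id, go_id, evidence in annotations:
        if evidence not in EXPERIMENTAL_CODES:
            continue  # non-experimental annotations are never propagated
        for target_id in orthologs.get(protein_id, []):
            transferred.append((target_id, go_id, "IEA"))
    return transferred


# Hypothetical source annotations and ortholog predictions.
source = [
    ("HUMAN_PROT_1", "GO:0005634", "IDA"),
    ("HUMAN_PROT_1", "GO:0005515", "ISS"),  # skipped: not experimental
    ("HUMAN_PROT_2", "GO:0006412", "IMP"),  # skipped: no ortholog listed
]
orthologs = {"HUMAN_PROT_1": ["COW_PROT_1", "CHICK_PROT_1"]}

result = transfer_by_orthology(source, orthologs)
for row in result:
    print(row)
```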

We are very interested in hearing your opinions, particularly concerning 
the TPA route of updating the EMBL/GenBank/DDBJ sequence annotations to 
solve this problem.

Dave and Fiona, could the ChickGO Consortium get in touch with the 
sequencing centres?

Kind regards,

Evelyn Camon

-- 
Evelyn Camon
GOA Coordinator
Senior Scientific Curator
European Bioinformatics Institute
Tel:01223-494465
Fax:01223-494468
E-mail: camon at ebi.ac.uk
URL: http://www.ebi.ac.uk/goa
