Updating Cow and Chicken entries to CDS in EMBL/DDBJ/GenBank and UniProtKB
Evelyn Camon
camon at ebi.ac.uk
Thu Jul 13 03:39:29 PDT 2006
Dear Farm Animal Interest Group,
As you may already know, the Gene Ontology Annotation (GOA) database at
the EBI is the agreed supplier of GO annotation association files to the
GO consortium for the Chicken and Bovine species. As such it is our
responsibility to ensure that we supply and integrate high quality
experimentally verified manual GO annotation from external groups
(AgBase, Roslin, others) and create as complete an annotation
association file as possible.
The completed bovine and chicken genome sequences are still in a
preliminary state in the Whole Genome Shotgun (WGS) section of the
EMBL/GenBank/DDBJ nucleotide databases and therefore not all protein
coding regions have been annotated. As a result there is no way for
UniProt to automatically create new entries using normal procedures and
as such the sequences get archived in UniParc instead. This creates a
problem since only UniProtKB identifiers could be annotated in GOA in
the past.
We have the following proposal for the farm animal communities:
In order to get the data upgraded into UniProtKB we need
(a) individuals to submit a third party annotation (TPA) to either EMBL
or GenBank in which they upgrade the annotations to CDS (coding
sequence). Then the data will automatically get integrated into
UniProtKB/TrEMBL...
AND/OR
(b) individuals or the ChickGO Consortium to make requests to the
sequencing centres to update the annotation of these sequences from EST
to CDS. Once this is done the data will enter UniProtKB/TrEMBL by normal
pipeline procedures and will also be available for the UniProtKB
curators to annotate and promote into UniProtKB/Swiss-Prot. The TrEMBL
sequences would also then automatically inherit good quality electronic
GO annotation from the GOA group via InterPro, Swiss-Prot keywords and
Enzyme to GO mappings (and other future planned GO mappings to pathways).
Although the above is our preferred route, we acknowledge that it might
take some time to implement, so we also propose (c) to temporarily allow
the annotation of UniParc identifiers in the protein2GO annotation tool
at GOA by Roslin, EBI and AgBase staff. We will also consider the
integration of GO annotation to UniParc identifiers from external groups
if it is in keeping with GO Consortium guidelines. This is possible as
UniParc identifiers are now stable and we can upgrade to UniProtKB
accessions automatically later.
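
To illustrate the later upgrade step in (c), here is a minimal sketch and not the actual GOA or protein2GO code. It assumes annotations are held as simple (identifier, GO term, evidence code) tuples and that a UniParc-to-UniProtKB mapping becomes available once entries are created. The function name, the tuple layout and all identifiers in the example are placeholders chosen for illustration. The only format fact relied on is that UniParc identifiers consist of "UPI" followed by ten hexadecimal characters.

```python
import re

# UniParc identifiers: "UPI" followed by 10 hexadecimal characters.
UNIPARC_ID = re.compile(r"UPI[0-9A-F]{10}")


def upgrade_annotations(annotations, uniparc_to_uniprotkb):
    """Replace UniParc identifiers with UniProtKB accessions where a mapping exists.

    annotations: list of (identifier, go_id, evidence_code) tuples (assumed layout).
    uniparc_to_uniprotkb: dict from UniParc identifier to UniProtKB accession
    (hypothetical; stands in for whatever mapping is available later).
    """
    upgraded = []
    for identifier, go_id, evidence in annotations:
        if UNIPARC_ID.fullmatch(identifier) and identifier in uniparc_to_uniprotkb:
            identifier = uniparc_to_uniprotkb[identifier]
        # Annotations without a mapping keep their stable UniParc identifier.
        upgraded.append((identifier, go_id, evidence))
    return upgraded


# Placeholder data, not real records.
example_annotations = [
    ("UPI0000000001", "GO:0005515", "IPI"),
    ("UPI0000000002", "GO:0005634", "IDA"),
]
example_mapping = {"UPI0000000001": "Q9XXX1"}
example_upgraded = upgrade_annotations(example_annotations, example_mapping)
```

Because the UniParc identifier is stable, an annotation that cannot yet be mapped is simply carried forward unchanged and can be upgraded in a later pass.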
Also, to aid the farm animal proteome GO annotations, we are proposing (d)
to automatically transfer any experimentally verified GO annotation
between species using Ensembl Compara (which predicts orthologs). This data
will be evidence coded as IEA (inferred from electronic annotation) and
NOT ISS (inferred from sequence similarity) in the GOA database.
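
The rule in (d) could be sketched roughly as follows. This is an illustration under stated assumptions, not the GOA pipeline: the ortholog dictionary stands in for predictions taken from Ensembl Compara, the tuple layout and function name are hypothetical, and the set of experimental evidence codes used here (IDA, IEP, IGI, IMP, IPI) is our reading of "experimentally verified".

```python
# GO evidence codes treated as experimental in this sketch (assumption).
EXPERIMENTAL_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}


def transfer_by_orthology(annotations, orthologs):
    """Project experimentally verified annotations onto predicted orthologs.

    annotations: list of (protein_id, go_id, evidence_code) tuples (assumed layout).
    orthologs: dict from source protein_id to a list of ortholog protein_ids
    (hypothetical stand-in for Ensembl Compara predictions).
    """
    transferred = []
    for protein_id, go_id, evidence in annotations:
        if evidence not in EXPERIMENTAL_CODES:
            continue  # only experimentally verified annotations are transferred
        for target_id in orthologs.get(protein_id, []):
            # Transferred annotations are coded IEA, never ISS.
            transferred.append((target_id, go_id, "IEA"))
    return transferred


# Placeholder data, not real records.
source_annotations = [
    ("HUMAN_PROT_1", "GO:0005515", "IPI"),
    ("HUMAN_PROT_2", "GO:0005634", "IEA"),
]
predicted_orthologs = {
    "HUMAN_PROT_1": ["CHICK_PROT_1", "BOVIN_PROT_1"],
    "HUMAN_PROT_2": ["CHICK_PROT_2"],
}
projected = transfer_by_orthology(source_annotations, predicted_orthologs)
```

In this sketch the second source annotation is skipped because it is itself electronic, so nothing is propagated from it.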
We are very interested in hearing your opinions, particularly concerning
the TPA route of updating the EMBL/GenBank/DDBJ sequence annotations to
solve this problem.
Dave and Fiona, could the ChickGO Consortium get in touch with the
sequencing centres?
Kind regards,
Evelyn Camon
--
Evelyn Camon
GOA Coordinator
Senior Scientific Curator
European Bioinformatics Institute
Tel:01223-494465
Fax:01223-494468
E-mail: camon at ebi.ac.uk
URL: http://www.ebi.ac.uk/goa