[Fwd: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/GenBank and UniPro]
Michele Magrane
magrane at ebi.ac.uk
Wed Jul 19 00:37:50 PDT 2006
Dear John,
Evelyn passed on your message about self-policing of sequences in UniProt. All
sequences in the Swiss-Prot section of the UniProt Knowledgebase are manually
checked during the curation process and this involves comparing multiple reports
(if they exist) for a particular sequence as well as comparison with
orthologs/paralogs and providing what we believe to be the most accurate
sequence. Sequences in TrEMBL which are awaiting manual curation are only as
good as what has been provided in the underlying nucleotide entry so the quality
here can vary, depending on what has been provided by the submitters of the
nucleotide sequence. However, unlike RefSeq, we don't make predictions ourselves
based on unannotated genomes. I hope this clarifies things but if you have any
questions, feel free to contact me.
Regards,
Michele.
> ---------------------------- Original Message ----------------------------
> Subject: RE: [Fwd: Updating Cow and Chicken entries to CDS in
> EMBL/DDBJ/GenBank and UniPro
> From: "john young (IAH-C)" <john.young at bbsrc.ac.uk>
> Date: Thu, July 13, 2006 9:23 pm
> To: "Fiona McCarthy" <FMcCarthy at cvm.msstate.edu>
> "Evelyn Camon" <camon at ebi.ac.uk>
> "goa_curators" <goa_curators at ebi.ac.uk>
> farmanimals at genome.stanford.edu
> --------------------------------------------------------------------------
>
> Just a note.
> I agree with the importance of this. I don't know about UniProt, but I do
> know that the RefSeq database does appear to have some self-correction
> mechanism in place when real sequences come out that are better than the
> predicted ones. For some strange reason they decided to include GNOMON
> predictions in RefSeq, some of which were amazingly stupid. An example was
> chicken CD86. However, when I submitted the real sequence to EMBL, it very
> soon replaced the silly prediction in RefSeq. (Unfortunately not before
> some people had referred to the silly prediction as a definitive sequence
> in reviews, revealing the fact that they cannot possibly have looked at
> the sequences! It's amazing what referees will let through these days!)
> So maybe UniProt could/should have some similar automatic self-policing
> mechanism to update silly predictions with real sequences. It is clearly
> a feasible process.
> John Young
>
> -----Original Message-----
> From: owner-farmanimals at genome.stanford.edu
> [mailto:owner-farmanimals at genome.stanford.edu] On Behalf Of Fiona
> McCarthy
> Sent: 13 July 2006 16:40
> To: Evelyn Camon; goa_curators; farmanimals at genome.stanford.edu
> Subject: Re: [Fwd: Updating Cow and Chicken entries to CDS in
> EMBL/DDBJ/GenBank and UniPro
>
> Hi Evelyn,
>
> Thank you for raising this point. As you know, this is something that we at
> AgBase have been working towards for some time now. Since not everyone may
> be aware, the problem for farm animals with sequenced genomes is that many
> of the proteins are 'predicted' based on electronic ORF prediction
> algorithms. As Evelyn has already stated, these entries are initially found
> as UniParc entries rather than UniProtKB entries. The good news is that
> this situation is changing, and already this year the number of chicken
> proteins not represented in UniProtKB has decreased from 70% to 50% of the
> estimated total chicken genes.
>
> I think that if we want individuals to submit a third party
> structural-genomic annotation (TPA) to either EMBL or GenBank, we will need
> to make the exact procedure very clear. I have already tried this route
> with GenBank but was unable to make any progress because there was no
> clear mechanism in place to change 'predicted' entries once they had been
> submitted by the sequencing consortium. Maybe this will be easier now that
> NCBI has genome champions.
>
> Your suggestion to temporarily allow the annotation of UniParc identifiers
> in the protein2GO annotation tool at GOA is a good one, but from my direct
> experience with UniParc IDs I can tell you that it is *very*
> time-consuming for annotators to track down the UniParc IDs. I think this
> is because the UniParc database is so very large and cannot be parsed by
> species.
>
> The way the gene association files are set up, the gene product identifier
> can be from any public database. In fact, GOA has gene associations linked
> to Ensembl and Vega IDs (I did not check for GenBank IDs), so the precedent
> exists for using different database IDs.
>
> I think the issue is that we need more flexibility in the use of IDs.
> Since the protein2GO tool cannot handle every ID, we may need to be able
> to map between different IDs, and this is something that I think would
> benefit many communities.
>
> regards,
>
> Fiona
>
> Evelyn Camon <camon at ebi.ac.uk> writes:
>
> >
> >
> >-------- Original Message --------
> >Subject: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/GenBank
> > and UniProtKB
> >Date: Thu, 13 Jul 2006 11:39:29 +0100
> >From: Evelyn Camon <camon at ebi.ac.uk>
> >To: farmanimals at genome.stanford.edu, Jen Clark <jenclark at ebi.ac.uk>,
> > camon at ebi.ac.uk, jane at ebi.ac.uk
> >
> >Dear Farm Animal Interest Group,
> >
> >As you may already know, the Gene Ontology Annotation (GOA) database at
> >the EBI is the agreed supplier of GO annotation association files to the
> >GO Consortium for the chicken and bovine species. As such, it is our
> >responsibility to ensure that we supply and integrate high quality,
> >experimentally verified manual GO annotation from external groups
> >(AgBase, Roslin, others) and create as complete an annotation
> >association file as possible.
> >
> >The completed bovine and chicken genome sequences are still in a
> >preliminary state in the Whole Genome Shotgun (WGS) section of the
> >EMBL/GenBank/DDBJ nucleotide databases and therefore not all protein
> >coding regions have been annotated. As a result, there is no way for
> >UniProt to automatically create new entries using normal procedures, and
> >the sequences get archived in UniParc instead. This creates a problem
> >since only UniProtKB identifiers could be annotated in GOA in the past.
> >
> >We have the following proposal for the farm animal communities:
> >
> >In order to get the data upgraded into UniProtKB we need
> >
> >(a) either individuals to submit a third party annotation (TPA) to either
> >EMBL or GenBank where they upgrade the annotations to CDS (coding
> >sequence). Then the data will automatically get integrated into
> >UniProtKB/TrEMBL...
> >
> >AND/OR
> >
> >(b) individuals or the ChickGO Consortium to make requests to the
> >sequencing centres to update the annotation of these sequences from EST
> >to CDS. Once this is done, the data will enter UniProtKB/TrEMBL by normal
> >pipeline procedures and will also be available for the UniProtKB
> >curators to annotate and promote into UniProtKB/Swiss-Prot. The TrEMBL
> >sequences would also then automatically inherit good quality electronic
> >GO annotation from the GOA group via InterPro, Swiss-Prot keywords and
> >Enzyme to GO mappings (and other future planned GO mappings to
> >pathways).
> >
> >Although the above is our preferred route, we acknowledge that it might
> >take some time to implement, so we also propose (c) to temporarily allow
> >the annotation of UniParc identifiers in the protein2GO annotation tool
> >at GOA by Roslin, EBI and AgBase staff. We will also consider the
> >integration of GO annotation to UniParc identifiers from external groups
> >if it is in keeping with GO Consortium guidelines. This is possible as
> >UniParc identifiers are now STABLE and we can upgrade to UniProtKB
> >accessions automatically later.
> >
> >Also, to aid the farm animal proteome GO annotations, we are proposing (d)
> >to automatically transfer any experimentally verified GO annotation
> >between species using Ensembl Compara (which predicts orthologs). This
> >data will be evidence coded as IEA (inferred from electronic annotation)
> >and NOT ISS (inferred from sequence similarity) in the GOA database.
> >
> >We are very interested in hearing your opinions, particularly concerning
> >the TPA route of updating the EMBL/GenBank/DDBJ sequence annotations to
> >solve this problem.
> >
> >Dave and Fiona, could the ChickGO Consortium get in touch with the
> >sequencing centres?
> >
> >Kind regards,
> >
> >Evelyn Camon
> >
> >--
> >Evelyn Camon
> >GOA Coordinator
> >Senior Scientific Curator
> >European Bioinformatics Institute
> >Tel:01223-494465
> >Fax:01223-494468
> >E-mail: camon at ebi.ac.uk
> >URL: http://www.ebi.ac.uk/goa
> >
> >
>
> AgBase Biocurator
> Department of Basic Sciences
> Box 6100
> MS 39762-6100
> Mississippi State University
> USA
> Tel: (+ 1) 662 325 5859
> Fax: (+ 1) 662 325 1031
>
> http://www.agbase.msstate.edu/
--
Michele Magrane
UniProt Knowledgebase curation coordinator
EMBL Outstation - European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, U.K.
Tel: +44-1223-494656
Fax: +44-1223-494468
URL: http://www.ebi.ac.uk/