From camon at ebi.ac.uk Thu Jul 13 03:39:29 2006
From: camon at ebi.ac.uk (Evelyn Camon)
Date: Thu, 13 Jul 2006 11:39:29 +0100
Subject: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniProtKB
Message-ID: <44B622E1.3040206@ebi.ac.uk>

Dear Farm Animal Interest Group,

As you may already know, the Gene Ontology Annotation (GOA) database at the EBI is the agreed supplier of GO annotation association files to the GO Consortium for the chicken and bovine species. As such, it is our responsibility to ensure that we supply and integrate high-quality, experimentally verified manual GO annotation from external groups (AgBase, Roslin, others) and create as complete an annotation association file as possible.

The completed bovine and chicken genome sequences are still in a preliminary state in the Whole Genome Shotgun (WGS) section of the EMBL/GenBank/DDBJ nucleotide databases, and therefore not all protein-coding regions have been annotated. As a result, there is no way for UniProt to automatically create new entries using normal procedures, and the sequences get archived in UniParc instead. This creates a problem, since only UniProtKB identifiers could be annotated in GOA in the past.

We have the following proposal for the farm animal communities:

In order to get the data upgraded into UniProtKB we need

(a) either individuals to submit a third party annotation (TPA) to either EMBL or GenBank, in which they upgrade the annotations to CDS (coding sequence). The data will then automatically get integrated into UniProtKB/TrEMBL...

AND/OR

(b) individuals or the ChickGO Consortium to make requests to the sequencing centres to update the annotation of these sequences from EST to CDS. Once this is done, the data will enter UniProtKB/TrEMBL by normal pipeline procedures and will also be available for the UniProtKB curators to annotate and promote into UniProtKB/Swiss-Prot. The TrEMBL sequences would then also automatically inherit good-quality electronic GO annotation from the GOA group via InterPro, Swiss-Prot keywords and Enzyme to GO mappings (and other future planned GO mappings to pathways).

Although the above is our preferred route, we acknowledge that it might take some time to implement, so we also propose (c) to temporarily allow the annotation of UniParc identifiers in the protein2GO annotation tool at GOA by Roslin, EBI and AgBase staff. We will also consider the integration of GO annotation to UniParc identifiers from external groups if it is in keeping with GO Consortium guidelines. This is possible as UniParc identifiers are now STABLE and we can upgrade to UniProtKB accessions automatically later.

Also, to aid the farm animal proteome GO annotations, we are proposing (d) to automatically transfer any experimentally verified GO annotation between species using Ensembl Compara (which predicts orthologs). This data will be evidence coded as IEA (inferred from electronic annotation) and NOT ISS (inferred from sequence similarity) in the GOA database.

We are very interested in hearing your opinions, particularly concerning the TPA route of updating the EMBL/GenBank/DDBJ sequence annotations to solve this problem.

Dave and Fiona, could the ChickGO Consortium get in touch with the sequencing centres?

Kind regards,

Evelyn Camon

--
Evelyn Camon
GOA Coordinator
Senior Scientific Curator
European Bioinformatics Institute
Tel:01223-494465
Fax:01223-494468
E-mail: camon at ebi.ac.uk
URL: http://www.ebi.ac.uk/goa

From camon at ebi.ac.uk Thu Jul 13 06:19:48 2006
From: camon at ebi.ac.uk (Evelyn Camon)
Date: Thu, 13 Jul 2006 14:19:48 +0100
Subject: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniProtKB
References: <44B622E1.3040206@ebi.ac.uk>
Message-ID: <44B64874.5030704@ebi.ac.uk>

Hi

We just had a discussion with Paul Kersey (IPI database) and Abel Ureta-Vidal (Ensembl).
At that meeting it was suggested that the IPI ID might be better than the UniParc ID for manual annotation purposes when UniProt accessions are not available. It might be easier to track accessions after successive sequence updates. I will investigate whether that would work best. Any comments would be helpful at this point.

cheers
Evelyn

From FMcCarthy at cvm.msstate.edu Thu Jul 13 08:39:46 2006
From: FMcCarthy at cvm.msstate.edu (Fiona McCarthy)
Date: Thu, 13 Jul 2006 10:39:46 -0500
Subject: [Fwd: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniPro
In-Reply-To: <44B628B8.2060001@ebi.ac.uk>
References: <44B628B8.2060001@ebi.ac.uk>
Message-ID:

Hi Evelyn,

Thank you for raising this point. As you know, this is something that we at AgBase have been working towards for some time now. Since not everyone may be aware, the problem for farm animals with sequenced genomes is that many of the proteins are 'predicted' based on electronic ORF prediction algorithms. As Evelyn has already stated, these entries are initially found as UniParc entries rather than UniProtKB entries. The good news is that this situation is changing, and already this year the number of chicken proteins not represented in UniProtKB has decreased from 70% to 50% of the estimated total chicken genes.

I think that if we want individuals to submit a third party structural-genomic annotation (TPA) to either EMBL or GenBank, we will need to make the exact procedure very clear. I have already tried this route with GenBank but was unable to make any progress because there was no clear mechanism in place to change 'predicted' entries once they had been submitted by the sequencing consortium. Maybe this will be easier now that NCBI has genome champions.

Your suggestion to temporarily allow the annotation of UniParc identifiers in the protein2GO annotation tool at GOA is a good one, but from my direct experience with UniParc IDs I can tell you that it is *very* time consuming for annotators to track down the UniParc IDs. I think this is because the UniParc database is so very large and cannot be parsed by species.
The way the gene association files are set up, the gene product identifier can be from any public database. In fact, GOA has gene associations linked to Ensembl and Vega IDs (I did not check for GenBank IDs), so the precedent exists for using different database IDs. I think the issue is that we need more flexibility in the use of IDs. Since the protein2GO tool cannot handle every ID, the issue may be that we need to be able to map between different IDs, and this is something that I think would benefit many communities.

regards,
Fiona

AgBase Biocurator
Department of Basic Sciences
Box 6100 MS 39762-6100
Mississippi State University
USA
Tel: (+ 1) 662 325 5859
Fax: (+ 1) 662 325 1031
http://www.agbase.msstate.edu/

From c-elsik at neo.tamu.edu Thu Jul 13 09:42:29 2006
From: c-elsik at neo.tamu.edu (Elsik, Christine G)
Date: Thu, 13 Jul 2006 16:42:29 -0000
Subject: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniProtKB
In-Reply-To: <44B622E1.3040206@ebi.ac.uk>
Message-ID: <200607131642.k6DGgT5j056486@xyzzy-4.tamu.edu>

Hi Evelyn,

I'm working with Baylor on bovine genome analysis and annotation. They haven't yet released the ~8X assembly on which the automated gene predictions and manual gene model annotation will be done. This assembly should be available within weeks, and then Ensembl and NCBI will run their automated gene prediction pipelines. We will be creating a single consensus gene set, doing some community gene model annotation, and submitting the final gene set to GenBank as features on the assembly, probably sometime in early 2007.

I would suggest that people wait for the NCBI and Ensembl predictions on the 8X assembly. Those will probably be available early this fall. If people want to get started on those, we can then later easily transfer the GO annotations to the consensus gene set, because we will know which consensus gene model is associated with each Ensembl and RefSeq gene model.
all the best,
Chris Elsik

--
Christine Elsik
Department of Animal Science
Texas A&M University
2471 TAMU
College Station, TX 77843-2471
phone 979-845-2618
fax 979-845-6970

From john.young at bbsrc.ac.uk Thu Jul 13 13:23:26 2006
From: john.young at bbsrc.ac.uk (john young (IAH-C))
Date: Thu, 13 Jul 2006 21:23:26 +0100
Subject: [Fwd: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniPro
In-Reply-To:
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95017FFCD4@iahce2ksrv1.iah.bbsrc.ac.uk>

Just a note. I agree with the importance of this.
I don't know about Uniprot, but I do know that the refseq database does appear to have some self-correction mechanism in place when real sequences come out that are better than the predicted ones. For some strange reason they decided to include GNOMON predictions in refseq, some of which were amazingly stupid. An example was chicken CD86. However, when I submitted the real sequence to embl, it very soon replaced the silly prediction in refseq. (Unfortunately not before some people had referred to the silly prediction as a definitive sequence in reviews, revealing the fact that they cannot possibly have looked at the sequences! It's amazing what referees will let through these days!). So - maybe Uniprot could/should have some similar automatic self-policing mechanism to update silly predictions with real sequences. It is clearly a feasible process. John Young -----Original Message----- From: owner-farmanimals at genome.stanford.edu [mailto:owner-farmanimals at genome.stanford.edu] On Behalf Of Fiona McCarthy Sent: 13 July 2006 16:40 To: Evelyn Camon; goa_curators; farmanimals at genome.stanford.edu Subject: Re: [Fwd: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniPro Hi Evelyn, Thank you for raising this point. As you know this is something that we at AgBase have been working towards for some time now. Since not everyone may be aware, the problem for farm animals with sequenced genomes is that many of the proteins are 'predicted' based on electronic ORF prediction algorithms. As Evelyn has already stated these entries are initially found as UniParc entries rather than UniProtKB entries. The good news is that this situation is changing and already this year the number of chicken proteins not represented in UniProtKB has decreased from 70% to 50% of the estimated total chicken genes. 
I think that if we want individuals to submit a third party structural-genomic annotation (TPA) to either EMBL or Genbank we will need to make the exact procedure very clear. I have already tried this route with Genbank but was unable to make any progress because there was no clear mechanism in place to change 'predicted' entries once they had been submitted by the sequencing consortium. Maybe this will be easier now that NCBI has genome champions. Your suggestion to temporarily allow the annotation of UniParc identifiers in the protein2GO annotation tool at GOA is a good one but from my direct experience with UniParc IDs I can tell you that it is *very* time consuming for annotators to track down the UniParc IDs. I think this is because the UniParc database is so very large and cannot be parsed by species. The way the gene association files are set up the gene product identifier can be from any public database. In fact, GOA has gene association linked to Ensembl and Vega IDs (I did not check for Genbank IDs) so the precedent exists for using different database IDs. I think the issue is that we need more flexibility in the use of IDs. Since the protein2GO tool cannot handle every ID, the issue may be that we need to be able to map between different IDs and this is something that I think would benefit many communities. regards, Fiona Evelyn Camon writes: > > >-------- Original Message -------- >Subject: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk >and UniProtKB >Date: Thu, 13 Jul 2006 11:39:29 +0100 >From: Evelyn Camon >To: farmanimals at genome.stanford.edu, Jen Clark >, > camon at ebi.ac.uk, jane at ebi.ac.uk > >Dear Farm Animal Interest Group, > >As you may already know the Gene Ontology Annotation (GOA)database at >the EBI is the agreed supplier of GO annotation association files to the >GO consortium for the Chicken and Bovine species. 
As such it is our >responsibility to ensure that we supply and integrate high quality >experimentally verified manual GO anotation from external groups >(AgBase, Roslin, others) and create as complete an annotation >association file as possible. > >The completed bovine and chicken genome sequences are still in a >preliminary state in the Whole Genome Shotgun (WGS) section of the >EMBL/GenBank/DDBJ nucleotide databases and therfore not all protein >coding regions have been annotated. As a result there is no way for >UniProt to automatically create new entries using normal procedures and >as such the sequences get archived in UniParc instead. This creates a >problem since only UniProtKB identifiers could be annotated in GOA in >the past. > >We have the following proposal for the farm animal communities: > >In order to get the data upgraded into UniProtKB we need > >(a) either individuals to submit a third party annotation (TPA) to >either EMBL or Genbank where they upgrade the annotations to CDS (coding >sequence). Then the data will automatically get integrated into >UniProtKB/TrEMBL... > >AND/OR > >(b) Individuals or ChickGO Consortium make requests to the sequencing >centres to update the annotation of these sequences from EST to CDS. >Once this is done the data will enter the UniProtKB/TrEMBL by normal >pipeline procedures and will also be available for the UniProtKB >curators to annotate and promote into UniProtKB/Swiss-Prot. The TrEMBL >sequences would also then automatically inherit good quality electronic >GO annotation from the GOA group via InterPro, Swiss-Prot keywords and >Enzyme to GO mappings (and other future planned GO mappings to pathways). 
> >Although the above is our preferred route, we acknowledge that it might >take some time to implement SO we also propose (c) to temporarily allow >the annotation of UniParc identifiers in the protein2GO annotation tool >at GOA by Roslin, EBI and AgBase staff and will also consider the >integration of GO annotation to UniParc identifiers from external groups >if it is in keeping with GO Consortium guidelines. This is possible as >UniParc identifiers are now STABLE and we can upgrade to UniProtKB >accesions automatically later. > >Also to aid the farm animal proteome GO annotations we are proposing (d) >to transfer automatically, any experimentally verified GO annotation >between species using ensembl compara(predicts orthologs). This data >will be evidence coded as IEA(inferred from electronic annotation) and >NOT ISS(inferred from sequence similarity) in the GOA database. > >We are very interested in hearing your opinions particularly concerning >the TPA route of updating the EMBL/GenBank/DDBJ sequence annotations to >solve this problem. > >Dave and Fiona could the ChickGO Consortium get in touch with the >sequencing centres??? 
> >Kind regards, > >Evelyn Camon > >-- >Evelyn Camon >GOA Coordinator >Senior Scientific Curator >European Bioinformatics Institute >Tel:01223-494465 >Fax:01223-494468 >E-mail: camon at ebi.ac.uk >URL: http://www.ebi.ac.uk/goa > > >-- >Evelyn Camon >GOA Coordinator >Senior Scientific Curator >European Bioinformatics Institute >Tel:01223-494465 >Fax:01223-494468 >E-mail: camon at ebi.ac.uk >URL: http://www.ebi.ac.uk/goa > AgBase Biocurator Department of Basic Sciences Box 6100 MS 39762-6100 Mississippi State University USA Tel: (+ 1) 662 325 5859 Fax: (+ 1) 662 325 1031 http://www.agbase.msstate.edu/ From FMcCarthy at cvm.msstate.edu Thu Jul 13 16:17:39 2006 From: FMcCarthy at cvm.msstate.edu (Fiona McCarthy) Date: Thu, 13 Jul 2006 18:17:39 -0500 Subject: [Goa_curators] Re: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genab In-Reply-To: <44B64874.5030704@ebi.ac.uk> References: <44B622E1.3040206@ebi.ac.uk> <,> <44B64874.5030704@ebi.ac.uk> Message-ID: Hi Evelyn, I really like the IPI idea, I have found the IPI database very useful in a lot of our work. For this reason I have tried to attach IPI accessions to many of the 'predicted' proteins that we were initially looking for in UniParc but I found that not all of the 'predicted' proteins were in IPI. Could we state that when UniProtKB is unavailable, our first priority will be to use IPI and then only if the gene product is not in IPI to use another database ID? Fiona Evelyn Camon writes: >Hi > >We just had a discussion with Paul Kersey (IPI database)and Abel >Ureta-vidal (Ensembl). At that meeting it was suggested that the IPI ID >might be better than the UniParc Id for manual annotation purposes when >UniProtAcessions not available. Might be easier to track accessions >after sucessive sequence updates. >I will investigate if that would work best. Any comments helpful at this >point. 
> >cheers >Evelyn > >Evelyn Camon wrote: >> Dear Farm Animal Interest Group, >> >> As you may already know the Gene Ontology Annotation (GOA)database >at >> the EBI is the agreed supplier of GO annotation association files to >the >> GO consortium for the Chicken and Bovine species. As such it is our >> responsibility to ensure that we supply and integrate high quality >> experimentally verified manual GO anotation from external groups >> (AgBase, Roslin, others) and create as complete an annotation >> association file as possible. >> >> The completed bovine and chicken genome sequences are still in a >> preliminary state in the Whole Genome Shotgun (WGS) section of the >> EMBL/GenBank/DDBJ nucleotide databases and therfore not all protein >> coding regions have been annotated. As a result there is no way for >> UniProt to automatically create new entries using normal procedures >and >> as such the sequences get archived in UniParc instead. This creates >a >> problem since only UniProtKB identifiers could be annotated in GOA >in >> the past. >> >> We have the following proposal for the farm animal communities: >> >> In order to get the data upgraded into UniProtKB we need >> >> (a) either individuals to submit a third party annotation (TPA) to >> either EMBL or Genbank where they upgrade the annotations to CDS >(coding >> sequence). Then the data will automatically get integrated into >> UniProtKB/TrEMBL... >> >> AND/OR >> >> (b) Individuals or ChickGO Consortium make requests to the >sequencing >> centres to update the annotation of these sequences from EST to CDS. >> Once this is done the data will enter the UniProtKB/TrEMBL by normal >> pipeline procedures and will also be available for the UniProtKB >> curators to annotate and promote into UniProtKB/Swiss-Prot. 
The >TrEMBL >> sequences would also then automatically inherit good quality >electronic >> GO annotation from the GOA group via InterPro, Swiss-Prot keywords >and >> Enzyme to GO mappings (and other future planned GO mappings to >pathways). >> >> Although the above is our preferred route, we acknowledge that it >might >> take some time to implement SO we also propose (c) to temporarily >allow >> the annotation of UniParc identifiers in the protein2GO annotation >tool >> at GOA by Roslin, EBI and AgBase staff and will also consider the >> integration of GO annotation to UniParc identifiers from external >groups >> if it is in keeping with GO Consortium guidelines. This is possible >as >> UniParc identifiers are now STABLE and we can upgrade to UniProtKB >> accesions automatically later. >> >> Also to aid the farm animal proteome GO annotations we are proposing >(d) >> to transfer automatically, any experimentally verified GO annotation >> between species using ensembl compara(predicts orthologs). This data >> will be evidence coded as IEA(inferred from electronic annotation) >and >> NOT ISS(inferred from sequence similarity) in the GOA database. >> >> We are very interested in hearing your opinions particularly >concerning >> the TPA route of updating the EMBL/GenBank/DDBJ sequence annotations >to >> solve this problem. >> >> Dave and Fiona could the ChickGO Consortium get in touch with the >> sequencing centres??? 
>> >> Kind regards, >> >> Evelyn Camon >> > > >-- >Evelyn Camon >GOA Coordinator >Senior Scientific Curator >European Bioinformatics Institute >Tel:01223-494465 >Fax:01223-494468 >E-mail: camon at ebi.ac.uk >URL: http://www.ebi.ac.uk/goa > >_______________________________________________ >Goa_curators mailing list >Goa_curators at ebi.ac.uk >http://listserver.ebi.ac.uk/mailman/listinfo/goa_curators AgBase Biocurator Department of Basic Sciences Box 6100 MS 39762-6100 Mississippi State University USA Tel: (+ 1) 662 325 5859 Fax: (+ 1) 662 325 1031 http://www.agbase.msstate.edu/ From camon at ebi.ac.uk Fri Jul 14 02:14:20 2006 From: camon at ebi.ac.uk (camon at ebi.ac.uk) Date: Fri, 14 Jul 2006 10:14:20 +0100 (BST) Subject: [Goa_curators] Re: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genab In-Reply-To: References: <44B622E1.3040206@ebi.ac.uk> <,> <44B64874.5030704@ebi.ac.uk> Message-ID: <1309.88.106.36.50.1152868460.squirrel@webmail.ebi.ac.uk> Hi Fiona, Thanks for the mails. Yes I agree I will talk to David Binns(GOA tool developer) about upgrading the GOA annotation tool to permit the manual GO annotation of IPI ids related to chicken and cow sequences so that the curators can continue capturing information in the literature. Could you report via e-mail the gene products that don't have IPI ids to goa_curators at ebi.ac.uk. Paul Kersey seemed to believe they should all pretty much have IPI ids. Of course not all authors of scientific papers submit their sequences to genbank/embl too which is another problem. But we can submit these to the journal scanning curator at UniProtKB. cheers, Evelyn > Hi Evelyn, > > I really like the IPI idea, I have found the IPI database very useful in a > lot of our work. For this reason I have tried to attach IPI accessions to > many of the 'predicted' proteins that we were initially looking for in > UniParc but I found that not all of the 'predicted' proteins were in IPI. 
>
> Could we state that when UniProtKB is unavailable, our first priority will
> be to use IPI, and only if the gene product is not in IPI to use
> another database ID?
>
> Fiona

From camon at ebi.ac.uk Fri Jul 14 02:24:09 2006
From: camon at ebi.ac.uk (camon at ebi.ac.uk)
Date: Fri, 14 Jul 2006 10:24:09 +0100 (BST)
Subject: [Fwd: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniPro
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95017FFCD4@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E95017FFCD4@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <1534.88.106.36.50.1152869049.squirrel@webmail.ebi.ac.uk>

Hi John,

I will raise your concerns with both EMBL and UniProtKB here at EBI. CD86 is a nice example to show them, and I will look back in the EMBL sequence archive to see how major the differences were between the REAL and PREDICTED sequences. Both EMBL and UniProtKB encourage feedback from the community on updating the annotation and sequence of nucleotide entries via TPA:
http://www.ebi.ac.uk/webin/webin_help.html?/webin/home.html#tpa

and to update UniProtKB entries via their webpages:

http://www.ebi.ac.uk/swissprot/Submissions/spin/index.jsp
http://www.ebi.uniprot.org/support/helpdesk.shtml

There is a UniProt Consortium meeting coming up in August and I will pass on your issues.

cheers,
Evelyn

> Just a note.
> I agree with the importance of this. I don't know about UniProt, but I
> do know that the RefSeq database does appear to have some self-correction
> mechanism in place when real sequences come out that are better than the
> predicted ones. For some strange reason they decided to include GNOMON
> predictions in RefSeq, some of which were amazingly stupid. An example
> was chicken CD86. However, when I submitted the real sequence to EMBL,
> it very soon replaced the silly prediction in RefSeq. (Unfortunately not
> before some people had referred to the silly prediction as a definitive
> sequence in reviews, revealing the fact that they cannot possibly have
> looked at the sequences! It's amazing what referees will let through
> these days!) So maybe UniProt could/should have some similar automatic
> self-policing mechanism to update silly predictions with real sequences.
> It is clearly a feasible process.
> John Young
>
> -----Original Message-----
> From: owner-farmanimals at genome.stanford.edu
> [mailto:owner-farmanimals at genome.stanford.edu] On Behalf Of Fiona McCarthy
> Sent: 13 July 2006 16:40
> To: Evelyn Camon; goa_curators; farmanimals at genome.stanford.edu
> Subject: Re: [Fwd: Updating Cow and Chicken entries to CDS in
> EMBL/DDBJ/Genabnk and UniPro
>
> Hi Evelyn,
>
> Thank you for raising this point. As you know, this is something that we
> at AgBase have been working towards for some time now. Since not everyone
> may be aware, the problem for farm animals with sequenced genomes is that
> many of the proteins are 'predicted' based on electronic ORF prediction
> algorithms.
> As Evelyn has already stated, these entries are initially found
> as UniParc entries rather than UniProtKB entries. The good news is that
> this situation is changing, and already this year the number of chicken
> proteins not represented in UniProtKB has decreased from 70% to 50% of
> the estimated total chicken genes.
>
> I think that if we want individuals to submit a third party
> structural-genomic annotation (TPA) to either EMBL or GenBank, we will
> need to make the exact procedure very clear. I have already tried this
> route with GenBank but was unable to make any progress because there was
> no clear mechanism in place to change 'predicted' entries once they had
> been submitted by the sequencing consortium. Maybe this will be easier
> now that NCBI has genome champions.
>
> Your suggestion to temporarily allow the annotation of UniParc
> identifiers in the protein2GO annotation tool at GOA is a good one, but
> from my direct experience with UniParc IDs I can tell you that it is
> *very* time-consuming for annotators to track down the UniParc IDs. I
> think this is because the UniParc database is so very large and cannot
> be parsed by species.
>
> The way the gene association files are set up, the gene product
> identifier can be from any public database. In fact, GOA has gene
> associations linked to Ensembl and Vega IDs (I did not check for GenBank
> IDs), so the precedent exists for using different database IDs.
>
> I think the issue is that we need more flexibility in the use of IDs.
> Since the protein2GO tool cannot handle every ID, the issue may be that
> we need to be able to map between different IDs, and this is something
> that I think would benefit many communities.
>
> regards,
>
> Fiona

From camon at ebi.ac.uk Fri Jul 14 03:19:59 2006
From: camon at ebi.ac.uk (camon at ebi.ac.uk)
Date: Fri, 14 Jul 2006 11:19:59 +0100 (BST)
Subject: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniProtKB
In-Reply-To: <200607131642.k6DGgT5j056486@xyzzy-4.tamu.edu>
References: <44B622E1.3040206@ebi.ac.uk> <200607131642.k6DGgT5j056486@xyzzy-4.tamu.edu>
Message-ID: <2063.88.106.36.50.1152872399.squirrel@webmail.ebi.ac.uk>

Hi Chris,

It's very nice to hear from you. That's good news for the automatic and manual bovine GO annotation pipeline, but I still think we will occasionally need to annotate manually to IPI IDs so we don't lose out on any precious GO annotation in the interim.

Thanks very much for the information,

with kind regards,
Evelyn

> Hi Evelyn,
>
> I'm working with Baylor on bovine genome analysis and annotation. They
> haven't yet released the ~8X assembly on which the automated gene
> predictions and manual gene model annotation will be done. This assembly
> should be available within weeks, and then Ensembl and NCBI will run
> their automated gene prediction pipelines. We will be creating a single
> consensus gene set, doing some community gene model annotation, and
> submitting the final gene set to GenBank as features on the assembly,
> probably sometime in early 2007.
> I would suggest that people wait for the NCBI and Ensembl predictions on
> the 8X assembly. Those will probably be available early this Fall. If
> people want to get started on those, we can then later easily transfer
> the GO annotations to the consensus gene set, because we will know which
> consensus gene model is associated with each Ensembl and RefSeq gene
> model.
>
> all the best,
> Chris Elsik
>
> --
> Christine Elsik
> Department of Animal Science
> Texas A&M University
> 2471 TAMU
> College Station, TX 77843-2471
> phone 979-845-2618
> fax 979-845-6970

From magrane at ebi.ac.uk Wed Jul 19 00:37:50 2006
From: magrane at ebi.ac.uk (Michele Magrane)
Date: Wed, 19 Jul 2006 08:37:50 +0100
Subject: [Fwd: Updating Cow and Chicken entries to CDS in EMBL/DDBJ/Genabnk and UniPro]
References: <1303.88.106.36.50.1152867987.squirrel@webmail.ebi.ac.uk>
Message-ID: <44BDE14D.7E15D79@ebi.ac.uk>

Dear John,

Evelyn passed on your message about self-policing of sequences in UniProt. All sequences in the Swiss-Prot section of the UniProt Knowledgebase are manually checked during the curation process. This involves comparing multiple reports (if they exist) for a particular sequence, as well as comparison with orthologs/paralogs, and providing what we believe to be the most accurate sequence. Sequences in TrEMBL that are awaiting manual curation are only as good as what has been provided in the underlying nucleotide entry, so the quality here can vary depending on what the submitters of the nucleotide sequence have provided. However, unlike RefSeq, we don't make predictions ourselves based on unannotated genomes.

I hope this clarifies things, but if you have any questions, feel free to contact me.

Regards,
Michele.
> ---------------------------- Original Message ----------------------------
> Subject: RE: [Fwd: Updating Cow and Chicken entries to CDS in
> EMBL/DDBJ/Genabnk and UniPro
> From: "john young (IAH-C)"
> Date: Thu, July 13, 2006 9:23 pm
> To: "Fiona McCarthy"
>     "Evelyn Camon"
>     "goa_curators"
>     farmanimals at genome.stanford.edu
> --------------------------------------------------------------------------
>
> Just a note.
> I agree with the importance of this. I don't know about UniProt, but I
> do know that the RefSeq database does appear to have some self-correction
> mechanism in place when real sequences come out that are better than the
> predicted ones. For some strange reason they decided to include GNOMON
> predictions in RefSeq, some of which were amazingly stupid. An example
> was chicken CD86. However, when I submitted the real sequence to EMBL,
> it very soon replaced the silly prediction in RefSeq. (Unfortunately not
> before some people had referred to the silly prediction as a definitive
> sequence in reviews, revealing the fact that they cannot possibly have
> looked at the sequences! It's amazing what referees will let through
> these days!) So maybe UniProt could/should have some similar automatic
> self-policing mechanism to update silly predictions with real sequences.
> It is clearly a feasible process.
> John Young

--
Michele Magrane
UniProt Knowledgebase curation coordinator
EMBL Outstation - European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, U.K.
Tel: +44-1223-494656
Fax: +44-1223-494468
URL: http://www.ebi.ac.uk/