[Go] Spliceform column in GAF

Benjamin Hitz hitz at genome.stanford.edu
Wed Sep 10 13:48:36 PDT 2008


Since I loathe and despise blank columns, maybe we need a "default"
splice form (identical to col 2?  'nr' for not reported?).

Instead of locus as the generic col 12 value, maybe gene_product.  Is this in
SO?  Is anything the parent of RNA and PROTEIN?
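
To make the two options concrete, here is a minimal Python sketch of how a
producer might fill the columns so that neither is ever blank. Everything in
it is an assumption for illustration only: the function name, the 'nr' and
gene_product placeholders (neither is in the proposal, and gene_product is
not confirmed to be in SO), and the example row.

```python
def fill_gaf_defaults(row, product_kind=None,
                      spliceform_default="nr", unknown_type="gene_product"):
    """Return a 17-column copy of a GAF row with cols 12 and 17 never blank.

    row                -- list of column strings; GAF column N is row[N - 1]
    product_kind       -- "protein", "RNA", or None if the gene is uncharacterized
    spliceform_default -- "col2" to copy column 2, else a literal such as "nr"
    unknown_type       -- placeholder type used when product_kind is None
                          (hypothetical value, not part of any spec)
    """
    cols = list(row) + [""] * (17 - len(row))   # pad short rows to 17 columns
    if not cols[16]:                            # col 17: spliceform
        cols[16] = cols[1] if spliceform_default == "col2" else spliceform_default
    if not cols[11]:                            # col 12: type
        cols[11] = product_kind if product_kind else unknown_type
    return cols


# Hypothetical row: only the first three columns are filled in.
row = ["SGD", "S000000001", "ABC1"]
print(fill_gaf_defaults(row, product_kind="protein")[16])     # default spliceform
print(fill_gaf_defaults(row, spliceform_default="col2")[11])  # default type
```

Copying col 2 keeps the file self-describing, while a literal like 'nr' makes
"not reported" easy to filter on; which is preferable is still open.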

Ben

On Sep 10, 2008, at 1:35 PM, Chris Mungall wrote:

>
> Thanks Karen
>
> I requested these because in the original proposal, col12 (type)  
> retained its original meaning, i.e. the type of the entity in col 2.  
> However, the majority felt strongly that the type column should be  
> changed to be the type of the entity in col 12, i.e. the specific  
> gene product spliceform - this would never be a gene
>
> On Sep 10, 2008, at 12:44 PM, Karen Eilbeck wrote:
>
>> After the last meeting we made an action item to add the terms for  
>> Col 12 to SO
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1953535&group_id=72703&atid=810408
>> This has been done.
>>
>>  SO:0001264   gRNA_gene
>>  SO:0001265   miRNA_gene
>>  SO:0001263   nc_RNA_gene
>>  SO:0001266   scRNA_gene
>>  SO:0001267   snoRNA_gene
>>  SO:0001268   snRNA_gene
>>  SO:0001269   SRP_RNA_gene
>>  SO:0001270   stRNA_gene
>>  SO:0001271   tmRNA_gene
>>  SO:0001272   tRNA_gene
>>  SO:0001217   protein_coding_gene
>>
>> --Karen
>>
>> On 9/10/08 1:01 PM, "Mike Cherry" <cherry at stanford.edu> wrote:
>>
>> I think column 12 should be mandatory.  I'd suggest locus when you
>> don't know.  However locus is not in SO.  The root node of SO is
>> region, it follows our usage with GO, use the root node.
>>
>> -Mike
>>
>> On Sep 10, 2008, at 11:29 AM, Chris Mungall wrote:
>>
>> >
>> > On Sep 10, 2008, at 11:13 AM, Rama Balakrishnan wrote:
>> >
>> >> Chris,
>> >>
>> >> I am trying to understand this proposal. So please bear with me.
>> >>
>> >> Column 17 is optional and column 12 is mandatory. If you don't know
>> >> the spliceform of the product in col 2 and hence leave col 17 as
>> >> blank, then what would you put in column 12?
>> >
>> > protein, if it's a protein-coding gene. RNA if RNA-coding etc.
>> >
>> > I didn't document the case where the gene has not been molecularly
>> > characterized and we have IGI annotations. I'm open to suggestions
>> > here. Perhaps the best thing is to allow this column to be blank if
>> > we truly do not know if the gene is protein coding or not.
>> >
>> > However, if the gene is known to be protein coding yet the
>> > particular spliceform is not known then the type column should be
>> > protein
>> >
>> >> Because as I understand the proposal, what is in Col12 should
>> >> reflect the type of the spliceform in col 17?
>> >
>> > Yes, but we're making the open-world assumption here: absence of
>> > data does not mean an absence of the entity in reality.
>> >
>> > Thanks for the questions, looks like the document needs work to
>> > make it more readable.
>> >
>> > We only had a very cursory discussion of the type column in SLC, and
>> > we certainly didn't give people time to absorb the ramifications.
>> >
>> >>
>> >> Thanks,
>> >>
>> >> Rama
>> >>
>> >> On Sep 10, 2008, at 10:51 AM, Chris Mungall wrote:
>> >>
>> >>>
>> >>> If a group submitted annotations for two records corresponding to
>> >>> the same gene this would be in violation. The most likely way for
>> >>> this to happen would be when a MOD submits annotations to both
>> >>> UniProtKB IDs and MOD IDs.
>> >>>
>> >>> On Sep 10, 2008, at 10:34 AM, Stoddard, Alexander wrote:
>> >>>
>> >>>> I do not clearly understand the following part of the spec
>> >>>> regarding non-redundant canonical entities:
>> >>>>
>> >>>> "In addition the GAF must be non-redundant with respect to
>> >>>> canonical entities in a genome"
>> >>>>
>> >>>> Chris, would you please give an example of how a GAF file could be
>> >>>> redundant with respect to canonical entities and how to correct the
>> >>>> example?
>> >>>>
>> >>>> Thank you,
>> >>>> Alex Stoddard
>> >>>>
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: go-bounces at genome.stanford.edu
>> >>>> [mailto:go-bounces at genome.stanford.edu] On Behalf Of Chris Mungall
>> >>>> Sent: Tuesday, September 09, 2008 5:30 PM
>> >>>> To: go list; Paul D Thomas
>> >>>> Subject: [Go] Spliceform column in GAF [was Re: [Gofriends]
>> >>>> Redundancy in go_XXXXXX-assocdb-tables/dbxref.txt]
>> >>>>
>> >>>> [redirected to GO]
>> >>>>
>> >>>> The change Mike speaks of is for the new spliceform column in the
>> >>>> GAF.
>> >>>>
>> >>>> I have specced this out here:
>> >>>>
>> >>>>
>> >>>> http://wiki.geneontology.org/index.php/GAF_Spliceform_Column_Proposal
>> >>>>
>> >>>> Note that most of you will have read the previous document
>> >>>> describing current practices for annotating alternate spliceforms:
>> >>>>
>> >>>>
>> >>>> http://wiki.geneontology.org/index.php/Annotation_of_Alternate_Spliceforms
>> >>>>
>> >>>> But you won't have read the fully formulated proposal, as I only
>> >>>> put it on the wiki today.
>> >>>>
>> >>>> Note that this proposal was ratified at the SLC GOC meeting, but
>> >>>> the majority of the discussion was at the RefG portion of the meeting.
>> >>>> It's particularly important that folks who weren't at this part
>> >>>> read and understand the proposal. Ratification at the GOC meeting may
>> >>>> have been premature as I only intended to sketch out a solution
>> >>>> collaboratively at that meeting.
>> >>>>
>> >>>> Once the above wiki page is in shape, we should send an
>> >>>> announcement to gofriends (promptly, as it is of relevance to the
>> >>>> current discussion below), all data providers and consumers, and then
>> >>>> after that in the newsletter and on the main GO docs.
>> >>>>
>> >>>> As Mike says we are aiming for an introduction some time in 2009.
>> >>>> It's important that anyone involved with producing GAFs is aware of
>> >>>> the changes and is OK with this timetable.
>> >>>>
>> >>>> Cheers
>> >>>> Chris
>> >>>>
>> >>>> On Sep 9, 2008, at 1:22 PM, Mike Cherry wrote:
>> >>>>
>> >>>>> There is a change coming to the format of the gene association
>> >>>>> file which will solve this problem.  Annotations to proteins, genes,
>> >>>>> transcripts, etc. for a particular locus will be identified as
>> >>>>> such.  The change should occur in 2009.
>> >>>>>
>> >>>>> -Mike
>> >>>>>
>> >>>>>
>> >>>>>> From: "Quaid Morris" <quaid.morris at gmail.com>
>> >>>>>> To: "Gabriel Berriz" <gberriz at hms.harvard.edu>
>> >>>>>> Subject: Re: [Gofriends] Redundancy in
>> >>>>>> go_XXXXXX-assocdb-tables/dbxref.txt
>> >>>>>> Cc: gofriends at genome.stanford.edu
>> >>>>>>
>> >>>>>> Hi Gabriel,
>> >>>>>>
>> >>>>>> It looks like in the example that you gave RGD ID 1302948 is a
>> >>>>>> gene ID and ENSRNOP00000034933 is a protein ID.  Are all your
>> >>>>>> examples like this?  Maybe there are circumstances when it's
>> >>>>>> possible to annotate a specific isoform and others when only the
>> >>>>>> gene can be annotated.
>> >>>>>>
>> >>>>>> Q
>> >>>>>>
>> >>>>> _______________________________________________
>> >>>>> Gofriends mailing list
>> >>>>> Gofriends at geneontology.org
>> >>>>> http://fafner.stanford.edu/mailman/listinfo/gofriends
>> >>>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Go mailing list
>> >>>> Go at geneontology.org
>> >>>> http://fafner.stanford.edu/mailman/listinfo/go
>> >>>>

--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO Consortium
Stanford University ** hitz at genome.stanford.edu




