[Go] Spliceform column in GAF
Chris Mungall
cjm at berkeleybop.org
Wed Sep 10 13:35:48 PDT 2008
Thanks, Karen.
I requested these because in the original proposal, col 12 (type)
retained its original meaning, i.e. the type of the entity in col 2.
However, the majority felt strongly that the type column should be
changed to be the type of the entity in col 17, i.e. the specific gene
product spliceform. This would never be a gene.
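
As a rough illustration of how the type column could be filled in under
the rules discussed in the quoted thread below, here is a sketch only. The
function name, the argument names and the "protein_coding"/"rna_coding"
labels are made up for this example, and the proposal itself is still
under discussion.

```python
from typing import Optional


def choose_type_column(spliceform_type: Optional[str],
                       gene_kind: Optional[str],
                       unknown_value: str = "") -> str:
    """Pick a value for the type column (col 12) of one GAF row.

    spliceform_type: type of the entity named in col 17, or None when
        col 17 is left blank because the spliceform is not known.
    gene_kind: hypothetical label, "protein_coding" or "rna_coding", or
        None when the gene has not been molecularly characterized.
    unknown_value: what to emit when nothing is known. "" follows the
        "leave it blank" suggestion in the thread.
    """
    if spliceform_type:
        # Col 17 is filled in, so col 12 describes that spliceform.
        return spliceform_type
    if gene_kind == "protein_coding":
        # Spliceform unknown, but the gene is known to be protein coding.
        return "protein"
    if gene_kind == "rna_coding":
        return "RNA"
    # Open-world case: we do not know what the gene encodes.
    return unknown_value
```

Passing unknown_value="region" instead would follow Mike's suggestion of
falling back to the SO root node.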
On Sep 10, 2008, at 12:44 PM, Karen Eilbeck wrote:
> After the last meeting we made an action item to add the terms for
> Col 12 to SO
> http://sourceforge.net/tracker/index.php?func=detail&aid=1953535&group_id=72703&atid=810408
> This has been done.
>
> SO:0001264 gRNA_gene
> SO:0001265 miRNA_gene
> SO:0001263 nc_RNA_gene
> SO:0001266 scRNA_gene
> SO:0001267 snoRNA_gene
> SO:0001268 snRNA_gene
> SO:0001269 SRP_RNA_gene
> SO:0001270 stRNA_gene
> SO:0001271 tmRNA_gene
> SO:0001272 tRNA_gene
> SO:0001217 protein_coding_gene
>
> --Karen
>
> On 9/10/08 1:01 PM, "Mike Cherry" <cherry at stanford.edu> wrote:
>
> I think column 12 should be mandatory. I'd suggest locus when you
> don't know. However, locus is not in SO. The root node of SO is
> region, so, following our usage with GO, use the root node.
>
> -Mike
>
> On Sep 10, 2008, at 11:29 AM, Chris Mungall wrote:
>
> >
> > On Sep 10, 2008, at 11:13 AM, Rama Balakrishnan wrote:
> >
> >> Chris,
> >>
> >> I am trying to understand this proposal. So please bear with me.
> >>
> >> Column 17 is optional and column 12 is mandatory. If you don't know
> >> the spliceform of the product in col 2 and hence leave col 17 as
> >> blank, then what would you put in column 12?
> >
> > protein, if it's a protein-coding gene. RNA if RNA-coding etc.
> >
> > I didn't document the case where the gene has not been molecularly
> > characterized and we have IGI annotations. I'm open to suggestions
> > here. Perhaps the best thing is to allow this column to be blank if
> > we truly do not know if the gene is protein coding or not.
> >
> > However, if the gene is known to be protein coding yet the
> > particular spliceform is not known then the type column should be
> > protein
> >
> >> Because as I understand the proposal, what is in Col12 should
> >> reflect the type of the spliceform in col 17?
> >
> > Yes, but we're making the open-world assumption here: absence of
> > data does not mean an absence of the entity in reality.
> >
> > Thanks for the questions, looks like the document needs work to make
> > it more readable.
> >
> > We only had a very cursory discussion of the type column in SLC, and
> > we certainly didn't give people time to absorb the ramifications.
> >
> >>
> >> Thanks,
> >>
> >> Rama
> >>
> >> On Sep 10, 2008, at 10:51 AM, Chris Mungall wrote:
> >>
> >>>
> >>> If a group submitted annotations for two records corresponding to
> >>> the same gene this would be in violation. The most likely way for
> >>> this to happen would be when a MOD submits annotations to both
> >>> UniProtKB IDs and MOD IDs.
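
To illustrate the kind of redundancy meant here, a minimal sketch,
assuming we already have some mapping from record IDs to canonical genes.
The IDs, the mapping and the function name are all hypothetical and not
part of the GAF spec.

```python
from collections import defaultdict


def find_redundant_genes(annotated_records, record_to_gene):
    """Return {gene: sorted record IDs} for genes annotated via more
    than one distinct record within a single submission."""
    records_by_gene = defaultdict(set)
    for record_id in annotated_records:
        # Records missing from the mapping are treated as their own gene.
        gene = record_to_gene.get(record_id, record_id)
        records_by_gene[gene].add(record_id)
    return {gene: sorted(ids)
            for gene, ids in records_by_gene.items()
            if len(ids) > 1}


# Hypothetical mapping: a MOD ID and a UniProtKB ID for the same gene.
record_to_gene = {
    "MOD:G1": "gene1",
    "UniProtKB:P1": "gene1",
    "MOD:G2": "gene2",
}

# One group's submission, annotating gene1 through both of its records.
submitted = ["MOD:G1", "UniProtKB:P1", "MOD:G2", "MOD:G2"]

redundant = find_redundant_genes(submitted, record_to_gene)
print(redundant)  # {'gene1': ['MOD:G1', 'UniProtKB:P1']}
```

Repeated annotations to the same record (MOD:G2 above) are not flagged;
only the case of two different records for one gene is.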
> >>>
> >>> On Sep 10, 2008, at 10:34 AM, Stoddard, Alexander wrote:
> >>>
> >>>> I do not clearly understand the following part of the spec
> >>>> regarding
> >>>> non-redundant canonical entities:
> >>>>
> >>>> "In addition the GAF must be non-redundant with respect to
> >>>> canonical
> >>>> entities in a genome"
> >>>>
> >>>> Chris, would you please give an example of how a GAF file could be
> >>>> redundant with respect to canonical entities and how to correct the
> >>>> example?
> >>>>
> >>>> Thank you,
> >>>> Alex Stoddard
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: go-bounces at genome.stanford.edu
> >>>> [mailto:go-bounces at genome.stanford.edu] On Behalf Of Chris Mungall
> >>>> Sent: Tuesday, September 09, 2008 5:30 PM
> >>>> To: go list; Paul D Thomas
> >>>> Subject: [Go] Spliceform column in GAF [was Re: [Gofriends]
> >>>> Redundancy in go_XXXXXX-assocdb-tables/dbxref.txt]
> >>>>
> >>>> [redirected to GO]
> >>>>
> >>>> The change Mike speaks of is for the new spliceform column in the
> >>>> GAF.
> >>>>
> >>>> I have specced this out here:
> >>>>
> >>>>
> >>>> http://wiki.geneontology.org/index.php/GAF_Spliceform_Column_Proposal
> >>>>
> >>>> Note that most of you will have read the previous document
> >>>> describing
> >>>> current practices for annotating alternate spliceforms:
> >>>>
> >>>>
> >>>> http://wiki.geneontology.org/index.php/Annotation_of_Alternate_Spliceforms
> >>>>
> >>>> But you won't have read the fully formulated proposal, as I only
> >>>> put
> >>>> it on the wiki today.
> >>>>
> >>>> Note that this proposal was ratified at the SLC GOC meeting, but
> >>>> the
> >>>> majority of the discussion was at the RefG portion of the meeting.
> >>>> It's particularly important that folks who weren't at this part
> >>>> read
> >>>> and understand the proposal. Ratification at the GOC meeting may
> >>>> have
> >>>> been premature as I only intended to sketch out a solution
> >>>> collaboratively at that meeting.
> >>>>
> >>>> Once the above wiki page is in shape, we should send an
> >>>> announcement
> >>>> to gofriends (promptly, as it is of relevance to the current
> >>>> discussion below), all data providers and consumers, and then after
> >>>> that in the newsletter and on the main GO docs.
> >>>>
> >>>> As Mike says, we are aiming for an introduction some time in 2009.
> >>>> It's important that anyone involved with producing GAFs is aware of
> >>>> the changes and is OK with this timetable.
> >>>>
> >>>> Cheers
> >>>> Chris
> >>>>
> >>>> On Sep 9, 2008, at 1:22 PM, Mike Cherry wrote:
> >>>>
> >>>>> There is a change coming to the format of the gene association file
> >>>>> which will solve this problem. Annotations to proteins, genes,
> >>>>> transcripts, etc. for a particular locus will be identified as such.
> >>>>> The change should occur in 2009.
> >>>>>
> >>>>> -Mike
> >>>>>
> >>>>>
> >>>>>> From: "Quaid Morris" <quaid.morris at gmail.com>
> >>>>>> To: "Gabriel Berriz" <gberriz at hms.harvard.edu>
> >>>>>> Subject: Re: [Gofriends] Redundancy in go_XXXXXX-assocdb-tables/dbxref.txt
> >>>>>> Cc: gofriends at genome.stanford.edu
> >>>>>>
> >>>>>> Hi Gabriel,
> >>>>>>
> >>>>>> It looks like in the example that you gave RGD ID 1302948 is a gene
> >>>>>> ID and ENSRNOP00000034933 is a protein ID. Are all your examples like
> >>>>>> this? Maybe there are circumstances when it's possible to annotate a
> >>>>>> specific isoform and others when only the gene can be annotated.
> >>>>>>
> >>>>>> Q
> >>>>>>
> >>>>> _______________________________________________
> >>>>> Gofriends mailing list
> >>>>> Gofriends at geneontology.org
> >>>>> http://fafner.stanford.edu/mailman/listinfo/gofriends
> >>>>>
> >>>>
> >>>> _______________________________________________
> >>>> Go mailing list
> >>>> Go at geneontology.org
> >>>> http://fafner.stanford.edu/mailman/listinfo/go
> >>>>