[Go] Spliceform column in GAF

Harold Drabkin hjd at informatics.jax.org
Thu Sep 11 06:24:47 PDT 2008


But if you do not know which isoform is involved, and there are several, 
then you really can't pin it to anything other than the gene.
The annotation describes the experiment, and the experiment might have 
been performed in a way that is really at the gene level (e.g., a 
mutation in one gene complements a mutation in another gene to restore 
some function: IGI).
h

Chris Mungall wrote:
>
> On Sep 10, 2008, at 12:02 PM, Harold Drabkin wrote:
>
>> It was my understanding that the annotation object for IGI is a 
>> gene; in those cases, Col17 would be identical to column 2 (db object 
>> ID).
>
> I'm not in favour of this - it doesn't add any information and 
> actually confuses things.
>
> No matter what the evidence code, we are saying something about the 
> wild type gene product encoded by the gene
>
>>
>> This would also be applicable to IMP annotations where the allele (in 
>> inferred_from) is not specific for a single isoform (the whole coding 
>> region gets clobbered).
>>
>> Column 12 can be gene, transcript (or RNA)  or protein, so it gets 
>> "gene".
>>
>> hjd
>>
>>
>> Chris Mungall wrote:
>>>
>>> On Sep 10, 2008, at 11:13 AM, Rama Balakrishnan wrote:
>>>
>>>> Chris,
>>>>
>>>> I am trying to understand this proposal. So please bear with me.
>>>>
>>>> Column 17 is optional and column 12 is mandatory. If you don't know 
>>>> the spliceform of the product in col 2 and hence leave col 17 as 
>>>> blank, then what would you put in column 12?
>>>
>>> protein, if it's a protein-coding gene. RNA if RNA-coding etc.
>>>
>>> I didn't document the case where the gene has not been molecularly 
>>> characterized and we have IGI annotations. I'm open to suggestions 
>>> here. Perhaps the best thing is to allow this column to be blank if 
>>> we truly do not know if the gene is protein coding or not.
>>>
>>> However, if the gene is known to be protein coding yet the 
>>> particular spliceform is not known then the type column should be 
>>> protein
>>>
>>>> Because as I understand the proposal, what is in Col12 should 
>>>> reflect the type of the spliceform in col 17?
>>>
>>> Yes, but we're making the open-world assumption here: absence of 
>>> data does not mean an absence of the entity in reality.
>>>
>>> Thanks for the questions, looks like the document needs work to make 
>>> it more readable.
>>>
>>> We only had a very cursory discussion of the type column in SLC, and 
>>> we certainly didn't give people time to absorb the ramifications.
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Rama
>>>>
>>>> On Sep 10, 2008, at 10:51 AM, Chris Mungall wrote:
>>>>
>>>>>
>>>>> If a group submitted annotations for two records corresponding to 
>>>>> the same gene this would be in violation. The most likely way for 
>>>>> this to happen would be when a MOD submits annotations to both 
>>>>> UniProtKB IDs and MOD IDs.
>>>>>
>>>>> On Sep 10, 2008, at 10:34 AM, Stoddard, Alexander wrote:
>>>>>
>>>>>> I do not clearly understand the following part of the spec regarding
>>>>>> non-redundant canonical entities:
>>>>>>
>>>>>> "In addition the GAF must be non-redundant with respect to canonical
>>>>>> entities in a genome"
>>>>>>
>>>>>> Chris, would you please give an example of how a GAF file could be
>>>>>> redundant with respect to canonical entities and how to correct the
>>>>>> example?
>>>>>>
>>>>>> Thank you,
>>>>>> Alex Stoddard
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: go-bounces at genome.stanford.edu
>>>>>> [mailto:go-bounces at genome.stanford.edu] On Behalf Of Chris Mungall
>>>>>> Sent: Tuesday, September 09, 2008 5:30 PM
>>>>>> To: go list; Paul D Thomas
>>>>>> Subject: [Go] Spliceform column in GAF [was Re: [Gofriends]
>>>>>> Redundancy in go_XXXXXX-assocdb-tables/dbxref.txt]
>>>>>>
>>>>>> [redirected to GO]
>>>>>>
>>>>>> The change Mike speaks of is for the new spliceform column in the 
>>>>>> GAF.
>>>>>>
>>>>>> I have specced this out here:
>>>>>>
>>>>>>    
>>>>>> http://wiki.geneontology.org/index.php/GAF_Spliceform_Column_Proposal 
>>>>>>
>>>>>>
>>>>>> Note that most of you will have read the previous document
>>>>>> describing current practices for annotating alternate spliceforms:
>>>>>>
>>>>>>    
>>>>>> http://wiki.geneontology.org/index.php/Annotation_of_Alternate_Spliceforms
>>>>>>
>>>>>> But you won't have read the fully formulated proposal, as I only put
>>>>>> it on the wiki today.
>>>>>>
>>>>>> Note that this proposal was ratified at the SLC GOC meeting, but the
>>>>>> majority of the discussion was at the RefG portion of the meeting.
>>>>>> It's particularly important that folks who weren't at this part read
>>>>>> and understand the proposal. Ratification at the GOC meeting may
>>>>>> have been premature as I only intended to sketch out a solution
>>>>>> collaboratively at that meeting.
>>>>>>
>>>>>> Once the above wiki page is in shape, we should send an announcement
>>>>>> to gofriends (promptly, as it is of relevance to the current
>>>>>> discussion below), all data providers and consumers, and then after
>>>>>> that in the newsletter and on the main GO docs.
>>>>>>
>>>>>> As Mike says, we are aiming for an introduction some time in 2009.
>>>>>> It's important that anyone involved with producing GAFs is aware of
>>>>>> the changes and is OK with this timetable.
>>>>>>
>>>>>> Cheers
>>>>>> Chris
>>>>>>
>>>>>> On Sep 9, 2008, at 1:22 PM, Mike Cherry wrote:
>>>>>>
>>>>>>> There is a change coming to the format of the gene association file
>>>>>>> which will solve this problem.  Annotations to proteins, gene,
>>>>>>> transcripts, etc for a particular locus will be identified as such.
>>>>>>> The change should occur in 2009.
>>>>>>>
>>>>>>> -Mike
>>>>>>>
>>>>>>>
>>>>>>>> From: "Quaid Morris" <quaid.morris at gmail.com>
>>>>>>>> To: "Gabriel Berriz" <gberriz at hms.harvard.edu>
>>>>>>>> Subject: Re: [Gofriends] Redundancy in
>>>>>>>> go_XXXXXX-assocdb-tables/dbxref.txt
>>>>>>>> Cc: gofriends at genome.stanford.edu
>>>>>>>>
>>>>>>>> Hi Gabriel,
>>>>>>>>
>>>>>>>> It looks like in the example that you gave RGD ID 1302948 is a
>>>>>>>> gene ID and ENSRNOP00000034933 is a protein ID.  Are all your
>>>>>>>> examples like this?  Maybe there are circumstances when it's
>>>>>>>> possible to annotate a specific isoform and others when only the
>>>>>>>> gene can be annotated.
>>>>>>>>
>>>>>>>> Q
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gofriends mailing list
>>>>>>> Gofriends at geneontology.org
>>>>>>> http://fafner.stanford.edu/mailman/listinfo/gofriends
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Go mailing list
>>>>>> Go at geneontology.org
>>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
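
For readers following the thread, the rule Chris proposes for the type
column can be sketched roughly as below. This is only an illustration
of the discussion, not part of the GAF spec: the GafRow class, the
object_type_for function and the ID "MGI:0000000" are hypothetical
names made up for the example, and only three columns are modelled.
Note that Harold's alternative in this thread would put "gene" in
column 12 for gene-level evidence such as IGI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GafRow:
    """Three of the columns discussed in the thread (hypothetical model)."""
    db_object_id: str         # column 2: DB object ID
    object_type: str          # column 12: type (gene, transcript/RNA, protein)
    spliceform_id: str = ""   # column 17: optional, blank if isoform unknown


def object_type_for(coding: Optional[str]) -> str:
    """Pick column 12 following the rule proposed in the thread.

    `coding` is an assumed input: "protein", "RNA", or None when the
    gene has not been molecularly characterized.
    """
    if coding == "protein":
        return "protein"      # known protein-coding, spliceform may be unknown
    if coding == "RNA":
        return "RNA"
    return ""                 # suggestion in the thread: allow a blank type


# Protein-coding gene whose spliceform is not known: column 17 stays blank.
row = GafRow("MGI:0000000", object_type_for("protein"))
print(row)
```

Leaving spliceform_id empty here reflects the open-world reading from
the thread: a blank column 17 means the isoform is not recorded, not
that no isoform exists.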


