[Go] [go] annotations for refgenomes
Suzanna Lewis
suzi at berkeleybop.org
Thu Feb 28 10:57:10 PST 2008
I agree with Val. While I understand your arguments for teaching
purposes, the purpose is different here: we are trying to track
annotation status and progress.
Along those lines, I was thinking that the total number of "genes" on
the y axis might be better counted using the gp2protein file. That
way, if a gene is totally missing from the GA file, it will still be
counted in the totals. (I'm not sure whether everyone is yet ensuring
that -all- of their genes are in the GA file, but everyone on the ref.
genome has committed to ensuring that all of their genes are in the
gp2protein file.)
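
A rough sketch of that counting, in Python, in case it helps. It is
only an illustration under assumptions, not an agreed procedure: it
assumes both files are tab-delimited with "!" comment lines, that the
first gp2protein column holds a DB:ID gene product identifier, and that
the first two GA columns hold the DB and the object ID. The function
names are made up for this example.

```python
def gp2protein_ids(lines):
    """Collect gene product IDs from gp2protein lines (first column).

    Assumes each data line looks like 'DB:ID<TAB>protein IDs'.
    """
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comments
        ids.add(line.split("\t")[0].strip())
    return ids


def ga_ids(lines):
    """Collect 'DB:ID' identifiers from GA file lines (columns 1 and 2)."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # malformed line, ignore in this sketch
        ids.add(f"{cols[0].strip()}:{cols[1].strip()}")
    return ids


def summarize(gp_lines, ga_lines):
    """Use gp2protein for the total; report genes absent from the GA file."""
    all_genes = gp2protein_ids(gp_lines)
    annotated = ga_ids(ga_lines)
    return {
        "total": len(all_genes),
        "missing_from_ga": sorted(all_genes - annotated),
    }
```

With real files, the two arguments would simply be the open file
handles. Taking the total from gp2protein means a gene with no GA
lines at all still shows up in the denominator.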
-S
On Feb 28, 2008, at 8:31 AM, Valerie Wood wrote:
>
>>
>> For the ND annotations, I prefer not to include them. For sure, NDs
>> are in the GA files, but in my opinion they don't help the students
>> understand the research of the model organism. This is about
>> community research, not about the work of the MODs. Annotations to
>> the root are "No Data". My point is how much or how little
>> experimentation has been done on a particular organism. ND shows the
>> work of the curators, but what the experimental community hasn't
>> done. Also, not all MODs have filled in annotations to the root for
>> all gene products. In my slide before this graph I talk about how
>> many genes are in each organism.
>>
>
> hi Mike,
>
> I see your point for teaching purposes, but doesn't the inclusion of
> NDs help to show more clearly the number of genes which have not been
> studied? It is the work of curators, but its aim is to show where
> there is no data available, i.e. there is no research. Otherwise you
> don't know whether the absence of data in the graph is due to the lack
> of data or the lack of curation.
>
> The fact that not all MODs have made all the possible ND annotations
> also makes its inclusion helpful for interpreting the data. You can
> see clearly in the graph where the ND data is included that for pombe
> and cerevisiae all genes have been looked at, because the F, P and C
> bars are the same size. For other organisms they are different sizes,
> which would indicate ND is not saturating (i.e. there could be more
> biological knowledge but it hasn't been curated yet).
>
> If the ND data was shown at the top of the bar, you could mentally
> include or exclude it. It is probably not so useful when combined with
> NAS, ND, IC, TAS, but we are all making best efforts to minimize the
> use of these codes, so I guess they will eventually disappear.
>
> Coincidentally, the reviewers of our NRG review manuscript asked us to
> explain this annotation difference between MODs (some use ND if there
> is ISS or IEA data available and some don't) and how this affects GO
> usage.
>
> Val
>
>> _______________________________________________
>> Go mailing list
>> Go at geneontology.org
>> http://fafner.stanford.edu/mailman/listinfo/go
>>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
>