[Go] [go] annotations for refgenomes
Chris Mungall
cjm at fruitfly.org
Fri Feb 29 10:26:20 PST 2008
On Feb 28, 2008, at 7:58 AM, Mike Cherry wrote:
> On Feb 28, 2008, at 3:47 AM, Judith Blake wrote:
>
>> Mike,
>>
>> I'm a little confused. I don't understand how this is constructed.
>> Does the number of genes for 'ISS' mean the number of genes annotated
>> only with ISS, or are the numbers cumulative? For mouse, for example,
>> do you mean to say that ~8000 genes are annotated with IGC or RCA,
>> or that ~1000 genes are annotated with these codes in addition to
>> the other codes except IEA?
>>
>> Judy
>>
>
> It's not "only". There is a hierarchy. If a gp has IDA, ISS and
> IEA, then only IDA is counted; if it has ISS and IEA, ISS is counted.
> Here is the order of evidence I use for the graphs:
>
> (IMP|IDA|IPI|IGI|IEP) > (ISS) > (IC|NR|ND|NAS|TAS) > (IGC|RCA) > (IEA)
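A minimal Python sketch of the counting rule described above, assuming annotations arrive as (gene product, evidence code) pairs. The input format and the names `TIERS`, `RANK` and `count_by_best_evidence` are hypothetical and not part of any GO tool. Each gene product is counted once, under its most trusted evidence tier; the optional `exclude` parameter is an illustrative way to leave out codes such as ND.

```python
from collections import Counter

# Evidence tiers, most trusted first, in the order quoted above:
# (IMP|IDA|IPI|IGI|IEP) > (ISS) > (IC|NR|ND|NAS|TAS) > (IGC|RCA) > (IEA)
TIERS = [
    ("IMP", "IDA", "IPI", "IGI", "IEP"),
    ("ISS",),
    ("IC", "NR", "ND", "NAS", "TAS"),
    ("IGC", "RCA"),
    ("IEA",),
]

# Map each evidence code to its tier index (0 = most trusted).
RANK = {code: i for i, tier in enumerate(TIERS) for code in tier}


def count_by_best_evidence(annotations, exclude=()):
    """Count each gene product once, under its most trusted evidence tier.

    annotations: iterable of (gene_product_id, evidence_code) pairs
    (hypothetical input format). Codes listed in `exclude`, and codes
    missing from RANK, are skipped.
    """
    best = {}
    for gp, code in annotations:
        if code in exclude or code not in RANK:
            continue
        rank = RANK[code]
        # Keep the lowest index seen so far, i.e. the most trusted tier.
        if gp not in best or rank < best[gp]:
            best[gp] = rank
    # Tally gene products per tier, labelled like "IMP|IDA|IPI|IGI|IEP".
    return Counter("|".join(TIERS[r]) for r in best.values())


# The two cases from the message above: g1 has IDA, ISS and IEA (counted
# in IDA's tier); g2 has ISS and IEA (counted as ISS).
annotations = [
    ("g1", "IDA"), ("g1", "ISS"), ("g1", "IEA"),
    ("g2", "ISS"), ("g2", "IEA"),
]
print(count_by_best_evidence(annotations))
```

Because only the best tier is kept per gene product, every counted gene product lands in exactly one bar, so nothing is double counted.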
The bar charts are great!
Minor comments:
1) If the trust hierarchy Mike mentions above is useful beyond Mike's
teaching purposes (it seems that we agree it is), then it should be
explicitly encoded in ECO, or some supplemental ontology to ECO, so
that we don't have to hardcode this in every piece of software that
summarises annotations in a similar fashion.
2) We should adopt some consistent visuals or terminology across all
presentation aspects of GO to avoid confusion regarding whether
annotated entities are double counted or not.
One option is to explicitly list codes and the "only" qualifier, for
all but the most trusted evidence:
IEA only
IGC, RCA or IEA only
...
ISS, IC, ND, NR, NAS, TAS, IGC, RCA or IEA only
IDA, IMP, IPI, IEP
This is relatively unambiguous, but costs more screen real estate.
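The "only" labels in the first option could be generated from an ordered tier list rather than typed by hand. A minimal Python sketch, assuming the tiers are available as an ordered list; `TIERS`, `join_codes` and `legend_labels` are hypothetical names. Each label names the codes in one tier plus all less trusted tiers, followed by "only"; the top tier is listed plainly.

```python
# Evidence tiers, most trusted first (same order as Mike's hierarchy).
TIERS = [
    ["IMP", "IDA", "IPI", "IGI", "IEP"],
    ["ISS"],
    ["IC", "NR", "ND", "NAS", "TAS"],
    ["IGC", "RCA"],
    ["IEA"],
]


def join_codes(codes):
    """Join codes as 'A, B or C'."""
    if len(codes) == 1:
        return codes[0]
    return ", ".join(codes[:-1]) + " or " + codes[-1]


def legend_labels(tiers):
    """Return legend labels, least trusted tier first.

    Every tier except the top one is labelled with its own codes plus all
    less trusted codes and the qualifier 'only'; the top tier is listed plainly.
    """
    labels = []
    for i in range(len(tiers) - 1, 0, -1):
        # Codes in tier i and in every less trusted tier below it.
        codes = [c for tier in tiers[i:] for c in tier]
        labels.append(join_codes(codes) + " only")
    labels.append(", ".join(tiers[0]))
    return labels


for label in legend_labels(TIERS):
    print(label)
```

Codes come out in the order of Mike's hierarchy, so the generated labels may differ slightly in ordering from the hand-written list (e.g. IC, NR, ND rather than IC, ND, NR), and the top tier includes IGI.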
Another option would be to include the '<' symbol in the key, and to
ensure we use this symbol consistently in the context of evidence.
This has the advantage of keeping the figure legend almost as compact
as in Mike's diagram, although it doesn't make the exclusion clear.
Perhaps the evidence WG can get back to us on (1), and the WPWG on (2)?
> I use the graph for a class I teach. My point is to show how many
> annotations exist and general classes of experimental, and non-
> experimental, evidence known for an organism's gene products.
>
> I prefer not to include the ND annotations. ND annotations are certainly
> in the GA files, but in my opinion they don't help the students
> understand the research on the model organism. This is about
> community research, not about the work of the MODs. Annotations to the
> root are "No Data". My point is how much or how little
> experimentation has been done on a particular organism. ND shows the
> work of the curators, but also what the experimental community hasn't
> done. Also, not all MODs have filled in annotations to the root for
> all gene products. In my slide before this graph I talk about how
> many genes are in each organism.
>
> Because of evolution we have this beautiful connectedness in biology.
> We work on the particular systems that are most appropriate, and most
> powerful, for exploring biology.
>
> -Mike