[go] NOT annotations
Benjamin Hitz
hitz at genome.Stanford.EDU
Fri Feb 1 13:48:55 PST 2008
On Feb 1, 2008, at 12:47 PM, Judith Blake wrote:
>
> I think by far the most common case right now is not with isoforms,
> but with conflicting data. And that should be represented.
The following is a philosophical argument, and as such may have
limited bearing on biology.
I wonder if the value of reporting these is overstated. To head off
an obvious counterargument: yes, it's data, and valid data should not
be thrown out. But since we do not curate to an infinite depth, we are
already "throwing out" information, so it's really a matter of where
you draw the line.
It is of course true that a NOT annotation represents both
experimental and curatorial work indicating that "something" was done.
But I think that reporting gene_product x term associations via a
positive-only standard has its merits.
Mainly what I am thinking about is the coverage of negative
experiments reported in the literature and some theoretical
"completeness" of the GO/annotation system.
In principle, at some theoretical level, if you are using any sort
of negative standard, you would (technically) have to assert, for
gene product X, a NOT (by experiment) for every GO term that does
not apply to it. I think that GO associations (and biology in
general) have an implicit NOT. A protein annotated as a protein
kinase is NOT a glycohydrolase unless someone publishes (and a
curator curates) an experiment indicating that it is, in fact, both.
Couple this with the fact that many (I would venture to say MOST)
negative results do not get published, and you have a very small
number of "valid, useful, explicit" NOT associations (such as
conflicting experiments) relative to positive associations.
What this means, to me, is that IN THE AGGREGATE, NOT annotations are
not very useful to the community at large. The probability of
someone misinterpreting a given NOT experiment is vastly greater than
that of someone finding it useful. As an imperfect analogy: if you
have a genetic disorder that occurs in 1 in 1,000,000 persons, and a
test that gives a false positive result 5% of the time, is the test
useful? (ANSWER: it's not, because the ratio of false positives to
true positives is roughly 50,000:1.)
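To make the arithmetic in the analogy explicit, here is a minimal sketch. It assumes a perfectly sensitive test (every true case tests positive), which the analogy does not state, and the function name and parameters are purely illustrative. Under that assumption the ratio comes out near 50,000:1, the same order of magnitude as the figure in the analogy.

```python
def false_to_true_positive_ratio(prevalence, false_positive_rate, sensitivity=1.0):
    """Expected number of false positives per true positive in a screened population.

    sensitivity=1.0 is an assumption: every affected person tests positive.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return false_positives / true_positives


# Numbers from the analogy: 1 in 1,000,000 prevalence, 5% false positive rate.
ratio = false_to_true_positive_ratio(prevalence=1e-6, false_positive_rate=0.05)
print(f"about {ratio:,.0f} false positives per true positive")
```

Lowering the assumed sensitivity only makes the ratio worse, since it shrinks the number of true positives while leaving the false positives unchanged.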
Lest you think those numbers are not in the correct ballpark, there
are 3,634 qualified associations in the January gofull. There were
~730,000 non-IEA associations and over 19 million associations
including IEAs.
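As a quick check on those proportions, the sketch below divides the quoted counts. The non-IEA and total figures are approximate as given in the message ("~730000" and "over 19 million"), so the results are rough: about 0.5% of non-IEA associations and about 0.02% of all associations carry a qualifier.

```python
# Counts quoted in the message (January gofull). The last two are
# approximations as stated there, so the shares below are rough.
qualified = 3_634
non_iea = 730_000
all_assoc = 19_000_000  # "over 19 million", so the true share is slightly lower

share_of_non_iea = qualified / non_iea
share_of_all = qualified / all_assoc

print(f"qualified / non-IEA: {share_of_non_iea:.2%} "
      f"(about 1 in {non_iea / qualified:.0f})")
print(f"qualified / all:     {share_of_all:.3%} "
      f"(about 1 in {all_assoc / qualified:.0f})")
```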
This is why I think all "qualified" associations should be in
separate files, and never shown by default on interfaces.
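One way such a separation could work, as a rough sketch: it assumes GAF-style tab-separated association lines in which the fourth column holds the qualifier (NOT, contributes_to, colocalizes_with) and comment lines start with "!". The function name is hypothetical and the sample rows are made-up placeholders, not real annotations or an existing GO tool.

```python
def split_by_qualifier(lines):
    """Partition association lines into (unqualified, qualified) lists.

    Assumes tab-separated GAF-style lines whose 4th column is the qualifier.
    Comment/header lines starting with '!' are copied to both outputs.
    """
    unqualified, qualified = [], []
    for line in lines:
        if line.startswith("!"):
            unqualified.append(line)
            qualified.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        has_qualifier = len(fields) > 3 and fields[3].strip() != ""
        (qualified if has_qualifier else unqualified).append(line)
    return unqualified, qualified


# Tiny made-up example; the IDs and references are placeholders.
sample = [
    "!gaf-version: 1.0\n",
    "DB\tX1\tx\t\tGO:0000001\tREF:1\tIDA\t\tF\n",
    "DB\tX1\tx\tNOT\tGO:0000002\tREF:2\tIDA\t\tF\n",
]
plain, flagged = split_by_qualifier(sample)
print(len(plain), "lines for the default file,", len(flagged), "for the qualified file")
```

Each returned list could then be written to its own file, with interfaces reading only the unqualified one by default.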
Ben
> Pankaj Jaiswal wrote:
>>
>>
>> Judith Blake wrote:
>>> hummm
>>> I think the case that Harold was saying, and that we currently
>>> have in other annotations here at MGI, is that we have
>>>
>>> x A
>>> x NOT A
>>>
>>> both lines of evidence exist at this point.
>>>
>>> In some cases, different experiments give different results
>>>
>>> In the case that Harold discussed, the issue really was that we
>>> don't properly distinguish isoforms, so from the gene level, the
>>> two are combined whereas if you could represent each isoform, the
>>> one would be x-1 A and one would be x-2 NOT A.
>>>
>>> judy
>>>
>>
>> I think unless the correct object_type is defined in the
>> annotations, it may not be very obvious. Ideally annotations
>> should not be made to the gene, but to the transcripts/proteins or
>> their isoforms (meaning all those coming from the same gene/locus on
>> the genome).
>>
>> So it can be
>>
>> x.1 (object_type: protein isoform) and x.2 (object_type: protein
>> isoform) map to x (object_type: gene)
>>
>> Annotations
>>
>> x.1 A
>> x.2 NOT A
>>
>> Or
>>
>> x A With x.1
>> x NOT A With x.2
>>
>
--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO
Consortium
Stanford University ** hitz at genome.stanford.edu