[go] Requirement for all 'unknown' annotations to use ND code
Karen Christie
kchris at genome.Stanford.EDU
Mon Sep 17 11:47:54 PDT 2007
Hi,
The reason I brought this issue up was that I was very uncomfortable with
the rationale that people could use the ND evidence code as a way to find
the unknown annotations, or with having that purpose as a justification to
only allow the ND code for root annotations in our documentation. It seems
that we have come to consensus that we should not be saying anything about
this as a software shortcut to find unknown annotations.
The more recent discussion has been dealing with the issue that
annotations to the root node are a special case anyway, allowing us to
track curation progress. While we know that these aren't annotations of
knowledge in the same way as other annotations, I think the group has
agreed many, many times that we DO want a way to distinguish between genes
that have been looked at but where nothing is known and genes that just
haven't been curated yet, so I think we're stuck with tracking curation
progress in some way.
In that sense, I can see a rationale for only allowing annotations to the
root node to be made with the ND code in that we are making a curatorial
statement about what a curator looked at in order to make an annotation to
the root node. The ND evidence code is already a special case in that it
can only be used for annotations to the root node.
Provided that the documentation is phrased in terms of curatorial process,
i.e. the procedure required in order to be able to make the statement that
a given aspect is unknown for a given gene, I'm OK with this restriction.
Note that I'm not volunteering to examine, or do any rewriting of, the ND
documentation since I'm due to deliver within the coming week.
Thanks to everyone for a good discussion.
-Karen
On Mon, 17 Sep 2007, Jim Hu wrote:
> Makes sense to me (and I'm sure it won't be the last bad idea I throw out
> there).
>
> So... as a user, having access to instances where negative results have been
> found, as in the no phenotype example, is useful. Knowing that the mutant
> has been made and looked at is valuable. But I can see that it probably
> doesn't belong in GO.
>
> I think the distinction that you and Ben raise about curation progress vs.
> annotation is important. Perhaps curation progress really doesn't belong at
> all.
>
> Jim
>
>
>
> On Sep 17, 2007, at 12:53 PM, Mike Cherry wrote:
>
>> I think this is a bad idea. ISS to another organism that has the root
>> association is not useful. That just means that in the other organism
>> there was no data; you have to look for experiments in each organism. A
>> knockout with no phenotype is data; it's a negative result, yes, but it's data.
>> You have data that there is no phenotype for that gene for the screen that
>> was performed.
>>
>> An association to the root was a convenience we used to show that we looked
>> for a result. It's not an annotation; it's a curation progress statement. A
>> note to say we are looking at all the genes. I don't like the use of any
>> experimental code for the root.
>>
>> -Mike
>>
>>
>> On Sep 17, 2007, at 10:02 AM, Jim Hu wrote:
>>
>>> On Sep 17, 2007, at 11:07 AM, Valerie Wood wrote:
>>>
>>>>
>>>>
>>>>
>>>> I don't see how you can make an annotation to the root node using
>>>> RCA/IC/IMP/ISS or IDA?
>>>
>>> We haven't done these yet, but
>>>
>>> ISS - similarity to proteins annotated to the root node with ND in another
>>> organism?
>>> IMP - What does one do for large-scale knockout screens when a KO shows no
>>> phenotype? Someone did look, so it's not really ND, is it?
>>>
>>> IDA seems pretty hard to rationalize. I can imagine negative results, as
>>> in "previous analysis suggested that gene X has activity Y, but we can't
>>> detect it" but wouldn't that get a NOT modifier for the assayed activity
>>> Y, if it was annotated at all? I'm actually thinking of a case where
>>> paper A says that an E. coli protein is a nuclease, and paper B shows that
>>> the nuclease activity is a contaminant. I'm thinking there are the
>>> following choices if that results in not knowing the function of the gene
>>> X product:
>>>
>>> * delete the annotation from paper A
>>> ** no annotation to the root node
>>> ** annotate to the root node
>>> * add the annotation from paper B with a NOT modifier for Y
>>> ** no annotation to the root node
>>> ** annotate to the root node
>>>
>>> I recall discussing this kind of situation with Karen, but I'm not sure
>>> that we covered how to handle the root node. Does this change if the
>>> information that the putative activity was a contaminant is not published
>>> but the curator knows about it from a meeting or a personal communication?
>>>
>>> Similarly, can one annotate to the root node with RCA if a computational
>>> analysis shows that protein X does not have previously suggested activity
>>> Y based on improved sophistication of motif analysis? Again, if this
>>> removes the only putative activity from an earlier analysis, does the
>>> protein get a root node annotation, or does it get nothing? Example, a
>>> protein is annotated by some project as a thioredoxin based on good
>>> sequence similarity to the fold family members. Later, someone notices
>>> that the active site residues are missing.
>>>
>>> Jim
>>>
>>>>
>>>> The ND means the curator has looked at all the papers for this gene (and
>>>> for some databases checked the annotations to orthologs to see if any
>>>> sensible inferences can be made), and as of the date the annotation was
>>>> made there is "no data".
>>>>
>>>> We wouldn't be able to do this with any of the other evidence codes.
>>>>
>>>> Val
>>>>
>>>>
>>>> Suzanna Lewis wrote:
>>>>
>>>>> After reading through this thread I see no strong reason for requiring
>>>>> ND as the evidence code for annotations to the root.
>>>>>
>>>>> In fact, I'm now wondering why we have ND at all. Seems to me that "no
>>>>> data" is a result. It is not the type of experiment that was done.
>>>>> Maybe the only accurate use of ND is when we don't even know what kind
>>>>> of experiment was carried out.
>>>>>
>>>>> -S
>>>>>
>>>>> On Sep 17, 2007, at 8:42 AM, Valerie Wood wrote:
>>>>>
>>>>>> So we don't all need to run the query......
>>>>>>
>>>>>> root_term database evidence_code annotation_count
>>>>>> biological_process Dictybase ISS 1
>>>>>> biological_process Dictybase ND 1313
>>>>>> biological_process FB ND 1022
>>>>>> biological_process GeneDB_Pfalciparum ND 702
>>>>>> biological_process GeneDB_Spombe ND 1021
>>>>>> biological_process GeneDB_Tbrucei ND 1087
>>>>>> biological_process GeneDB_Tbrucei TAS 1
>>>>>> biological_process GR_protein IC 11
>>>>>> biological_process MGI IDA 1
>>>>>> biological_process MGI IMP 2
>>>>>> biological_process MGI ND 1382
>>>>>> biological_process PseudoCAP IDA 13
>>>>>> biological_process PseudoCAP ISS 2
>>>>>> biological_process PseudoCAP RCA 26
>>>>>> biological_process RGD IEA 1
>>>>>> biological_process RGD ND 607
>>>>>> biological_process SGD IMP 1
>>>>>> biological_process SGD NAS 1
>>>>>> biological_process SGD ND 1429
>>>>>> biological_process SGD TAS 1
>>>>>> biological_process TAIR ND 11086
>>>>>> biological_process TAIR RCA 12
>>>>>> biological_process TAIR TAS 3
>>>>>> biological_process TIGR_CMR ND 19190
>>>>>> biological_process TIGR_Tba1 ND 194
>>>>>> biological_process UniProt IEA 6
>>>>>> biological_process UniProt ND 966
>>>>>> biological_process WB IMP 1326
>>>>>> biological_process WB ND 2
>>>>>> biological_process ZFIN ND 5269
>>>>>> cellular_component Dictybase ISS 3
>>>>>> cellular_component Dictybase ND 1551
>>>>>> cellular_component FB ISS 1
>>>>>> cellular_component FB ND 2058
>>>>>> cellular_component GeneDB_Pfalciparum ND 288
>>>>>> cellular_component GeneDB_Spombe ND 190
>>>>>> cellular_component GeneDB_Tbrucei NAS 2
>>>>>> cellular_component GeneDB_Tbrucei ND 1623
>>>>>> cellular_component GeneDB_Tbrucei TAS 1
>>>>>> cellular_component GR_protein TAS 8
>>>>>> cellular_component MGI ND 1362
>>>>>> cellular_component MGI TAS 1
>>>>>> cellular_component PseudoCAP IDA 13
>>>>>> cellular_component PseudoCAP ISS 2
>>>>>> cellular_component RGD ND 718
>>>>>> cellular_component SGD ND 972
>>>>>> cellular_component SGD TAS 1
>>>>>> cellular_component TAIR ND 9877
>>>>>> cellular_component TAIR TAS 12
>>>>>> cellular_component TIGR_CMR ND 14318
>>>>>> cellular_component TIGR_Tba1 NAS 2
>>>>>> cellular_component TIGR_Tba1 ND 184
>>>>>> cellular_component UniProt ND 1278
>>>>>> cellular_component WB ND 55
>>>>>> cellular_component ZFIN ND 6283
>>>>>> molecular_function Dictybase ND 1064
>>>>>> molecular_function FB ND 1935
>>>>>> molecular_function FB TAS 1
>>>>>> molecular_function GeneDB_Lmajor IEA 57
>>>>>> molecular_function GeneDB_Pfalciparum IEA 38
>>>>>> molecular_function GeneDB_Pfalciparum ND 789
>>>>>> molecular_function GeneDB_Spombe ND 1452
>>>>>> molecular_function GeneDB_Tbrucei IEA 44
>>>>>> molecular_function GeneDB_Tbrucei ND 977
>>>>>> molecular_function GeneDB_Tbrucei TAS 7
>>>>>> molecular_function GR_protein IEA 255
>>>>>> molecular_function GR_protein RCA 15
>>>>>> molecular_function MGI ND 1381
>>>>>> molecular_function PseudoCAP IDA 13
>>>>>> molecular_function PseudoCAP ISS 2
>>>>>> molecular_function PseudoCAP RCA 46
>>>>>> molecular_function RGD ND 701
>>>>>> molecular_function SGD ISS 4
>>>>>> molecular_function SGD NAS 1
>>>>>> molecular_function SGD ND 2166
>>>>>> molecular_function SGD TAS 19
>>>>>> molecular_function TAIR NAS 3
>>>>>> molecular_function TAIR ND 10095
>>>>>> molecular_function TAIR RCA 403
>>>>>> molecular_function TAIR TAS 72
>>>>>> molecular_function TIGR_CMR ND 19337
>>>>>> molecular_function TIGR_Tba1 ND 181
>>>>>> molecular_function TIGR_Tba1 TAS 7
>>>>>> molecular_function UniProt ND 1124
>>>>>> molecular_function WB NAS 1
>>>>>> molecular_function WB ND 51
>>>>>> molecular_function WB TAS 2
>>>>>> molecular_function ZFIN ND 4950
>>>>>>
>>>>>>
>>>>>> --
>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>> Limited, a charity registered in England with number 1021457 and a
>>>>>> company registered in England with number 2742969, whose registered
>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>> =====================================
>>> Jim Hu
>>> Associate Professor
>>> Dept. of Biochemistry and Biophysics
>>> 2128 TAMU
>>> Texas A&M Univ.
>>> College Station, TX 77843-2128
>>> 979-862-4054
>>>
>>>
>>
>