[go] Requirement for all 'unknown' annotations to use ND code

Karen Christie kchris at genome.Stanford.EDU
Mon Sep 17 15:35:11 PDT 2007


Hi,

responses inserted below


On Mon, 17 Sep 2007, Suzanna Lewis wrote:

>
> On Sep 17, 2007, at 11:47 AM, Karen Christie wrote:
>
>> Hi,
>> 
>> The reason I brought this issue up was that I was very uncomfortable with 
>> the rationale that people could use the ND evidence code as a way to find 
>> the unknown annotations, or with having that purpose as a justification to 
>> only allow the ND code for root annotations in our documentation. It seems 
>> that we have come to consensus that we should not be saying anything about 
>> this as a software shortcut to find unknown annotations.
>
> Whatever is decided. No one should do the above, ever. It was never, ever the 
> justification for the ND-only-at-root restriction. So yes, we have consensus: 
> we do -not- mention this in any documentation.
>
> I am convinced by what Ben and Mike have said. Yes we have overloaded the 
> meaning of the evidence code here. ND is really curation status. What I'm 
> still confused about, Mike, is the question I was trying to get at. You say

Annotating to the root node is definitely curation status, but I think 
that ND is still an evidence code.

It's just different from all the other ones in the sense that the evidence 
one has accumulated is not from a single reference, but rather from having 
undertaken a search for information and not having found any. It may be 
that the curator didn't find any papers at all, found a few papers, none of 
which provided any insight, or that there were no informative sequence 
comparisons, but ND is still a statement about the basis of the 
annotation, like all of the other evidence codes.


>>>>> A knockout with no phenotype is data; it's a negative result, yes, but it's 
>>>>> data. You have data that there is no phenotype for that gene for the 
>>>>> screen that was performed.
>
> So would this be an annotation to a root node with IMP?

To me, this situation is something that would generally not get annotated 
at all, with one possible exception.

IF there was a STRONG expectation that knocking out a particular gene 
would have a particular phenotype, one might opt to annotate it to a 
relevant process term with the NOT qualifier. I should add, though, that I 
don't think I've ever done a NOT annotation on the basis of an IMP, but 
that's the only annotation I can see that one might make from this type of 
case where a null mutation has no phenotype.


>> The more recent discussion has been dealing with the issue that annotations 
>> to the root node are a special case anyway, allowing us to track curation 
>> progress. While we know that these aren't annotations of knowledge in the 
>> same way as other annotations, I think the group has agreed many, many 
>> times that we DO want a way to distinguish between genes that have been 
>> looked at but where nothing is known and genes that just haven't been curated 
>> yet, so I think we're stuck with tracking curation progress in some way.
>> 
>> In that sense, I can see a rationale for only allowing annotations to the 
>> root node to be made with the ND code in that we are making a curatorial 
>> statement about what a curator looked at in order to make an annotation to 
>> the root node. The ND evidence code is already a special case in that it 
>> can only be used for annotations to the root node.
>> 
>> Provided that the documentation is phrased in terms of curatorial process, 
>> i.e. the procedure required in order to be able to make the statement that a 
>> given aspect is unknown for a given gene, I'm OK with this restriction.
>> 
>> Note that I'm not volunteering to examine, or do any rewriting of, the ND 
>> documentation since I'm due to deliver within the coming week.
>> 
>> Thanks to everyone for a good discussion.
>> 
>> -Karen
>> 
>> 
>> 
>> 
>> On Mon, 17 Sep 2007, Jim Hu wrote:
>> 
>>> Makes sense to me (and I'm sure it won't be the last bad idea I throw out 
>>> there).
>>> 
>>> So... as a user, having access to instances where negative results have 
>>> been found, as in the no phenotype example, is useful.  Knowing that the 
>>> mutant has been made and looked at is valuable.  But I can see that it 
>>> probably doesn't belong in GO.
>>> 
>>> I think the distinction that you and Ben raise about curation progress vs. 
>>> annotation is important.  Perhaps curation progress really doesn't belong 
>>> at all.
>>> 
>>> Jim
>>> 
>>> 
>>> 
>>> On Sep 17, 2007, at 12:53 PM, Mike Cherry wrote:
>>> 
>>>> I think this is a bad idea.  ISS to another organism that has the root 
>>>> association is not useful.  That just means that in the other organism 
>>>> there was no data, you have to look for experiments in each organism.  A 
>>>> knockout with no phenotype is data; it's a negative result, yes, but it's 
>>>> data. You have data that there is no phenotype for that gene for the 
>>>> screen that was performed.
>>>> An association to the root was a convenience we used to show that we 
>>>> looked for a result.  It's not an annotation; it's a curation progress 
>>>> statement.  A note to say we are looking at all the genes.  I don't like 
>>>> the use of any experimental code for the root.
>>>> -Mike
>>>> On Sep 17, 2007, at 10:02 AM, Jim Hu wrote:
>>>>> On Sep 17, 2007, at 11:07 AM, Valerie Wood wrote:
>>>>>> I don't see how you can make an annotation to the root node using
>>>>>> RCA/IC/IMP/ISS or IDA?
>>>>> We haven't done these yet, but
>>>>> ISS - similarity to proteins annotated to the root node with ND in 
>>>>> another organism?
>>>>> IMP - What does one do for large scale knockout screens when a KO shows 
>>>>> no phenotype.  Someone did look, so it's not really ND, is it?
>>>>> IDA seems pretty hard to rationalize.  I can imagine negative results, 
>>>>> as in "previous analysis suggested that gene X has activity Y, but we 
>>>>> can't detect it"  but wouldn't that get a NOT modifier for the assayed 
>>>>> activity Y, if it was annotated at all?  I'm actually thinking of a case 
>>>>> where paper A says that an E. coli protein is a nuclease, and paper B 
>>>>> shows that the nuclease activity is a contaminant.  I'm thinking there 
>>>>> are the following choices if that results in not knowing the function of 
>>>>> the gene X product:
>>>>> * delete the annotation from paper A
>>>>> ** no annotation to the root node
>>>>> ** annotate to the root node
>>>>> *  add the annotation from paper B with a not Y
>>>>> ** no annotation to the root node
>>>>> ** annotate to the root node
>>>>> I recall discussing this kind of situation with Karen, but I'm not sure 
>>>>> that we covered how to handle the root node.  Does this change if the 
>>>>> information that the putative activity was a contaminant is not 
>>>>> published but the curator knows about it from a meeting or a personal 
>>>>> communication?
>>>>> Similarly, can one annotate to the root node with RCA if a computational 
>>>>> analysis shows that protein X does not have previously suggested 
>>>>> activity Y based on improved sophistication of motif analysis?  Again, 
>>>>> if this removes the only putative activity from an earlier analysis, 
>>>>> does the protein get a root node annotation, or does it get nothing? 
>>>>> For example, a protein is annotated by some project as a thioredoxin based 
>>>>> on good sequence similarity to the fold family members.  Later, someone 
>>>>> notices that the active site residues are missing.
>>>>> Jim
>>>>>> The ND means the curator has looked at all the papers for this gene 
>>>>>> (and for some databases checked the annotations to orthologs to see if 
>>>>>> any sensible inferences can be made), and as of the date the annotation 
>>>>>> was made there is "no data".
>>>>>> We wouldn't be able to do this with any of the other evidence codes.
>>>>>> Val
>>>>>> Suzanna Lewis wrote:
>>>>>>> After reading through this thread I see no strong reason for 
>>>>>>> requiring ND as the evidence code for annotations to the root.
>>>>>>> In fact, I'm now wondering why we have ND at all. Seems to me that 
>>>>>>> "no data" is a result. It is not the type of experiment that was 
>>>>>>> done. Maybe the only accurate use of ND is when we don't even know 
>>>>>>> what kind of experiment was carried out.
>>>>>>> -S
>>>>>>> On Sep 17, 2007, at 8:42 AM, Valerie Wood wrote:
>>>>>>>> So we don't all need to run the query......
>>>>>>>> biological_process Dictybase ISS 1
>>>>>>>> biological_process Dictybase ND 1313
>>>>>>>> biological_process FB ND 1022
>>>>>>>> biological_process GeneDB_Pfalciparum ND 702
>>>>>>>> biological_process GeneDB_Spombe ND 1021
>>>>>>>> biological_process GeneDB_Tbrucei ND 1087
>>>>>>>> biological_process GeneDB_Tbrucei TAS 1
>>>>>>>> biological_process GR_protein IC 11
>>>>>>>> biological_process MGI IDA 1
>>>>>>>> biological_process MGI IMP 2
>>>>>>>> biological_process MGI ND 1382
>>>>>>>> biological_process PseudoCAP IDA 13
>>>>>>>> biological_process PseudoCAP ISS 2
>>>>>>>> biological_process PseudoCAP RCA 26
>>>>>>>> biological_process RGD IEA 1
>>>>>>>> biological_process RGD ND 607
>>>>>>>> biological_process SGD IMP 1
>>>>>>>> biological_process SGD NAS 1
>>>>>>>> biological_process SGD ND 1429
>>>>>>>> biological_process SGD TAS 1
>>>>>>>> biological_process TAIR ND 11086
>>>>>>>> biological_process TAIR RCA 12
>>>>>>>> biological_process TAIR TAS 3
>>>>>>>> biological_process TIGR_CMR ND 19190
>>>>>>>> biological_process TIGR_Tba1 ND 194
>>>>>>>> biological_process UniProt IEA 6
>>>>>>>> biological_process UniProt ND 966
>>>>>>>> biological_process WB IMP 1326
>>>>>>>> biological_process WB ND 2
>>>>>>>> biological_process ZFIN ND 5269
>>>>>>>> cellular_component Dictybase ISS 3
>>>>>>>> cellular_component Dictybase ND 1551
>>>>>>>> cellular_component FB ISS 1
>>>>>>>> cellular_component FB ND 2058
>>>>>>>> cellular_component GeneDB_Pfalciparum ND 288
>>>>>>>> cellular_component GeneDB_Spombe ND 190
>>>>>>>> cellular_component GeneDB_Tbrucei NAS 2
>>>>>>>> cellular_component GeneDB_Tbrucei ND 1623
>>>>>>>> cellular_component GeneDB_Tbrucei TAS 1
>>>>>>>> cellular_component GR_protein TAS 8
>>>>>>>> cellular_component MGI ND 1362
>>>>>>>> cellular_component MGI TAS 1
>>>>>>>> cellular_component PseudoCAP IDA 13
>>>>>>>> cellular_component PseudoCAP ISS 2
>>>>>>>> cellular_component RGD ND 718
>>>>>>>> cellular_component SGD ND 972
>>>>>>>> cellular_component SGD TAS 1
>>>>>>>> cellular_component TAIR ND 9877
>>>>>>>> cellular_component TAIR TAS 12
>>>>>>>> cellular_component TIGR_CMR ND 14318
>>>>>>>> cellular_component TIGR_Tba1 NAS 2
>>>>>>>> cellular_component TIGR_Tba1 ND 184
>>>>>>>> cellular_component UniProt ND 1278
>>>>>>>> cellular_component WB ND 55
>>>>>>>> cellular_component ZFIN ND 6283
>>>>>>>> molecular_function Dictybase ND 1064
>>>>>>>> molecular_function FB ND 1935
>>>>>>>> molecular_function FB TAS 1
>>>>>>>> molecular_function GeneDB_Lmajor IEA 57
>>>>>>>> molecular_function GeneDB_Pfalciparum IEA 38
>>>>>>>> molecular_function GeneDB_Pfalciparum ND 789
>>>>>>>> molecular_function GeneDB_Spombe ND 1452
>>>>>>>> molecular_function GeneDB_Tbrucei IEA 44
>>>>>>>> molecular_function GeneDB_Tbrucei ND 977
>>>>>>>> molecular_function GeneDB_Tbrucei TAS 7
>>>>>>>> molecular_function GR_protein IEA 255
>>>>>>>> molecular_function GR_protein RCA 15
>>>>>>>> molecular_function MGI ND 1381
>>>>>>>> molecular_function PseudoCAP IDA 13
>>>>>>>> molecular_function PseudoCAP ISS 2
>>>>>>>> molecular_function PseudoCAP RCA 46
>>>>>>>> molecular_function RGD ND 701
>>>>>>>> molecular_function SGD ISS 4
>>>>>>>> molecular_function SGD NAS 1
>>>>>>>> molecular_function SGD ND 2166
>>>>>>>> molecular_function SGD TAS 19
>>>>>>>> molecular_function TAIR NAS 3
>>>>>>>> molecular_function TAIR ND 10095
>>>>>>>> molecular_function TAIR RCA 403
>>>>>>>> molecular_function TAIR TAS 72
>>>>>>>> molecular_function TIGR_CMR ND 19337
>>>>>>>> molecular_function TIGR_Tba1 ND 181
>>>>>>>> molecular_function TIGR_Tba1 TAS 7
>>>>>>>> molecular_function UniProt ND 1124
>>>>>>>> molecular_function WB NAS 1
>>>>>>>> molecular_function WB ND 51
>>>>>>>> molecular_function WB TAS 2
>>>>>>>> molecular_function ZFIN ND 4950
>>>>>>>> -- 
>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research 
>>>>>>>> Limited, a charity registered in England with number 1021457 and a 
>>>>>>>> company registered in England with number 2742969, whose registered 
>>>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>> =====================================
>>>>> Jim Hu
>>>>> Associate Professor
>>>>> Dept. of Biochemistry and Biophysics
>>>>> 2128 TAMU
>>>>> Texas A&M Univ.
>>>>> College Station, TX 77843-2128
>>>>> 979-862-4054
>>> 
>>> 
>>> 
>
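The per-database counts in the table quoted above are a tally of root-node annotations grouped by aspect, source database and evidence code. A minimal sketch of an equivalent tally is below. It is not the query that was actually run; it assumes GAF-style, tab-separated annotation lines (column 1 = source DB, column 5 = GO ID, column 7 = evidence code, `!` lines = header comments), and the function name and sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Root term IDs of the three GO ontologies.
ROOT_TERMS = {
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
    "GO:0003674": "molecular_function",
}


def count_root_annotations(lines: Iterable[str]) -> Counter:
    """Tally annotations to a GO root term by (aspect, source DB, evidence code).

    Hypothetical helper. Assumes GAF-style, tab-separated lines where
    column 1 is the source DB, column 5 the GO ID and column 7 the
    evidence code; lines starting with '!' are header comments.
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # malformed row: ignore it rather than guess
        db, go_id, evidence = cols[0], cols[4], cols[6]
        aspect = ROOT_TERMS.get(go_id)
        if aspect is not None:  # only count annotations to a root term
            counts[(aspect, db, evidence)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows for illustration, not real annotations.
    sample = [
        "!gaf-version: 1.0",
        "SGD\tS0001\tGENE1\t\tGO:0008150\tREF:1\tND\t\tP",
        "SGD\tS0002\tGENE2\t\tGO:0008150\tREF:1\tND\t\tP",
        "WB\tWBGene1\tgene-1\t\tGO:0008150\tREF:2\tIMP\t\tP",
        "SGD\tS0003\tGENE3\t\tGO:0006412\tREF:3\tIDA\t\tP",  # non-root term, ignored
    ]
    # Print in the same "aspect DB code count" layout as the table above.
    for (aspect, db, code), n in sorted(count_root_annotations(sample).items()):
        print(aspect, db, code, n)
```

Run on the sample rows, this prints one line per (aspect, database, evidence code) group, in the same layout as the quoted table. Note that it counts every root-node row regardless of qualifier; whether NOT-qualified rows should be excluded is a separate curation decision.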


