[go] Requirement for all 'unknown' annotations to use ND code
Karen Christie
kchris at genome.Stanford.EDU
Mon Sep 17 15:35:11 PDT 2007
Hi,
responses inserted below
On Mon, 17 Sep 2007, Suzanna Lewis wrote:
>
> On Sep 17, 2007, at 11:47 AM, Karen Christie wrote:
>
>> Hi,
>>
>> The reason I brought this issue up was that I was very uncomfortable with
>> the rationale that people could use the ND evidence code as a way to find
>> the unknown annotations, or with having that purpose as a justification to
>> only allow the ND code for root annotations in our documentation. It seems
>> that we have come to consensus that we should not be saying anything about
>> this as a software shortcut to find unknown annotations.
>
> Whatever is decided. No one should do the above, ever. It was never, ever the
> justification for the ND-only-at-root restriction. So yes, we have consensus:
> we do -not- mention this in any documentation.
>
> I am convinced by what Ben and Mike have said. Yes we have overloaded the
> meaning of the evidence code here. ND is really curation status. What I'm
> still confused on Mike is the question I was trying to get at. You say
Annotating to the root node is definitely curation status, but I think
that ND is still an evidence code.
It's just different from all the other ones in the sense that the evidence
one has accumulated is not from a single reference, but rather from having
undertaken a search for information and not having found any. It may be
that the curator didn't find any papers at all, found a few papers none of
which provided any insight, or that there were no informative sequence
comparisons, but ND is still a statement about the basis of the
annotation, like all of the other evidence codes.
>>>>> A knockout with no phenotype is data, it's a negative result yes but it's
>>>>> data. You have data that there is no phenotype for that gene for the
>>>>> screen that was performed.
>
> So would this be an annotation to a root node with IMP?
To me, this situation is something that would generally not get annotated
at all, with one possible exception.
IF there was a STRONG expectation that knocking out a particular gene
would have a particular phenotype, one might opt to annotate it to a
relevant process term with the NOT qualifier. I should add though, that I
don't think I've ever done a NOT annotation on the basis of an IMP, but
that's the only annotation I can see that one might make from this type of
case where a null mutation has no phenotype.
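
To make the two cases concrete, here is a minimal Python sketch, not any
group's actual pipeline. It models an annotation as a small record and checks
the two rules under discussion: ND may only be used on a root term (the
existing restriction), and a root annotation must use ND (the requirement
proposed in this thread). The Annotation class, the check_root_nd function
and the gene name "geneX" are hypothetical; GO:0006281 (DNA repair) is just an
arbitrary process term chosen for illustration.

```python
from dataclasses import dataclass

# Root terms of the three GO aspects.
ROOT_TERMS = {
    "GO:0008150",  # biological_process
    "GO:0005575",  # cellular_component
    "GO:0003674",  # molecular_function
}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record (not a full GAF row)."""
    gene: str
    go_id: str
    evidence: str
    qualifier: str = ""  # e.g. "NOT"


def check_root_nd(a: Annotation) -> list[str]:
    """Return the rule violations found for one annotation (empty if none)."""
    problems = []
    is_root = a.go_id in ROOT_TERMS
    # Existing restriction: ND is only allowed on root-node annotations.
    if a.evidence == "ND" and not is_root:
        problems.append("ND used on a non-root term")
    # Proposed requirement: root-node annotations must use ND.
    if is_root and a.evidence != "ND":
        problems.append("root annotation without ND")
    return problems


# "Looked, found nothing": root node with ND passes both rules.
unknown = Annotation("geneX", "GO:0008150", "ND")
# Null mutant with no phenotype despite a strong expectation:
# a NOT annotation to a specific process term with IMP, not a root annotation.
negative = Annotation("geneX", "GO:0006281", "IMP", qualifier="NOT")

print(check_root_nd(unknown))
print(check_root_nd(negative))
```

Under these assumptions neither record is flagged, whereas a root annotation
made with IMP would be reported as "root annotation without ND".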
>> The more recent discussion has been dealing with the issue that annotations
>> to the root node are a special case anyway, allowing us to track curation
>> progress. While we know that these aren't annotations of knowledge in the
>> same way as other annotations, I think the group has agreed many, many
>> times that we DO want a way to distinguish between genes that have been
>> looked at but where nothing is known and genes that just haven't been curated
>> yet, so I think we're stuck with tracking curation progress in some way.
>>
>> In that sense, I can see a rationale for only allowing annotations to the
>> root node to be made with the ND code in that we are making a curatorial
>> statement about what a curator looked at in order to make an annotation to
>> the root node. The ND evidence code is already a special case in that it
>> can only be used for annotations to the root node.
>>
>> Provided that the documentation is phrased in terms of curatorial process,
>> i.e. the procedure required in order to be able to make the statement that a
>> given aspect is unknown for a given gene, I'm OK with this restriction.
>>
>> Note that I'm not volunteering to examine, or do any rewriting of, the ND
>> documentation since I'm due to deliver within the coming week.
>>
>> Thanks to everyone for a good discussion.
>>
>> -Karen
>>
>>
>>
>>
>> On Mon, 17 Sep 2007, Jim Hu wrote:
>>
>>> Makes sense to me (and I'm sure it won't be the last bad idea I throw out
>>> there).
>>>
>>> So... as a user, having access to instances where negative results have
>>> been found, as in the no phenotype example, is useful. Knowing that the
>>> mutant has been made and looked at is valuable. But I can see that it
>>> probably doesn't belong in GO.
>>>
>>> I think the distinction that you and Ben raise about curation progress vs.
>>> annotation is important. Perhaps curation progress really doesn't belong
>>> at all.
>>>
>>> Jim
>>>
>>>
>>>
>>> On Sep 17, 2007, at 12:53 PM, Mike Cherry wrote:
>>>
>>>> I think this is a bad idea. ISS to another organism that has the root
>>>> association is not useful. That just means that in the other organism
>>>> there was no data, you have to look for experiments in each organism. A
>>>> knockout with no phenotype is data, it's a negative result yes but it's
>>>> data. You have data that there is no phenotype for that gene for the
>>>> screen that was performed.
>>>> An association to the root was a convenience we used to show that we
>>>> looked for a result. It's not an annotation, it's a curation progress
>>>> statement. A note to say we are looking at all the genes. I don't like
>>>> the use of any experimental code for the root.
>>>> -Mike
>>>> On Sep 17, 2007, at 10:02 AM, Jim Hu wrote:
>>>>> On Sep 17, 2007, at 11:07 AM, Valerie Wood wrote:
>>>>>> I don't see how you can make an annotation to the root node using
>>>>>> RCA/IC/IMP/ISS or IDA?
>>>>> We haven't done these yet, but
>>>>> ISS - similarity to proteins annotated to the root node with ND in
>>>>> another organism?
>>>>> IMP - What does one do for large scale knockout screens when a KO shows
>>>>> no phenotype? Someone did look, so it's not really ND, is it?
>>>>> IDA seems pretty hard to rationalize. I can imagine negative results,
>>>>> as in "previous analysis suggested that gene X has activity Y, but we
>>>>> can't detect it" but wouldn't that get a NOT modifier for the assayed
>>>>> activity Y, if it was annotated at all? I'm actually thinking of a case
>>>>> where paper A says that an E. coli protein is a nuclease, and paper B
>>>>> shows that the nuclease activity is a contaminant. I'm thinking there
>>>>> are the following choices if that results in not knowing the function of
>>>>> the gene X product:
>>>>> * delete the annotation from paper A
>>>>> ** no annotation to the root node
>>>>> ** annotate to the root node
>>>>> * add the annotation from paper B with a not Y
>>>>> ** no annotation to the root node
>>>>> ** annotate to the root node
>>>>> I recall discussing this kind of situation with Karen, but I'm not sure
>>>>> that we covered how to handle the root node. Does this change if the
>>>>> information that the putative activity was a contaminant is not
>>>>> published but the curator knows about it from a meeting or a personal
>>>>> communication?
>>>>> Similarly, can one annotate to the root node with RCA if a computational
>>>>> analysis shows that protein X does not have previously suggested
>>>>> activity Y based on improved sophistication of motif analysis? Again,
>>>>> if this removes the only putative activity from an earlier analysis,
>>>>> does the protein get a root node annotation, or does it get nothing?
>>>>> Example, a protein is annotated by some project as a thioredoxin based
>>>>> on good sequence similarity to the fold family members. Later, someone
>>>>> notices that the active site residues are missing.
>>>>> Jim
>>>>>> The ND means the curator has looked at all the papers for this gene
>>>>>> (and for some databases checked the annotations to orthologs to see if
>>>>>> any sensible inferences can be made), and as of the date the annotation
>>>>>> was made there is "no data".
>>>>>> We wouldn't be able to do this with any of the other evidence codes.
>>>>>> Val
>>>>>> Suzanna Lewis wrote:
>>>>>>> After reading through this thread I see no strong reason for
>>>>>>> requiring ND as the evidence code for annotations to the root.
>>>>>>> In fact, I'm now wondering why we have ND at all. Seems to me that
>>>>>>> "no data" is a result. It is not the type of experiment that was
>>>>>>> done. Maybe the only accurate use of ND is when we don't even know
>>>>>>> what kind of experiment was carried out.
>>>>>>> -S
>>>>>>> On Sep 17, 2007, at 8:42 AM, Valerie Wood wrote:
>>>>>>>> So we don't all need to run the query......
>>>>>>>> biological_process Dictybase ISS 1
>>>>>>>> biological_process Dictybase ND 1313
>>>>>>>> biological_process FB ND 1022
>>>>>>>> biological_process GeneDB_Pfalciparum ND 702
>>>>>>>> biological_process GeneDB_Spombe ND 1021
>>>>>>>> biological_process GeneDB_Tbrucei ND 1087
>>>>>>>> biological_process GeneDB_Tbrucei TAS 1
>>>>>>>> biological_process GR_protein IC 11
>>>>>>>> biological_process MGI IDA 1
>>>>>>>> biological_process MGI IMP 2
>>>>>>>> biological_process MGI ND 1382
>>>>>>>> biological_process PseudoCAP IDA 13
>>>>>>>> biological_process PseudoCAP ISS 2
>>>>>>>> biological_process PseudoCAP RCA 26
>>>>>>>> biological_process RGD IEA 1
>>>>>>>> biological_process RGD ND 607
>>>>>>>> biological_process SGD IMP 1
>>>>>>>> biological_process SGD NAS 1
>>>>>>>> biological_process SGD ND 1429
>>>>>>>> biological_process SGD TAS 1
>>>>>>>> biological_process TAIR ND 11086
>>>>>>>> biological_process TAIR RCA 12
>>>>>>>> biological_process TAIR TAS 3
>>>>>>>> biological_process TIGR_CMR ND 19190
>>>>>>>> biological_process TIGR_Tba1 ND 194
>>>>>>>> biological_process UniProt IEA 6
>>>>>>>> biological_process UniProt ND 966
>>>>>>>> biological_process WB IMP 1326
>>>>>>>> biological_process WB ND 2
>>>>>>>> biological_process ZFIN ND 5269
>>>>>>>> cellular_component Dictybase ISS 3
>>>>>>>> cellular_component Dictybase ND 1551
>>>>>>>> cellular_component FB ISS 1
>>>>>>>> cellular_component FB ND 2058
>>>>>>>> cellular_component GeneDB_Pfalciparum ND 288
>>>>>>>> cellular_component GeneDB_Spombe ND 190
>>>>>>>> cellular_component GeneDB_Tbrucei NAS 2
>>>>>>>> cellular_component GeneDB_Tbrucei ND 1623
>>>>>>>> cellular_component GeneDB_Tbrucei TAS 1
>>>>>>>> cellular_component GR_protein TAS 8
>>>>>>>> cellular_component MGI ND 1362
>>>>>>>> cellular_component MGI TAS 1
>>>>>>>> cellular_component PseudoCAP IDA 13
>>>>>>>> cellular_component PseudoCAP ISS 2
>>>>>>>> cellular_component RGD ND 718
>>>>>>>> cellular_component SGD ND 972
>>>>>>>> cellular_component SGD TAS 1
>>>>>>>> cellular_component TAIR ND 9877
>>>>>>>> cellular_component TAIR TAS 12
>>>>>>>> cellular_component TIGR_CMR ND 14318
>>>>>>>> cellular_component TIGR_Tba1 NAS 2
>>>>>>>> cellular_component TIGR_Tba1 ND 184
>>>>>>>> cellular_component UniProt ND 1278
>>>>>>>> cellular_component WB ND 55
>>>>>>>> cellular_component ZFIN ND 6283
>>>>>>>> molecular_function Dictybase ND 1064
>>>>>>>> molecular_function FB ND 1935
>>>>>>>> molecular_function FB TAS 1
>>>>>>>> molecular_function GeneDB_Lmajor IEA 57
>>>>>>>> molecular_function GeneDB_Pfalciparum IEA 38
>>>>>>>> molecular_function GeneDB_Pfalciparum ND 789
>>>>>>>> molecular_function GeneDB_Spombe ND 1452
>>>>>>>> molecular_function GeneDB_Tbrucei IEA 44
>>>>>>>> molecular_function GeneDB_Tbrucei ND 977
>>>>>>>> molecular_function GeneDB_Tbrucei TAS 7
>>>>>>>> molecular_function GR_protein IEA 255
>>>>>>>> molecular_function GR_protein RCA 15
>>>>>>>> molecular_function MGI ND 1381
>>>>>>>> molecular_function PseudoCAP IDA 13
>>>>>>>> molecular_function PseudoCAP ISS 2
>>>>>>>> molecular_function PseudoCAP RCA 46
>>>>>>>> molecular_function RGD ND 701
>>>>>>>> molecular_function SGD ISS 4
>>>>>>>> molecular_function SGD NAS 1
>>>>>>>> molecular_function SGD ND 2166
>>>>>>>> molecular_function SGD TAS 19
>>>>>>>> molecular_function TAIR NAS 3
>>>>>>>> molecular_function TAIR ND 10095
>>>>>>>> molecular_function TAIR RCA 403
>>>>>>>> molecular_function TAIR TAS 72
>>>>>>>> molecular_function TIGR_CMR ND 19337
>>>>>>>> molecular_function TIGR_Tba1 ND 181
>>>>>>>> molecular_function TIGR_Tba1 TAS 7
>>>>>>>> molecular_function UniProt ND 1124
>>>>>>>> molecular_function WB NAS 1
>>>>>>>> molecular_function WB ND 51
>>>>>>>> molecular_function WB TAS 2
>>>>>>>> molecular_function ZFIN ND 4950
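
The query behind the counts above is not shown in the thread. As a rough
sketch only, a comparable tally could be produced from tab-delimited
association files, assuming the GAF column layout (column 1 = source database,
column 5 = GO ID, column 7 = evidence code). The function name
tally_root_annotations and the sample rows are hypothetical, and the original
numbers may well have come from a database query rather than flat files.

```python
from collections import Counter

# Root term of each GO aspect, mapped to the aspect name used in the table.
ROOT_ASPECT = {
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
    "GO:0003674": "molecular_function",
}


def tally_root_annotations(lines):
    """Count root-node annotations by (aspect, source database, evidence code).

    Assumes GAF-style rows: tab-delimited, "!" comment lines,
    column 1 = DB, column 5 = GO ID, column 7 = evidence code.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("!"):
            continue  # header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # malformed or blank row
        db, go_id, evidence = cols[0], cols[4], cols[6]
        aspect = ROOT_ASPECT.get(go_id)
        if aspect is not None:
            counts[(aspect, db, evidence)] += 1
    return counts


# Made-up rows for illustration; identifiers are not real database entries.
sample = [
    "!gaf-version: 1.0",
    "SGD\tS000000001\tGENE1\t\tGO:0008150\tSGD_REF:1\tND\t\tP",
    "SGD\tS000000002\tGENE2\t\tGO:0008150\tSGD_REF:1\tND\t\tP",
    "WB\tWBGene1\tgene-3\t\tGO:0008150\tWB_REF:1\tIMP\t\tP",
    "SGD\tS000000003\tGENE3\t\tGO:0006281\tSGD_REF:2\tIMP\t\tP",
]

for (aspect, db, evidence), n in sorted(tally_root_annotations(sample).items()):
    print(aspect, db, evidence, n)
```

Sorting the keys gives the same aspect / database / evidence code / count
layout as the listing above; the last sample row is skipped because it is not
a root-node annotation.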
>>>>>>>> --
>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>>>> Limited, a charity registered in England with number 1021457 and a
>>>>>>>> company registered in England with number 2742969, whose registered
>>>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>> =====================================
>>>>> Jim Hu
>>>>> Associate Professor
>>>>> Dept. of Biochemistry and Biophysics
>>>>> 2128 TAMU
>>>>> Texas A&M Univ.
>>>>> College Station, TX 77843-2128
>>>>> 979-862-4054
>