Epistemic formalism (was Re: [Phenoscape] Re: [go] evidence code ontology)

David Hill dph at informatics.jax.org
Wed Feb 6 14:10:10 PST 2008


Hi Everyone,

One issue I don't think we are considering here is the context of the 
experiment. Experiments are done contextually and are meant to capture 
information that is typical for the context. Until we capture context it 
is difficult to make a firm conclusion about one reference contradicting 
another. This is an issue I thought about when we were working on what 
annotations really mean. They represent generalizations, but how typical 
are they? Can we detect when they are not typical? If we can detect when 
they are not typical, do we know why? Can we tell how typical they are? 
When do annotations really contradict one another, and when are they 
representing different information?

Publication #1 may say that an instance of a gene product is located in an 
instance of a nucleus of an instance of a given cell type because the 
author saw it in a microscope. We then annotate using the corresponding 
types for the gene product and the cellular component from the 
ontologies. We would pick the evidence code IDA. In another publication, 
#2, the authors may say an instance of a gene product is not found in an 
instance of a nucleus in an instance of a different cell type because they 
do not see it there (in fact, they may see it somewhere else) under the 
microscope. We could then annotate using the corresponding types for the 
gene product and the cellular component from the ontologies but use NOT 
in the annotation. We can't conclude that publication #1 contradicts 
publication #2. We can conclude that the gene product is not always 
found in the nucleus. We are missing the key piece of contextual 
information, the cell type. What we can conclude is that finding the 
gene product in the nucleus might not be typical. We also might conclude 
that the gene product's location in the nucleus is typical for the first 
cell type, but not for the second. So how do we decide when the 
annotations we make represent the typical cases? Sometimes we find that 
they don't. The accumulation of evidence over time tells us how typical 
they really are.
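
To make the point concrete, here is a minimal sketch in Python of how a 
pairwise comparison could behave once context is captured. Everything in 
it is an assumption made for illustration: the Annotation record, its 
cell_type field, the compare function and the gene and cell type names 
are hypothetical and are not part of any existing GO annotation format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; not an actual GO file format."""
    gene_product: str
    component: str                    # cellular component term
    evidence_code: str                # e.g. "IDA"
    reference: str
    negated: bool = False             # True when the NOT qualifier is used
    cell_type: Optional[str] = None   # assumed context field


def compare(a: Annotation, b: Annotation) -> str:
    """Classify a pair of annotations, taking cell type into account."""
    if (a.gene_product, a.component) != (b.gene_product, b.component):
        return "unrelated"
    if a.negated == b.negated:
        return "consistent"
    if a.cell_type is None or b.cell_type is None:
        # Without the context we cannot say whether they contradict.
        return "undetermined"
    if a.cell_type == b.cell_type:
        return "contradictory"
    return "different-context"


pub1 = Annotation("geneX", "nucleus", "IDA", "publication #1",
                  cell_type="cell type A")
pub2 = Annotation("geneX", "nucleus", "IDA", "publication #2",
                  negated=True, cell_type="cell type B")

print(compare(pub1, pub2))  # different-context
```

If the cell_type values are left out, the same pair comes back as 
"undetermined", which is roughly where we are today.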

Another simple case of context is two authors measuring the 
expression of a gene at the RNA level. If one author measures by 
Northern Blot, they may conclude the gene is not expressed. If another 
author uses a very sensitive assay such as RT-PCR they may detect 
expression of the gene. Would you consider this contradictory? I'd 
conclude that you can't detect it by Northern, but you can by RT-PCR. 
I'd also conclude that it is expressed. So in a way the conclusion made 
by the first author is incorrect, but it is consistent with the context 
in which they did the experiment.
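
The same reasoning can be sketched for assay sensitivity. This is only an 
illustration under stated assumptions: the numeric sensitivity ranks, the 
ExpressionResult record and the reconcile function are hypothetical, and 
all results passed in are assumed to concern the same gene. A negative 
result is flagged as a conflict only when it comes from an assay at least 
as sensitive as one that gave a positive result.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed relative sensitivity ranks (higher = more sensitive).
ASSAY_SENSITIVITY: Dict[str, int] = {"Northern blot": 1, "RT-PCR": 2}


@dataclass(frozen=True)
class ExpressionResult:
    """Hypothetical record of one expression measurement."""
    gene: str
    assay: str
    detected: bool
    reference: str


def reconcile(results: List[ExpressionResult]) -> dict:
    """Summarize results for one gene, separating real conflicts from
    negatives that are explained by a less sensitive assay."""
    positives = [r for r in results if r.detected]
    negatives = [r for r in results if not r.detected]
    conflicts = [
        n for n in negatives
        if any(ASSAY_SENSITIVITY[n.assay] >= ASSAY_SENSITIVITY[p.assay]
               for p in positives)
    ]
    return {
        # True if any assay detected the gene; False only means
        # "not detected so far".
        "expressed": bool(positives),
        "below_detection": [n for n in negatives if n not in conflicts],
        "conflicts": conflicts,
    }


results = [
    ExpressionResult("gene1", "Northern blot", False, "author 1"),
    ExpressionResult("gene1", "RT-PCR", True, "author 2"),
]
summary = reconcile(results)
print(summary["expressed"], len(summary["conflicts"]))  # True 0
```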

I have thought hard about how, as a biologist, I decide when I think 
experiments hold for all contexts and when they don't. What it comes 
down to is accumulated knowledge that is based on repeated and varied 
experiments. Contextual information and the variety of contexts 
influence my opinion as to whether an experiment shows a 'typical' 
result. Somehow we need to be able to capture this process when we 
interpret annotations.

David

Larry Hunter wrote:
>
> On Feb 6, 2008, at 12:52 PM, Chris Mungall wrote:
>
>> Let's be clear about what you're asking for.
>>
>> If we have two assertions:
>>
>> [1] R(X,Y)
>> [2] R(X,not-Y)
>>
>> Where assertion [1] is supported by e1, and assertion [2] is 
>> supported by e2.
>>
>> e1 and e2, on the surface, contradict one another (this situation is 
>> actually a bit more subtle than this, it depends on how we treat [1] 
>> and [2]).
>>
>> You would like relations such as has_evidence, between the assertion 
>> and the evidence, and a contradicts relation between evidences that 
>> is entailed by the assertions?
>
> Not quite.
>
> I was actually thinking that the epistemic relationships would be 
> between the evidence and the propositions.  I wouldn't say the 
> "contradicts" relationship is between the evidences, but between 
> hypotheses and evidences.   And, of course, depending on what R is, 
> R(X, Y) may or may not be incompatible with R(X, not-Y).  Let me try 
> an alternate formulation to make sure I am being clear, trying to hew 
> to your notation.
>
> Let's start with instances.  Imagine we have made GO annotations use 
> explicit relations, as I described in my email addressed to Sue this 
> morning, and we have an instance level assertion (called [1]) of a 
> gene participating in a process:
>
> [1] gene1 participates-in process1
>
> Further imagine that this would have been annotated "inferred from 
> direct assay" with a pointer to PMID1 as evidence.  Glossing over what 
> the relationship "inferred from direct assay" really means for the 
> moment, and just treating it as a kind of evidential support, we might 
> want to assert:
>
> PMID1 supports [1]
>
> Now imagine a second publication that comes along that provides 
> evidence (perhaps also "inferred from direct assay") that gene1 does 
> NOT participate in process1.  Currently there would be another GO 
> annotation created, much like the first, but with the NOT qualifier.  
> Then I would suggest something like:
>
> PMID2 contradicts [1]
>
> In this case, we have one hypothesis (gene1 participates-in process1), 
> two pieces of evidence, and two epistemic relationships (one between 
> each piece of evidence and the hypothesis).
>
> It would not be difficult to make subclasses of supports (and 
> contradicts) that reflected the evidence codes as they are currently 
> used (e.g. supports-via-direct-assay, 
> contradicts-via-mutant-phenotype), although the many kinds of 
> relationships between evidence and hypotheses (many of which require 
> inference by someone along the way) suggest that a really good 
> taxonomy would be non-trivial to create.
>
> It would also not be difficult to use this structure to capture 
> evidence that is finer grained than the journal article (as 
> represented by PMID above), say using the contents of a particular 
> figure, table, or result statement as the evidence.  I think the OBI 
> DENRIE hierarchy nicely captures the things that can be evidence for a 
> scientific hypothesis, and subsumes each of the above.  It does make 
> evidence a kind of information entity, but the alternatives to that 
> all strike me as problematic.   OBI also has a "hypothesis" term.  
> Although there are some problems with the definition as it currently 
> stands  (see 
> https://sourceforge.net/tracker/?func=detail&atid=886178&aid=1887478&group_id=177891), 
> it may ultimately be the right domain for these epistemic relations.
>
> It would also not be difficult to make both "supports" and 
> "contradicts" children of a relationship like "is-relevant-to" (Barry 
> suggested "is-about").  Such a system would be immediately useful to 
> support all kinds of tools that could be helpful to biologists -- not 
> the least of which would be a query that says "show me all the 
> evidence relevant to the role of this gene in that process".  An 
> instance store that was represented using this formalism I think could 
> offer quite a boon in the utility of the annotation work.
>
> The situation with universals is analogous to the instance case above, 
> although we have never before mentioned evidence (as codes or 
> otherwise) regarding relationships among universals.  The reason for 
> that is the "reality-based" desiderata, and the consequent assumption 
> that its contents never need evidence (and can't be contradicted).   
> The proposal for a set of epistemic terms with a domain of OBI 
> DENRIEs and a range of OBI hypotheses seems plausible to me at 
> this point.
>
> There are at least two objections that I can see that need to 
> be addressed.  First, these relations are most clearly second-order, 
> and cannot be represented in OWL-DL (although they seem to offer no 
> challenges in OWL-FULL).  Second, it's not clear if every proposition 
> with which one might want to associate evidence is properly considered 
> a subclass of OBI hypothesis.   If we were to turn all of the GO 
> annotations into explicit relationship assertions (using "participates 
> in", "is located in" etc.) would we want all of those propositions to 
> be subclasses of OBI hypothesis?   If not, we need to come up with 
> something else to define the range of the epistemic relations.
>
> Larry
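
The formulation quoted above, with one hypothesis, two pieces of evidence 
and two epistemic relationships, could be sketched roughly as follows. 
This is an illustrative data structure only, not a proposal for how OBI 
or GO would encode it: the Hypothesis, EpistemicLink and relevant_to 
names and the "via" field are assumptions. The relevant_to query plays 
the part of the "is-relevant-to" parent relation by returning links of 
either kind.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(Enum):
    """The two epistemic relations named in the quoted message."""
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"


@dataclass(frozen=True)
class Hypothesis:
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class EpistemicLink:
    evidence: str        # a PMID, or a figure or table within a paper
    relation: Relation
    hypothesis: Hypothesis
    via: str = ""        # assumed refinement, e.g. "direct-assay"


def relevant_to(links: List[EpistemicLink],
                hypothesis: Hypothesis) -> List[EpistemicLink]:
    """Return every link bearing on the hypothesis, whether it
    supports or contradicts it."""
    return [link for link in links if link.hypothesis == hypothesis]


h1 = Hypothesis("gene1", "participates-in", "process1")
links = [
    EpistemicLink("PMID1", Relation.SUPPORTS, h1, via="direct-assay"),
    EpistemicLink("PMID2", Relation.CONTRADICTS, h1, via="direct-assay"),
]

for link in relevant_to(links, h1):
    print(link.evidence, link.relation.value, "[1]")
```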



