[Go] addition of localization specific process terms ?
David Hill
dph at informatics.jax.org
Thu Mar 5 04:15:42 PST 2009
I agree with Chris on this. There have always been 2 proposals to handle
the details in the information that go waaaaaay back to the beginning
when we used to ask 'are we going to include anatomy in the ontology?'
The first, precompositional approach is to put the terms in the ontology.
The second would be to use a post-compositional strategy through
structured annotations. We have column 16 for this usage to connect GO
to external ontologies. Do we also want to do this for internal ontologies?
I also agree with Chris about the discrimination of when to make terms
and when not to make terms. There should be a GUIDELINE that says if the
process isn't really different, then perhaps we should leave the
representation as something post-compositional. This is the same
rationale Tanya and Chris and I used during the 'regulates' project to
justify a term like 'regulation of transcription involved in forebrain
patterning' but not a term like 'transcription involved in forebrain
patterning'. At some point, we can have a procedure to make these
bona-fide terms. It should be easy to do this looking at the structured
annotations.
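A rough sketch of what such a procedure might look like is given here. Everything in it is an assumption made for illustration: the record layout, the gene names, and the "at least two genes" threshold are hypothetical and are not part of any GO file format or agreed policy. It simply tallies the (process, occurs_in component) pairs found in structured annotations and lists the pairs that are used often enough to be worth reviewing as possible bona-fide terms.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    gene: str
    process: str               # name of the biological process term
    occurs_in: Optional[str]   # col16-style component, or None if not filled in


# Hypothetical annotations, for illustration only.
ANNOTATIONS = [
    Annotation("geneA", "translation", "mitochondrion"),
    Annotation("geneB", "translation", "mitochondrion"),
    Annotation("geneC", "translation", None),
    Annotation("geneD", "tRNA aminoacylation", "mitochondrion"),
]


def candidate_terms(annotations, min_genes=2):
    """Return sorted (process, component) pairs used by at least min_genes distinct genes."""
    genes_by_pair = {}
    for a in annotations:
        if a.occurs_in is None:
            continue  # plain annotation, nothing post-composed to count
        genes_by_pair.setdefault((a.process, a.occurs_in), set()).add(a.gene)
    return sorted(
        pair for pair, genes in genes_by_pair.items() if len(genes) >= min_genes
    )


candidates = candidate_terms(ANNOTATIONS)
for process, component in candidates:
    print(f"{process} in {component}")
```

Usage alone would not settle whether a pair deserves a term; the "is the process really different" guideline would still have to be applied by a curator to each candidate.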
Co-annotation in and of itself doesn't convey the specific information
in a precompositional term or in a structured annotation, but we should
keep in mind that we need to find ways to explore co-annotations as
well. They are the bread and butter of hypothesis generation because
they can be used to integrate the various pieces of information that
derive from different sources.
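The difference between a co-annotation and a structured annotation can be shown with a small sketch. The gene names and the tuple layout are hypothetical, chosen only to mirror the point made in the quoted discussion below: a gene can be annotated to both a process and a component without the process being known to occur in that component.

```python
# Hypothetical process annotations: (gene, process, col16-style occurs_in or None).
PROCESS_ANNOTATIONS = [
    ("geneX", "tRNA aminoacylation", "mitochondrion"),
    ("geneY", "tRNA aminoacylation", None),
]

# Hypothetical component annotations: (gene, component).
# geneY is localized to the mitochondrion, possibly for an unrelated function.
COMPONENT_ANNOTATIONS = [
    ("geneX", "mitochondrion"),
    ("geneY", "mitochondrion"),
]


def co_annotated(process, component):
    """Genes annotated to the process AND, independently, to the component."""
    with_process = {g for g, p, _ in PROCESS_ANNOTATIONS if p == process}
    with_component = {g for g, c in COMPONENT_ANNOTATIONS if c == component}
    return with_process & with_component


def structured(process, component):
    """Genes whose process annotation itself states where the process occurs."""
    return {
        g for g, p, loc in PROCESS_ANNOTATIONS if p == process and loc == component
    }


print(sorted(co_annotated("tRNA aminoacylation", "mitochondrion")))
print(sorted(structured("tRNA aminoacylation", "mitochondrion")))
```

The co-annotation query returns both genes, while the structured query returns only the gene whose annotation explicitly ties the process to the component. The broader first result is still useful for exploration, but it makes a weaker statement.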
And lastly (sorry about being so long-winded), I also agree that the
biggest challenge we now face is in the representation of the ontology
to the user. With interontology links and cross-products basically in
place, we now need to figure out good ways to slice and dice the
ontology to give the user the view of the ontology in the context that
they are interested in. We need to present the information in a way that
our brain usually presents information. We don't think about every
aspect of an object or something that is happening, we think about it in
a context. We choose the context almost subconsciously. We are not at a
point to try to guess the context for users yet, but we might be able to
give them choices of the views that they want to see. One place to
start would be to use the cross-product elements to provide a choice.
David
Chris Mungall wrote:
>
> On Mar 4, 2009, at 9:02 AM, Valerie Wood wrote:
>
>> I agree that the aa specific tRNA aminoacylation process terms are
>> excessively granular (although I have used them), as they are
>> equivalent to the function terms.
>
> ah good point, but I think this is a separate issue entirely -- which
> we have started to address, re process-function links
>
>> I still think we need terms for mitochondrial tRNA aminoacylation
>> because, although the process is the same (and sometimes the gene
>> products involved) the process has different target genes and hence
>> different biological consequences (and phenotypes).
>
> This seems reasonable to me.
>
>> For your simple search (i.e. to retrieve genes involved in
>> mitochondrial aminoacylation), a combination of mitochondria and tRNA
>> aminoacylation would work fine.
>
> except that a tool doesn't know for sure that, say, Pombe grs1
> participates in tRNA aminoacylation *in the mt*, based on existing
> grs1 annotations. The mt localization could be for the purposes of
> executing a different function.
>
>> However, this is not the major use of GO. Increasingly GO is used for
>> hypothesis-generating exercises with a complete gene set. These
>> combination searches are not helpful; you need to be able to look for
>> enrichment at the level of process. None of the enrichment tools
>> (including the one in AmiGO) can perform these inter-ontology analyses.
>
> Good point. I certainly do not advocate doing anything that would in
> any way diminish the capabilities of current tools, without providing
> any alternative.
>
>> In a genome wide set you would not be able to detect (for example)
>> that the set of essential genes in S. cerevisiae is enriched for
>> translation components BUT not for mitochondrial translation
>> components, but that in pombe the mitochondrial translation
>> components are also essential (this is a real example).
>
> Interesting. On the flipside, do you have any intuition as to whether
> we are missing useful enrichment results because we have not
> sufficiently pre-composed?
>
> So this points to another, possibly equally good criterion for deciding
> whether a term should be pre-composed in the ontology: is it useful in
> hypothesis generation?
>
> My feeling is that whilst the two criteria are related, we should
> de-couple these, because we can't know in advance what the interesting
> hypotheses are -- the hypothesis generation should come from the data.
> The set of interesting terms is almost certainly larger than the set
> we can hope to pre-compose.
>
> But where does this leave everyday practical tools? In fact there is a
> solution which does not involve pre-composing the full set of
> biologically instantiated cross-products. We can take the full set of
> terms composed at annotation time, and generate an "on-the-fly
> analysis ontology", using a reasoner to compute the additional links.
>
> For example, let's assume for a moment Jim convinces us that
> translation really is the same in the mitochondrion, and the
> pre-composition is not warranted. We would obsolete "translation in
> mt" and replace existing annotations with co-annotations to "translation"
> and "mt", with the process annotation having "occurs_in(mt)" filled in
> in col16.
>
> Then, as part of the release cycle we would take annotations with
> col16 filled in, using this to generate on-the-fly terms such as
> "translation in mt" with IDs clearly indicated such that these are not
> permanent GO IDs, then as part of the release cycle the reasoner would
> fill in links in the on-the-fly ontology such that "GO:temp
> translation in mt" is_a "protein metabolism in cytoplasm" (for
> example). Annotations using col16 could then be mapped to these temp
> IDs. As far as tools are concerned, this is *exactly* the same as
> working with existing GO terms and annotations, except the graph would
> be a bit bigger and more tangled (not necessarily a concern for
> summarising enrichment results). Certain compositions such as
> "translation in mt" may come up in analyses, and that's fine. Other
> compositions won't, but that's fine too.
>
> Of course, the infrastructure to support this would not spring into
> existence overnight, and there would need to be considerable
> documentation because people tend to have conceptual problems with
> this sort of thing, even though it's no harder than ordering pizzas
> with your own toppings. There are issues with scale when we apply this
> to other xps, such as anatomy, but if we focus the discussion on the
> BP x CC xp set for now, it's scalable.
>
> This isn't the only way this would work. Tools could use the core GO
> and col16 together, but it's more work for the tool developers.
>
> So in summary I would propose the following course of action:
>
> - canonize the "is it different" principle as a basis for choosing
> whether to pre-compose; however
> - be ultra-conservative in obsoleting existing pre-composed terms
> - be fairly liberal in composing new BPxCC terms (we can always map to
> annotation xps later)
>
> This is really just an affirmation of the status quo. Val will
> continue to see "mt translation" in her term enrichment analyses.
>
> Then in addition:
>
> - work aggressively towards all groups using col16 for CC localization
> in BP annotations
> - tools can ignore col16 and they are no worse off than they are now
> - we demonstrate how this can be used to provide enhanced results
> above and beyond what is possible now, perhaps working with a few
> select external tool developers
>
> When this is in place, and people are comfortable with the concepts
> once they see the tools being used, then we can examine the criteria
> for pre-composition again. I think we can then apply the
> is-it-different criterion more consistently without worrying about
> losing enrichment results.
>
>>
>> Val
>>
>>
>> Harold Drabkin wrote:
>>
>>> Yes, and one rule of thumb was to think about whether the actual
>>> process is different rather than where it is.
>>> In Jim's example, the process of tRNA charging (actually one
>>> enzyme) is the same whether it is in the mitochondria or
>>> chloroplast: amino acid gets activated with ATP, then transfer of
>>> the AA from AA-AMP to tRNA (all with one enzyme). The only time this
>>> process is "different" is in the cases where, for example, there is
>>> not a gln RS, but you can get gln-tRNA by first making a glu-tRNA,
>>> then amination (by a separate gene_product) to gln-tRNA. However,
>>> the first step (the making of glu-tRNA, the "charging") is STILL the
>>> same.
>>>
>>> It makes more sense in this case to annotate concurrently. I'd find
>>> genes annotated by asking for genes annotated to mitochondria AND
>>> tRNA aminoacylation and look at what I get. A single term for each
>>> and every process that takes into account where it is kind of
>>> defeats the purpose of having the three ontologies I would think?
>>>
>>>
>>>
>>> Jim Hu wrote:
>>>
>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>
>>>>> Because of all of the arguments in favour mentioned by Karen and
>>>>> Chris I thought it was always necessary and required for curators
>>>>> to make the more granular annotation in these cases. We decided
>>>>> long ago that proliferation of the ontology was not an issue when
>>>>> pitched against accurate capture of biology, and I wasn't aware
>>>>> that it was ever GO philosophy not to capture compartment specific
>>>>> processes in this way.
>>>>
>>>>
>>>> I wasn't involved in GO when this was decided, but as someone who
>>>> does stuff on the software side as well as the annotation side, I
>>>> think proliferation of the ontology should be an issue that is not
>>>> dismissed so lightly. I just posted a similar concern on the SF
>>>> item on children of protein folding, and I've taken that position
>>>> on the binding terms as well.
>>>>
>>>> The number of components where translation occurs is much smaller
>>>> than the number of ligands for binding or proteins for folding.
>>>> But translation has lots of children. I see that there is already
>>>> proliferation of mitochondrial child terms among these, including
>>>> 20 terms for tRNA charging. There are other child branches that
>>>> don't have precomposed mitochondrial child terms. If it was up to
>>>> me, I'd obsolete things like
>>>> GO:0070143_!_mitochondrial_alanyl-tRNA_aminoacylation rather than
>>>> making more children for everything else.
>>>>
>>>> This strikes me as being way too much like adding back sensu terms.
>>>>
>>>> <snip>
>>>>
>>>>>> However, mitochondrial translation is fundamentally different
>>>>>> from nuclear translation because a different genetic code is used.
>>>>>
>>>>
>>>> But is this a fundamental difference? Codon reassignments have
>>>> occurred many times across a variety of species. This includes
>>>> prokaryotes and cytosolic eukaryotic systems as well as
>>>> mitochondria. And not all mitochondria use the same noncanonical
>>>> code. See:
>>>>
>>>> http://www.nature.com/nrg/journal/v2/n1/full/nrg0101_049a.html
>>>>
>>>> Just look at Figure 2 if you don't want to read the whole thing.
>>>> And that's just taking Chris' instance of what is supposed to be a
>>>> clear cut example. Is translation in the silkworm silk gland
>>>> fundamentally different enough? In that case, specialized tRNA
>>>> genes are overexpressed to deal with the massive use of alanine in
>>>> the silk protein.
>>>>
>>>> I find the argument that one can't do an AND with some tools to be
>>>> more of an argument to improve the tools than an argument to do
>>>> extensive precomposition. If we have to build GO practice around
>>>> the weakest tools, then we should also do explicit annotation all
>>>> the way up to root for every term, to handle tools that don't use
>>>> the true path rule. I'm NOT advocating that!!
>>>>
>>>> Jim
>>>>
>>>> =====================================
>>>> Jim Hu
>>>> Associate Professor
>>>> Dept. of Biochemistry and Biophysics
>>>> 2128 TAMU
>>>> Texas A&M Univ.
>>>> College Station, TX 77843-2128
>>>> 979-862-4054
>>>>
>>>>
>>>> _______________________________________________
>>>> Go mailing list
>>>> Go at geneontology.org
>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome Research
>> Limited, a charity registered in England with number 1021457 and a
>> company registered in England with number 2742969, whose registered
>> office is 215 Euston Road, London, NW1 2BE.
>
--
David P. Hill, Ph.D.
Bioinformatics Scientist: Ontology Development
Gene Ontology Consortium
The Jackson Laboratory
www.geneontology.org
www.informatics.jax.org
tel:207-288-6430