[Go] addition of localization specific process terms ?
Chris Mungall
cjm at berkeleybop.org
Wed Mar 4 13:31:54 PST 2009
On Mar 4, 2009, at 9:02 AM, Valerie Wood wrote:
> I agree that the aa specific tRNA aminoacylation process terms are
> excessively granular (although I have used them), as they are
> equivalent to the function terms.
Ah, good point, but I think this is an entirely separate issue, one
we have started to address via process-function links.
> I still think we need terms for mitochondrial tRNA aminoacylation
> because, although the process is the same (and sometimes the gene
> products involved) the process has different target genes and hence
> different biological consequences (and phenotypes).
This seems reasonable to me.
> For your simple search, (i.e to retrieve genes involved in
> mitochondrial amino acylation) a combination of mitochondria and
> tRNA aminoacylation would work fine.
except that a tool doesn't know for sure that, say, Pombe grs1
participates in tRNA aminoacylation *in the mt*, based on existing
grs1 annotations. The mt localization could be for the purposes of
executing a different function.
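
To make the ambiguity concrete, here is a minimal sketch in Python. The
gene names, term labels and the Annotation record are all made up for
illustration; occurs_in simply stands in for a col16 occurs_in(...)
entry. It is not how any existing tool stores annotations.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    gene: str
    term: str
    # Hypothetical stand-in for a col16 occurs_in(...) entry.
    occurs_in: Optional[str] = None


# Toy data, not real GO annotations.
annotations = [
    # geneA: process and component annotated independently.
    Annotation("geneA", "tRNA aminoacylation"),
    Annotation("geneA", "mitochondrion"),
    # geneB: the process annotation itself states the location.
    Annotation("geneB", "tRNA aminoacylation", occurs_in="mitochondrion"),
    Annotation("geneB", "mitochondrion"),
]


def genes_with(term):
    """All genes carrying any annotation to the given term."""
    return {a.gene for a in annotations if a.term == term}


def combination_search(process, component):
    """Genes annotated to both terms; says nothing about where the process occurs."""
    return genes_with(process) & genes_with(component)


def located_process_search(process, component):
    """Genes whose process annotation explicitly records the location."""
    return {
        a.gene
        for a in annotations
        if a.term == process and a.occurs_in == component
    }


print(sorted(combination_search("tRNA aminoacylation", "mitochondrion")))
print(sorted(located_process_search("tRNA aminoacylation", "mitochondrion")))
```

The combination search returns both genes, but only geneB is actually
asserted to carry out the process in the mitochondrion.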
> However this is not the major use of GO. Increasingly GO is used for
> hypothesis generating exercises with a complete gene set, these
> combination searches are not helpful, you need to be able to look
> for enrichment at the level of process, none of the enrichment tools
> (including the one in AmiGO) can perform these inter ontology
> analyses.
Good point. I certainly do not advocate doing anything that would in
any way diminish the capabilities of current tools, without providing
any alternative.
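
For readers less familiar with how enrichment tools work, here is a
rough sketch of a per-term test, assuming a plain one-sided
hypergeometric test with no multiple-testing correction. The function
names and toy gene sets are invented for illustration; real tools
(including the one in AmiGO) may differ in the details. The point is
only that each term is tested on its own, so a composition can only
show up if it exists as a term.

```python
from math import comb


def upper_tail_p(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n in study set)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enrichment(term_to_genes, study, population):
    """Return {term: p-value}, testing every term independently."""
    population = set(population)
    study = set(study) & population
    p_values = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & population
        p_values[term] = upper_tail_p(
            len(annotated & study), len(annotated), len(study), len(population)
        )
    return p_values


# Toy genome of ten genes; labels are illustrative only.
population = {f"g{i}" for i in range(1, 11)}
term_to_genes = {
    "translation": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "mitochondrial translation": {"g4", "g5", "g6"},
}
essential = {"g1", "g2", "g3"}

p_values = enrichment(term_to_genes, essential, population)
for term in sorted(p_values):
    print(term, round(p_values[term], 4))
```

If "mitochondrial translation" were not a key in term_to_genes, there
would be nothing for the tool to test, which is Val's point.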
> In a genome wide set you would not be able to detect (for example)
> that the set of essential genes in S. cerevisiae is enriched for
> translation components BUT not for mitochondrial translation
> components, but that in pombe the mitochondrial translation
> components are also essential (this is a real example).
Interesting. On the flipside, do you have any intuition as to whether
we are missing useful enrichment results because we have not
sufficiently pre-composed?
So this points to another, possibly equally good criterion for deciding
whether a term should be pre-composed in the ontology: is it useful in
hypothesis generation?
My feeling is that whilst the two criteria are related, we should
decouple them, because we can't know in advance what the interesting
hypotheses are -- the hypothesis generation should come from the data.
The set of interesting terms is almost certainly larger than the set
we can hope to pre-compose.
But where does this leave everyday practical tools? In fact there is a
solution which does not involve pre-composing the full set of
biologically instantiated cross-products. We can take the full set of
terms composed at annotation time, and generate an "on-the-fly
analysis ontology", using a reasoner to compute the additional links.
For example, let's assume for a moment Jim convinces us that
translation really is the same in the mitochondrion, and the
pre-composition is not warranted. We would obsolete "translation in mt"
and replace existing annotations with co-annotations to "translation"
and "mt", with the process annotation carrying "occurs_in(mt)" in
col16.
Then, as part of the release cycle, we would take annotations with
col16 filled in and use them to generate on-the-fly terms such as
"translation in mt", with IDs clearly marked as not being permanent
GO IDs. In the same cycle the reasoner would fill in links in the
on-the-fly ontology such that "GO:temp translation in mt" is_a
"protein metabolism in cytoplasm" (for example). Annotations using
col16 could then be mapped to these temp
IDs. As far as tools are concerned, this is *exactly* the same as
working with existing GO terms and annotations, except the graph would
be a bit bigger and more tangled (not necessarily a concern for
summarising enrichment results). Certain compositions such as
"translation in mt" may come up in analyses, and that's fine. Other
compositions won't, but that's fine too.
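
A very rough sketch of the idea in Python follows. Everything here is
an assumption made for illustration: the toy ontology fragments, the
TEMP: ID scheme, the function names, and above all the "reasoner",
which is just a closure over parent links that treats part_of like
is_a for locations. A real implementation would use a proper reasoner
over the actual GO relations.

```python
from itertools import count

# Toy core ontology fragments; labels stand in for real GO terms.
BP_PARENTS = {"translation": {"protein metabolism"}, "protein metabolism": set()}
# Assumption: part_of and is_a are collapsed into one parent relation for CC.
CC_PARENTS = {"mt": {"cytoplasm"}, "cytoplasm": set()}


def ancestors(term, parents):
    """Reflexive-transitive closure of the parent relation."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_analysis_ontology(annotations):
    """annotations: iterable of (gene, process, occurs_in or None).

    Returns (temp_ids, is_a_links, mapped_annotations).
    """
    temp_ids, next_id, mapped = {}, count(1), []
    for gene, process, component in annotations:
        if component is None:
            mapped.append((gene, process))  # plain annotation, left as-is
            continue
        key = (process, component)
        if key not in temp_ids:
            # Clearly not a permanent GO ID.
            temp_ids[key] = f"TEMP:{next(next_id):07d}"
        mapped.append((gene, temp_ids[key]))

    links = set()
    for (p1, c1), tid in temp_ids.items():
        links.add((tid, p1))  # "P in C" is_a P
        for (p2, c2), other in temp_ids.items():
            more_general = (
                p2 in ancestors(p1, BP_PARENTS) and c2 in ancestors(c1, CC_PARENTS)
            )
            if other != tid and more_general:
                links.add((tid, other))  # "P1 in C1" is_a "P2 in C2"
    return temp_ids, links, mapped


temp_ids, links, mapped = build_analysis_ontology([
    ("geneA", "translation", "mt"),
    ("geneB", "protein metabolism", "cytoplasm"),
    ("geneC", "translation", None),
])
for child, parent in sorted(links):
    print(child, "is_a", parent)
```

The mapped annotations and the extra is_a links can then be handed to
an ordinary enrichment tool alongside the core ontology, which is why
nothing changes from the tool's point of view.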
Of course, the infrastructure to support this would not spring into
existence overnight, and there would need to be considerable
documentation because people tend to have conceptual problems with
this sort of thing, even though it's no harder than ordering pizzas
with your own toppings. There are issues with scale when we apply this
to other xps, such as anatomy, but if we focus the discussion on the
BP x CC xp set for now, it's scalable.
This isn't the only way this would work. Tools could use the core GO
and col16 together, but it's more work for the tool developers.
So in summary I would propose the following course of action:
- canonize the "is it different" principle as a basis for choosing
whether to pre-compose; however
- be ultra-conservative in obsoleting existing pre-composed terms
- be fairly liberal in composing new BPxCC terms (we can always map to
annotation xps later)
This is really just an affirmation of the status quo. Val will
continue to see "mt translation" in her term enrichment analyses.
Then in addition:
- work aggressively towards all groups using col16 for CC localization
in BP annotations
- tools can ignore col16 and they are no worse off than they are now
- we demonstrate how this can be used to provide enhanced results
above and beyond what is possible now, perhaps working with a few
select external tool developers
When this is in place, and people are comfortable with the concepts
once they see the tools being used, then we can examine the criteria
for pre-composition again. I think we can then apply the
is-it-different criterion more consistently without worrying about
losing enrichment results.
>
> Val
>
>
> Harold Drabkin wrote:
>
>> Yes, and one rule of thumb was to think about whether the actual
>> process is different rather than where it is.
>> In Jim's example, the processing of tRNA charging (actually one
>> enzyme) is the same whether it is in the mitochondria or
>> chloroplast: amino acid gets activated with ATP, then transfer of
>> the AA from AA-AMP to tRNA (all with one enzyme). The only time
>> this process is "different" is in the cases where, for example,
>> there is not a gln RS, but you can get gln-tRNA by first making a
>> glu-tRNA, then amination (by a separate gene_product) to gln-tRNA.
>> However, the first step (the making of glu-tRNA, the "charging") is
>> STILL the same.
>>
>> It makes more sense in this case to annotate concurrently. I'd
>> find genes annotated by asking for genes annotated to mitochondria
>> AND tRNA aminoacylation and look at what I get. A single term for
>> each and every process that takes into account where it is kind of
>> defeats the purpose of having the three ontologies I would think?
>>
>>
>>
>> Jim Hu wrote:
>>
>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>
>>>> Because of all of the arguments in favour mentioned by Karen and
>>>> Chris I thought it was always necessary and required for
>>>> curators to make the more granular annotation in these cases. We
>>>> decided long ago that proliferation of the ontology was not an
>>>> issue when pitched against accurate capture of biology, and I
>>>> wasn't aware that it was ever GO philosophy not to capture
>>>> compartment specific processes in this way.
>>>
>>>
>>> I wasn't involved in GO when this was decided, but as someone who
>>> does stuff on the software side as well as the annotation side, I
>>> think proliferation of the ontology should be an issue that is not
>>> dismissed so lightly. I just posted a similar concern on the SF
>>> item on children of protein folding, and I've taken that position
>>> on the binding terms as well.
>>>
>>> The number of components where translation occurs is much smaller
>>> than the number of ligands for binding or proteins for folding.
>>> But translation has lots of children. I see that there is already
>>> proliferation of mitochondrial child terms among these, including
>>> 20 terms for tRNA charging. There are other child branches that
>>> don't have precomposed mitochondrial child terms. If it was up to
>>> me, I'd obsolete things like GO:0070143_!_mitochondrial_alanyl-
>>> tRNA_aminoacylation rather than making more children for
>>> everything else.
>>>
>>> This strikes me as being way too much like adding back sensu terms.
>>>
>>> <snip>
>>>
>>>>> However, mitochondrial translation is fundamentally different
>>>>> from nuclear translation because a different genetic code is
>>>>> used.
>>>>
>>>
>>> But is this a fundamental difference? Codon reassignments have
>>> occurred many times across a variety of species. This includes
>>> prokaryotes and cytosolic eukaryotic systems as well as
>>> mitochondria. And not all mitochondria use the same noncanonical
>>> code. See:
>>>
>>> http://www.nature.com/nrg/journal/v2/n1/full/nrg0101_049a.html
>>>
>>> Just look at Figure 2 if you don't want to read the whole thing.
>>> And that's just taking Chris' instance of what is supposed to be a
>>> clear cut example. Is translation in the silkworm silk gland
>>> fundamentally different enough? In that case, specialized tRNA
>>> genes are overexpressed to deal with the massive use of alanine in
>>> the silk protein.
>>>
>>> I find the argument that one can't do an AND with some tools to be
>>> more of an argument to improve the tools than an argument to do
>>> extensive precomposition. If we have to build GO practice around
>>> the weakest tools, then we should also do explicit annotation all
>>> the way up to root for every term, to handle tools that don't use
>>> the true path rule. I'm NOT advocating that!!
>>>
>>> Jim
>>>
>>> =====================================
>>> Jim Hu
>>> Associate Professor
>>> Dept. of Biochemistry and Biophysics
>>> 2128 TAMU
>>> Texas A&M Univ.
>>> College Station, TX 77843-2128
>>> 979-862-4054
>>>
>>>
>>> _______________________________________________
>>> Go mailing list
>>> Go at geneontology.org
>>> http://fafner.stanford.edu/mailman/listinfo/go
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.