[Go] addition of localization specific process terms ?
David Hill
dph at informatics.jax.org
Tue Mar 24 10:11:48 PDT 2009
From the Consortium minutes of September 2002
We reaffirmed that gene products should not appear as concepts
(i.e. as ontology terms). But under some circumstances it is
acceptable to mention gene products within ontology terms. The issue
to be resolved is how fine-grained we should be in children of
"protein biosynthesis," "protein binding," and some others.
Many of the children of "protein binding" and of "protein
biosynthesis" mention specific individual proteins; see the MGI
handout for a list of terms that have come into question.
There is an additional concern with protein biosynthesis terms: many
of the too-specific ones added recently are actually intended to
capture the results of experiments that measure levels of specific
proteins, but such experiments do not distinguish effects on translation (the
restricted definition of "protein biosynthesis," which is what we use
in GO, and have implicitly decided to keep using) from effects on
other steps in the overall process of making a protein
(e.g. transcription, modification).
We thought that adding terms for binding to (or biosynthesis of) any
specific protein was reasonably consistent with the logic we apply
when considering new terms, but we questioned the utility of having
a great many very specific terms.
We agreed that we would keep or add terms that represent different
mechanisms, such as "covalent protein binding" and "non-covalent
protein binding" (hypothetical examples) or "viral protein
biosynthesis."
Michael came up with a two-part test; we can keep/add a "protein X
biosynthesis" term if both criteria are met:
1. There is something specific about the biosynthesis of
protein X, i.e., there are gene products involved in X biosynthesis but
not in general protein biosynthesis.
2. The proposed term is not redundant with any other process
term. For example, we will make "glycoprotein biosynthesis" obsolete
because it is redundant with "protein glycosylation."
The same test can be applied to binding, transport, etc.
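As a rough illustration only, Michael's two-part test can be written as a small Python predicate. The ProposedTerm class, its field names, and the placeholder gene product "geneA" are hypothetical; they model the two criteria and nothing else, and are not part of any GO tool.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedTerm:
    """A candidate 'protein X biosynthesis' (or binding, transport) term.

    Hypothetical structure: the fields only model the two criteria.
    """
    name: str
    # Gene products known to act in the specific process for protein X.
    specific_gene_products: set[str] = field(default_factory=set)
    # Gene products already known to act in the general process.
    general_gene_products: set[str] = field(default_factory=set)
    # Existing process terms that mean the same thing, if any.
    redundant_with: list[str] = field(default_factory=list)


def passes_two_part_test(term: ProposedTerm) -> bool:
    # Criterion 1: at least one gene product acts in the specific
    # process but not in the general one.
    has_specific_machinery = bool(
        term.specific_gene_products - term.general_gene_products
    )
    # Criterion 2: the term does not duplicate another process term.
    is_not_redundant = not term.redundant_with
    return has_specific_machinery and is_not_redundant


# The example from the minutes: redundant with "protein glycosylation".
glyco = ProposedTerm(
    name="glycoprotein biosynthesis",
    specific_gene_products={"geneA"},  # placeholder gene product
    redundant_with=["protein glycosylation"],
)
print(passes_two_part_test(glyco))  # False
```

Both criteria must hold, so a term with specific machinery still fails if an equivalent process term already exists, which is why "glycoprotein biosynthesis" is rejected here.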
But how to avoid losing information? Curators often want to capture
what is known, as when an experiment detects binding to a particular
protein substrate or altered levels of a specific gene product.
The coffee break "Round Table" discussion led to a proposal:
eventually make children of "protein binding" obsolete, and instead
use annotation to indicate which protein is bound by the gene product
of interest. The annotation would use the generic "protein binding" GO
term, and a new column in the gene_association file where we can store
an ID for the protein that is bound.
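A minimal sketch of the proposed extra column, assuming a simplified tab-delimited row. The column list, the MGI:EXAMPLE1 and UniProtKB:EXAMPLE2 identifiers, and the helper functions are hypothetical placeholders, not the actual gene_association layout; GO:0005515 is the ID of the generic "protein binding" term, and IPI is the GO evidence code for "inferred from physical interaction."

```python
import csv
import io

# Hypothetical, simplified columns; NOT the real gene_association layout.
FIELDS = ["gene_product", "go_id", "go_term", "evidence", "bound_protein"]


def write_rows(rows: list[dict]) -> str:
    """Serialize annotation rows as tab-delimited text (no header)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def read_rows(text: str) -> list[dict]:
    """Parse tab-delimited annotation rows back into dictionaries."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDS,
                               delimiter="\t"))


# One annotation to the generic term; the bound partner goes in the
# extra column instead of in a "binding to protein X" child term.
rows = [{
    "gene_product": "MGI:EXAMPLE1",         # placeholder ID
    "go_id": "GO:0005515",                  # protein binding
    "go_term": "protein binding",
    "evidence": "IPI",
    "bound_protein": "UniProtKB:EXAMPLE2",  # placeholder ID
}]

for record in read_rows(write_rows(rows)):
    print(record["go_term"], "->", record["bound_protein"])
```

Because the partner ID lives in a column rather than in the term name, the ontology keeps one generic term while the annotation file carries the specificity. The class-of-proteins case (e.g. actin) would still need an identifier that can name a protein family.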
Inevitably, though, there's a catch: the world is not yet ready for us
to implement this in all situations. If the gene product being
annotated binds a class of proteins -- the example was actin -- rather
than a single protein, we're SOL for the present. In time there will
be UniProt IDs representing protein families, but that could take
months or even a year or two. There was some discussion of what to do
in the meantime; the conclusion was to apply a couple more tests to
identify terms that we should keep for now but make obsolete
later. First, check over annotations that use the term; second, check
whether the term has any children. Annotations will help us figure out
whether the term meets the first criterion of the two-part test. A
term that has children is most likely a useful grouping term.
The same considerations, and possible future solution, apply to
"protein X biosynthesis." To address the issue of experiments that
detect changes in levels of a particular protein, we have decided to
consider adding terms for "gene expression" and regulation of same,
but further discussion is required before we add them (I suspect that
counter-arguments will be raised). If they are added, the new
gene_association column could be used with them in the same way as
proposed for protein binding.
Alexander Diehl wrote:
> Jim,
>
> I don't see the need to prepare ChEBI x binding. The use of the term
> "protein binding" has always been done post-compositionally by MGI and
> others, and similarly the use of "binding" in general can be done
> post-compositionally. There are some more specific children of
> protein binding, some of which may be more useful than others, but
> certainly I would not encourage more without a sound argument. The GO
> is not perfect, but the existing mechanisms for term requests and
> generation are pretty good about weeding out unnecessary additions.
> If necessary we can propose annotation standards for GO that specify
> post-composition for "binding" related terms, as we do already for
> "protein binding."
>
> While I agree with Jen that a new, fast term browser might help folks who
> have trouble finding terms, in general finding terms does become
> faster with experience. And although the MGI term browser is
> primitive in many respects, by default it always shows children of a
> given term, which many other browsers do not, thus saving a step when
> searching for more specific terms.
>
> -- Alex
>
>
> Jim Hu wrote:
>> In general, I like precomposition too. But for binding, and to a
>> lesser extent location, I don't like the idea of having parent terms
>> with thousands of children. The terms like regulation of translation
>> of gene X mRNA are terrifying to me. I noticed that somewhere on
>> wiki.geneontology.org, there's a statement that GO will never do
>> those kinds of terms by precomposition, but a few terms like that are
>> already in GO, and there was recently a SourceForge item about
>> protein chaperones for specific gene products.
>> I usually find terms by searching for a keyword combination,
>> navigating to a particular term, and then browsing up and down the
>> ontology. Do others not do the browsing part? I think that's where
>> the massive expansion is most problematic.
>>
>> I see what you mean about time, but requesting a new term is also a
>> time barrier to annotation.
>>
>> Perhaps a test version of the ontology could be automatically
>> generated with ChEBI x binding, and people could see if my intuitions
>> or everyone else's are correct. In general, I suspect that people
>> want precomposition for their own annotations and are annoyed at the
>> excess terms that they don't see themselves ever using. E. coli
>> being the most distant from everyone else in the phylogeny may be why
>> I'm where I am on this! ;)
>>
>> Jim
>>
>> On Mar 24, 2009, at 8:38 AM, Alexander Diehl wrote:
>>
>>> I want to add my agreement to the words of Val and David. It is
>>> much simpler to use a pre-composed existing term in annotation. One
>>> aspect of the annotation process that I feel is overlooked as we add
>>> more complexity to the annotation process is that post-composition
>>> adds a significant bit of time to the annotation process, resulting
>>> in fewer annotations overall and lower metrics for the database and
>>> grant. While it is important to do detailed and correct annotations
>>> whenever possible, anything we can do to increase throughput,
>>> such as precomposing likely terms, is beneficial. I'm not saying we
>>> should add all possible combinations of X and Y, just the
>>> appropriate ones. This is one of the main reasons for having
>>> annotators lead ontology development and holding ontology content
>>> meetings where expert biologists can discuss processes actually seen
>>> in nature, so that the appropriate combinations of X and Y are added.
>>>
>>> And knowing which pre-composed terms to use is a matter of training
>>> and experience, both in general biology, and in annotation. There's
>>> no way around it.
>>>
>>> -- Alex
>>>
>>>
>>> val at sanger.ac.uk wrote:
>>>> I agree, it is far better to have pre-composed terms if possible,
>>>> especially for new curators.
>>>> As we encourage annotation to the most specific term possible, it is
>>>> hard to overlook the precomposed terms, because we (I hope) always
>>>> check the child terms.
>>>>
>>>> Val
>>>>
>>>>
>>>>> From someone who has been annotating using a lot of
>>>>> pre-composition as well as post-composition for a reasonably long
>>>>> time: although there is an initial activation energy to get a
>>>>> pre-composed term into the ontology, once such terms are there, they
>>>>> are much easier to use than looking things up in multiple
>>>>> ontologies for post-composition.
>>>>>
>>>>> The key to finding pre-composed terms easily is to have a good way of
>>>>> viewing the ontology.
>>>>>
>>>>> my 2c
>>>>>
>>>>> D
>>>>>
>>>>>> I would like to hear the opinion of some of the annotators here. Is
>>>>>> excessive pre-coordination a concern for curation?
>>>
--
David P. Hill, Ph.D.
Bioinformatics Scientist: Ontology Development
Gene Ontology Consortium
The Jackson Laboratory
www.geneontology.org
www.informatics.jax.org
tel:207-288-6430