[Go] addition of localization specific process terms ?
Karen Christie
kchris at genome.stanford.edu
Mon Mar 23 16:09:38 PDT 2009
Maybe we should talk about this topic at the GO meeting. While there was
lots of discussion, I never really got a sense of what I should actually
do now, in terms of when, or when not, to request new "pre-composed"
terms.
I guess I'll put this on the agenda.
-Karen
On Mon, 23 Mar 2009, Chris Mungall wrote:
>
> On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:
>
>>
>> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>>
>>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>>
>>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>>
>>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>>
>>>>>> Because of all of the arguments in favour mentioned by Karen and Chris,
>>>>>> I thought it was always necessary and required for curators to make
>>>>>> the more granular annotation in these cases. We decided long ago that
>>>>>> proliferation of the ontology was not an issue when pitched against
>>>>>> accurate capture of biology, and I wasn't aware that it was ever GO
>>>>>> philosophy not to capture compartment-specific processes in this way.
>>>>>
>>>>> I wasn't involved in GO when this was decided, but as someone who does
>>>>> stuff on the software side as well as the annotation side, I think
>>>>> proliferation of the ontology should be an issue that is not dismissed
>>>>> so lightly.
>>>>
>>>> What are your concerns in particular?
>>>>>
>>>
>>> My two concerns are the obvious ones, nothing particularly sophisticated:
>>>
>>> 1) performance, especially of web-based tools that have to display GO with
>>> short processing times. IIRC, AmiGO has had this problem - traversing the
>>> ontology to find all the children and annotations to children is slow
>>> enough that Mike had to write a cron job to kill excess db queries that
>>> came from users getting impatient and reloading the page while the
>>> traversal was in progress. As the ontology gets big, these traversals
>>> take longer. Maybe there are more efficient algorithms to deal with this,
>>> maybe AJAX partially makes this tolerable, and maybe the problem is the
>>> same with post-composition. But it seems to me that at some point sheer
>>> size has a performance hit.
>>>
>>> 2) User interface. When I browse the ontology to look for the appropriate
>>> terms to do an annotation, there are nodes that would be unreadable if
>>> precomposition were being done consistently. Fortunately it isn't being
>>> done consistently at present. For example, look at the children of the
>>> positive and negative regulation terms in the process ontology. There are
>>> terms in there for mRNAs for specific genes (oskar and bicoid)! That
>>> strikes me as being completely insane... if implemented for all regulated
>>> genes in all organisms, that node would have hundreds of thousands of
>>> children - it would be a large subset of UniProt/GenBank all at one level.
>>> Or worse, because many genes would be present at multiple overpopulated
>>> nodes in GO.
>
> I previously addressed this from an end-user point of view. But as Jim
> mentions in the sf tracker item about binding, it's also important to
> consider this from the curation point of view.
>
> Jim's point is that increased pre-coordination in the ontology makes it
> harder for curators, because it will take longer to home in on the most
> appropriate term for an annotation.
>
> Whilst I can see that obviously there is some correlation between ontology
> size and time to find a term, I'm wondering to what extent this is a
> problem. I would have expected that most annotation systems used at the MODs
> and UniProtKB would utilize some kind of term completion rather than the
> curator manually traversing down the graph. Also, if the curators are
> expected to post-compose using col 16, then they have *two* terms to find:
> for example, to annotate "PEP binding" they would find the most specific term
> in GO *and* the relevant ChEBI terms (and finding terms in ChEBI is probably
> harder than finding terms in GO).
>
> But I don't annotate so I'm not sure.
>
> I would like to hear the opinion of some of the annotators here. Is excessive
> pre-coordination a concern for curation?
>
> I think it would be good if at the meeting a representative curator from each
> of the main annotation producing groups were to comment on the various
> situations in which pre-composed terms vs col 16 are preferred.
>
>> I would also add another concern that others often bring up:
>>
>> 3) Difficulty in maintaining the correct parentage in the ontology (Karen
>> brought this up in her email)
>>
>> However, I would respond to this and say that as we gain confidence in
>> using the cross-product definitions and the reasoner to automate this
>> procedure it becomes less of a concern (not yet eliminated, but less of a
>> concern).
>>
>> For example, there used to be massive errors in the regulation graph, but
>> we now use the reasoner and the regulation xps in batch frequently, and as
>> soon as OE2 is released we can incorporate this directly into the
>> ontology editing cycle. Thanks to Midori's efforts we are making a lot of
>> progress on the more difficult BPxCC composite terms, and I feel we will
>> soon be able to manage the hierarchy for these terms automatically, making
>> pre-composition less of a worry here:
>>
>> http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component
>>>
>>> I shudder to think what the graph representations would look like.
>>>
>>>
>>> <snip>
>>>>> I find the argument that one can't do an AND with some tools to be more
>>>>> of an argument to improve the tools than an argument to do extensive
>>>>> precomposition. If we have to build GO practice around the weakest
>>>>> tools, then we should also do explicit annotation all the way up to root
>>>>> for every term, to handle tools that don't use the true path rule. I'm
>>>>> NOT advocating that!!
>>>>
>>>> I agree that we shouldn't avoid doing the right thing because of the
>>>> weakest tools. I think we should have a plan for how we can support
>>>> tools, but I think we first need to agree roughly on what the right thing
>>>> is.
>>>
>>> I think everyone agrees with this!
>>>
>>> Jim
>>>
>>>
>>>>
>>>>>
>>>>> Jim
>>>>>
>>>>> =====================================
>>>>> Jim Hu
>>>>> Associate Professor
>>>>> Dept. of Biochemistry and Biophysics
>>>>> 2128 TAMU
>>>>> Texas A&M Univ.
>>>>> College Station, TX 77843-2128
>>>>> 979-862-4054
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Go mailing list
>> Go at geneontology.org
>> http://fafner.stanford.edu/mailman/listinfo/go
>>