[Go] addition of localization specific process terms ?

David Hill dph at informatics.jax.org
Mon Mar 23 16:37:21 PDT 2009


It's on the agenda in ontology development, although we may want to move 
its timing.

David

Karen Christie wrote:
> I actually meant the "when to instantiate localization specific 
> process terms" issue, though that is perhaps tied up in the col 16 and 
> 17 discussion too.
>
> -Karen
>
>
> On Mon, 23 Mar 2009, Chris Mungall wrote:
>
>>
>> Thanks Karen
>>
>> I guess it makes sense to talk about col 16 (and 17 whilst we are 
>> there anyway) before the binding discussion?
>>
>> On Mar 23, 2009, at 4:09 PM, Karen Christie wrote:
>>
>>> Maybe we should talk about this topic at the GO meeting. While there 
>>> was lots of discussion, I never really got a sense of what I should 
>>> actually do now, in terms of when, or when not, to request new 
>>> "pre-composed" terms.
>>>
>>> I guess I'll put this on the agenda.
>>>
>>> -Karen
>>>
>>>
>>> On Mon, 23 Mar 2009, Chris Mungall wrote:
>>>
>>>>
>>>> On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:
>>>>
>>>>> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>>>>>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>>>>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>>>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>>>>>> Because of all of the arguments in favour mentioned by Karen 
>>>>>>>>> and Chris, I thought it was always necessary and required for 
>>>>>>>>> curators to make the more granular annotation in these cases. 
>>>>>>>>> We decided long ago that proliferation of the ontology was not 
>>>>>>>>> an issue when pitched against accurate capture of biology, 
>>>>>>>>> and I wasn't aware that it was ever GO philosophy not to 
>>>>>>>>> capture compartment specific processes in this way.
>>>>>>>> I wasn't involved in GO when this was decided, but as someone 
>>>>>>>> who does stuff on the software side as well as the annotation 
>>>>>>>> side, I think proliferation of the ontology should be an issue 
>>>>>>>> that is not dismissed so lightly.
>>>>>>> What are your concerns in particular?
>>>>>> My two concerns are the obvious ones, nothing particularly 
>>>>>> sophisticated:
>>>>>> 1) performance, especially of web-based tools that have to 
>>>>>> display GO with short processing times.  IIRC, AmiGO has had this 
>>>>>> problem - traversing the ontology to find all the children and 
>>>>>> annotations to children is slow enough that Mike had to write a 
>>>>>> cron job to kill excess db queries that came from users getting 
>>>>>> impatient and reloading the page while the traversal was in 
>>>>>> progress.  As the ontology gets big, these traversals take 
>>>>>> longer.  Maybe there are more efficient algorithms to deal with 
>>>>>> this, maybe AJAX partially makes this tolerable, and maybe the 
>>>>>> problem is the same with post-composition.  But it seems to me 
>>>>>> that at some point sheer size has a performance hit.
>>>>>> 2) User interface.  When I browse the ontology to look for the 
>>>>>> appropriate terms to do an annotation, there are nodes that would 
>>>>>> be unreadable if precomposition was being done consistently.  
>>>>>> Fortunately it isn't being done consistently at present.  For 
>>>>>> example, look at the children of the positive and negative 
>>>>>> regulation terms in the process ontology.  There are terms in 
>>>>>> there for mRNAs for specific genes (oskar and bicoid)!  That 
>>>>>> strikes me as being completely insane... if implemented for all 
>>>>>> regulated genes in all organisms, that node would have hundreds 
>>>>>> of thousands of children - it would be a large subset of 
>>>>>> UniProt/Genbank all at one level. Or worse, because many genes 
>>>>>> would be present at multiple overpopulated nodes in GO.
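Jim's first concern above, that collecting all children of a term on every page load gets slower as the ontology grows, can be illustrated with a toy sketch. The graph, term labels, and function names below are hypothetical, and this is not a description of how AmiGO works; it only contrasts a per-request walk with a closure computed once in advance, which is one of the "more efficient" options he alludes to.

```python
from collections import deque

# Hypothetical toy is_a graph: parent term -> direct children.
CHILDREN = {
    "biological_process": ["regulation", "transport"],
    "regulation": ["positive regulation", "negative regulation"],
    "transport": ["protein transport"],
    "positive regulation": [],
    "negative regulation": [],
    "protein transport": [],
}

def descendants(term, children=CHILDREN):
    """Breadth-first walk collecting every term below `term`."""
    seen = set()
    queue = deque(children.get(term, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a term can be reachable via several parents
        seen.add(node)
        queue.extend(children.get(node, []))
    return seen

def build_closure(children=CHILDREN):
    """Walk the graph once per term up front, so later lookups need no walk."""
    return {term: descendants(term, children) for term in children}

# Computed once (e.g. at load time); each request is then a dict lookup.
CLOSURE = build_closure()
```

The walk cost still grows with the number of terms, but it is paid once per ontology release rather than once per page view. The stored closure itself also grows with the ontology, so this trades time for space rather than removing the size concern.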
>>>>
>>>> I previously addressed this from an end-user point of view. But as 
>>>> Jim mentions in the sf tracker item about binding, it's also 
>>>> important to consider this from the curation point of view.
>>>>
>>>> Jim's point is that increased pre-coordination in the ontology 
>>>> makes it harder for curators, because it will take longer to home 
>>>> in on the most appropriate term for an annotation.
>>>>
>>>> Whilst I can see that obviously there is some correlation between 
>>>> ontology size and time to find a term, I'm wondering about the 
>>>> extent to which this is a problem. I would have expected that most 
>>>> annotation systems used at the MODs and UniProtKB would utilize some 
>>>> kind of term completion rather than the curator manually traversing 
>>>> down the graph. Also, if the curators are expected to post-compose 
>>>> using col 16, then they have *two* terms to find: for example, to 
>>>> annotate "PEP binding" they would find the most specific term in GO 
>>>> *and* the relevant CHEBI terms (and finding terms in CHEBI is 
>>>> probably harder than finding terms in GO).
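The "PEP binding" case can be sketched in code to show why a pre-composed term and a post-composed (col 16 style) annotation can carry the same information. This is a minimal illustration under stated assumptions: the `Annotation` class, the `XP` table, and the plain-text labels are all hypothetical stand-ins, not real GO or CHEBI identifiers and not the actual annotation file format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str                        # ontology term label (placeholder, not a real ID)
    extension: Optional[str] = None  # hypothetical col 16-style extra term

# Hypothetical cross-product table: pre-composed term -> (generic term, differentia).
XP = {"PEP binding": ("binding", "PEP")}

def decompose(a: Annotation) -> Tuple[str, Optional[str]]:
    """Normalize an annotation to a (generic term, extension) pair."""
    if a.term in XP:
        return XP[a.term]
    return (a.term, a.extension)

# Pre-composed: the curator finds one specific term.
pre = Annotation("geneA", "PEP binding")
# Post-composed: the curator finds a generic term plus a chemical term.
post = Annotation("geneB", "binding", "PEP")
```

With a table like `XP`, `decompose(pre)` and `decompose(post)` give the same pair, so a tool could treat the two styles as equivalent. It does not settle which style is easier for a curator to produce, which is the question being asked here.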
>>>>
>>>> But I don't annotate so I'm not sure.
>>>>
>>>> I would like to hear the opinion of some of the annotators here. Is 
>>>> excessive pre-coordination a concern for curation?
>>>>
>>>> I think it would be good if at the meeting a representative curator 
>>>> from each of the main annotation producing groups were to comment 
>>>> on the various situations in which pre-composed terms vs col 16 are 
>>>> preferred.
>>>>
>>>>> I would also add another concern that others often bring up:
>>>>> 3) Difficulty in maintaining the correct parentage in the ontology 
>>>>> (Karen brought this up in her email)
>>>>> However, I would respond to this and say that as we gain 
>>>>> confidence in using the cross-product definitions and the reasoner 
>>>>> to automate this procedure it becomes less of a concern (not yet 
>>>>> eliminated, but less of a concern).
>>>>> For example, there used to be massive errors in the regulation 
>>>>> graph, but we now use the reasoner and the regulation xps in batch 
>>>>> frequently, and as soon as OE2 is released we can incorporate 
>>>>> this directly into the ontology editing cycle. Thanks 
>>>>> to Midori's efforts we are making a lot of progress on the more 
>>>>> difficult BPxCC composite terms, and I feel we will soon be able 
>>>>> to manage the hierarchy for these terms automatically, making 
>>>>> pre-composition less of a worry here:
>>>>>
>>>>>     http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component 
>>>>>
>>>>>> I shudder to think what the graph representations would look like.
>>>>>> <snip>
>>>>>>>> I find the argument that one can't do an AND with some tools to 
>>>>>>>> be more of an argument to improve the tools than an argument to 
>>>>>>>> do extensive precomposition.  If we have to build GO practice 
>>>>>>>> around the weakest tools, then we should also do explicit 
>>>>>>>> annotation all the way up to root for every term, to handle 
>>>>>>>> tools that don't use the true path rule.  I'm NOT advocating 
>>>>>>>> that!!
>>>>>>> I agree that we shouldn't avoid doing the right thing because of 
>>>>>>> the weakest tools. I think we should have a plan for how we can 
>>>>>>> support tools, but I think we first need to agree roughly on 
>>>>>>> what the right thing is.
>>>>>> I think everyone agrees with this!
>>>>>> Jim
>>>>>>>> Jim
>>>>>>>> =====================================
>>>>>>>> Jim Hu
>>>>>>>> Associate Professor
>>>>>>>> Dept. of Biochemistry and Biophysics
>>>>>>>> 2128 TAMU
>>>>>>>> Texas A&M Univ.
>>>>>>>> College Station, TX 77843-2128
>>>>>>>> 979-862-4054
>>>>> _______________________________________________
>>>>> Go mailing list
>>>>> Go at geneontology.org
>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>
>>

-- 
David P. Hill, Ph.D.
Bioinformatics Scientist: Ontology Development
Gene Ontology Consortium
The Jackson Laboratory
www.geneontology.org
www.informatics.jax.org
tel:207-288-6430


