[Go] addition of localization specific process terms ?

Karen Christie kchris at genome.stanford.edu
Mon Mar 23 16:27:48 PDT 2009


I actually meant the "when to instantiate localization-specific process 
terms" issue, though that is perhaps tied up in the col 16 and 17 
discussion too.

-Karen


On Mon, 23 Mar 2009, Chris Mungall wrote:

>
> Thanks Karen
>
> I guess it makes sense to talk about col 16 (and 17 whilst we are there 
> anyway) before the binding discussion?
>
> On Mar 23, 2009, at 4:09 PM, Karen Christie wrote:
>
>> Maybe we should talk about this topic at the GO meeting. While there was 
>> lots of discussion, I never really got a sense of what I should actually do 
>> now, in terms of when, or when not, to request new "pre-composed" terms.
>> 
>> I guess I'll put this on the agenda.
>> 
>> -Karen
>> 
>> 
>> On Mon, 23 Mar 2009, Chris Mungall wrote:
>> 
>>> 
>>> On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:
>>> 
>>>> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>>>>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>>>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>>>>> Because of all of the arguments in favour mentioned by Karen and 
>>>>>>>> Chris, I thought it was always necessary and required for curators to 
>>>>>>>> make the more granular annotation in these cases. We decided long ago 
>>>>>>>> that proliferation of the ontology was not an issue when pitched 
>>>>>>>> against accurate capture of biology, and I wasn't aware that it was 
>>>>>>>> ever GO philosophy not to capture compartment-specific processes in 
>>>>>>>> this way.
>>>>>>> I wasn't involved in GO when this was decided, but as someone who does 
>>>>>>> stuff on the software side as well as the annotation side, I think 
>>>>>>> proliferation of the ontology should be an issue that is not dismissed 
>>>>>>> so lightly.
>>>>>> What are your concerns in particular?
>>>>> My two concerns are the obvious ones, nothing particularly 
>>>>> sophisticated:
>>>>> 1) performance, especially of web-based tools that have to display GO 
>>>>> with short processing times.  IIRC, AmiGO has had this problem - 
>>>>> traversing the ontology to find all the children and annotations to 
>>>>> children is slow enough that Mike had to write a cron job to kill excess 
>>>>> db queries that came from users getting impatient and reloading the page 
>>>>> while the traversal was in progress.  As the ontology gets big, these 
>>>>> traversals take longer.  Maybe there are more efficient algorithms to 
>>>>> deal with this, maybe AJAX partially makes this tolerable, and maybe the 
>>>>> problem is the same with post-composition.  But it seems to me that at 
>>>>> some point sheer size has a performance hit.
>>>>> 2) User interface.  When I browse the ontology to look for the 
>>>>> appropriate terms to do an annotation, there are nodes that would be 
>>>>> unreadable if precomposition was being done consistently.  Fortunately 
>>>>> it isn't being done consistently at present.  For example, look at the 
>>>>> children of the positive and negative regulation terms in the process 
>>>>> ontology.  There are terms in there for mRNAs for specific genes (oskar 
>>>>> and bicoid)!  That strikes me as being completely insane... if 
>>>>> implemented for all regulated genes in all organisms, that node would 
>>>>> have hundreds of thousands of children - it would be a large subset of 
>>>>> UniProt/Genbank all at one level. Or worse, because many genes would be 
>>>>> present at multiple overpopulated nodes in GO.
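
The traversal cost raised in concern 1 can be pictured with a minimal Python sketch. It assumes a toy ontology held in memory as child-to-parent edges; the term names and the helpers `build_children`, `descendants`, and `build_closure` are hypothetical and say nothing about how AmiGO is actually implemented. The sketch contrasts walking the graph on every request with looking the answer up in a descendant table computed once.

```python
from collections import defaultdict

# Toy ontology, child -> list of parents. All names are made up for illustration.
PARENTS = {
    "root": [],
    "regulation": ["root"],
    "positive regulation": ["regulation"],
    "negative regulation": ["regulation"],
    "positive regulation of x": ["positive regulation"],
}


def build_children(parents):
    """Invert child -> parents edges into parent -> set of children."""
    children = defaultdict(set)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(child)
    return children


def descendants(term, children):
    """Walk the graph below `term`; the work grows with the size of the subgraph."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def build_closure(parents):
    """Compute every term's descendants once, so a page request is a dict lookup."""
    children = build_children(parents)
    return {term: descendants(term, children) for term in parents}


closure = build_closure(PARENTS)
print(sorted(closure["regulation"]))
```

The table trades memory and rebuild time for fast reads, and it grows with the ontology too, so this only moves the cost rather than removing it.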
>>> 
>>> I previously addressed this from an end-user point of view. But as Jim 
>>> mentions in the sf tracker item about binding, it's also important to 
>>> consider this from the curation point of view.
>>> 
>>> Jim's point is that increased pre-coordination in the ontology makes it 
>>> harder for curators, because it will take longer to home in on the most 
>>> appropriate term for an annotation.
>>> 
>>> Whilst I can see that obviously there is some correlation between ontology 
>>> size and time to find a term, I'm wondering to what extent this is a 
>>> problem. I would have expected that most annotation systems used at the 
>>> MODs and UniProtKB would utilize some kind of term completion rather than 
>>> the curator manually traversing down the graph. Also, if the curators are 
>>> expected to post-compose using col 16, then they have *two* terms to find: 
>>> for example to annotate "PEP binding" they would find the most specific 
>>> term in GO *and* the relevant CHEBI terms (and finding terms in CHEBI is 
>>> probably harder than finding terms in GO).
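
As a rough illustration of the "term completion" idea mentioned above, here is a minimal Python sketch, assuming term labels are kept lowercased in a sorted list. The labels and the `complete` function are hypothetical and do not describe any MOD's or UniProtKB's actual curation tool. A binary search finds the first label with the typed prefix, so lookup time grows slowly with the number of terms.

```python
import bisect

# Hypothetical lowercased term labels, kept sorted so prefixes form a contiguous run.
LABELS = sorted([
    "phosphatase activity",
    "protein binding",
    "protein kinase activity",
    "transport",
])


def complete(prefix, sorted_labels, limit=10):
    """Return up to `limit` labels that start with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    # bisect_left gives the position of the first label >= prefix.
    i = bisect.bisect_left(sorted_labels, prefix)
    matches = []
    while (
        i < len(sorted_labels)
        and sorted_labels[i].startswith(prefix)
        and len(matches) < limit
    ):
        matches.append(sorted_labels[i])
        i += 1
    return matches


print(complete("protein", LABELS))
```

With post-composition the curator would run this kind of lookup twice, once against GO and once against CHEBI, which is the extra step the paragraph above points out.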
>>> 
>>> But I don't annotate so I'm not sure.
>>> 
>>> I would like to hear the opinion of some of the annotators here. Is 
>>> excessive pre-coordination a concern for curation?
>>> 
>>> I think it would be good if at the meeting a representative curator from 
>>> each of the main annotation producing groups were to comment on the 
>>> various situations in which pre-composed terms vs col 16 are preferred.
>>> 
>>>> I would also add another concern that others often bring up:
>>>> 3) Difficulty in maintaining the correct parentage in the ontology (Karen 
>>>> brought this up in her email)
>>>> However, I would respond to this and say that as we gain confidence in 
>>>> using the cross-product definitions and the reasoner to automate this 
>>>> procedure it becomes less of a concern (not yet eliminated, but less of a 
>>>> concern).
>>>> For example, there used to be massive errors in the regulation graph, but 
>>>> we now use the reasoner and the regulation xps in batch frequently, and 
>>>> as soon as OE2 is released we can incorporate this directly into 
>>>> the ontology editing cycle. Thanks to Midori's efforts we are making a 
>>>> lot of progress on the more difficult BPxCC composite terms, and I feel 
>>>> we will soon be able to manage the hierarchy for these terms 
>>>> automatically, making pre-composition less of a worry here:
>>>>
>>>> 	http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component
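
To make the idea of inferring parentage from cross-product definitions concrete, here is a minimal Python sketch. It assumes each composite term is defined by a process (the genus) plus a location (the differentia), and that a composite term is a child of another when both parts are the same or more specific. The toy hierarchy, the term names, and the helpers `ancestors_or_self` and `inferred_parents` are hypothetical; this is not the reasoner used by the GO editors, and it ignores relations other than is_a.

```python
# Toy is_a hierarchy, child -> parent (None marks a root). Not real GO structure.
IS_A = {
    "process P": None,
    "process P1": "process P",
    "location C": None,
    "location C1": "location C",
}

# Cross-product definitions: composite term -> (process, location).
XP = {
    "P in C": ("process P", "location C"),
    "P1 in C": ("process P1", "location C"),
    "P1 in C1": ("process P1", "location C1"),
}


def ancestors_or_self(term):
    """Return the term plus everything above it in the toy is_a hierarchy."""
    result = set()
    while term is not None:
        result.add(term)
        term = IS_A.get(term)
    return result


def inferred_parents(term):
    """Composite terms whose process and location both subsume this term's."""
    process, location = XP[term]
    process_ok = ancestors_or_self(process)
    location_ok = ancestors_or_self(location)
    return {
        other
        for other, (p, c) in XP.items()
        if other != term and p in process_ok and c in location_ok
    }


print(sorted(inferred_parents("P1 in C1")))
```

The result includes indirect ancestors as well as direct parents; a real pipeline would reduce these to the most specific ones before asserting links.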
>>>>> I shudder to think what the graph representations would look like.
>>>>> <snip>
>>>>>>> I find the argument that one can't do an AND with some tools to be 
>>>>>>> more of an argument to improve the tools than an argument to do 
>>>>>>> extensive precomposition.  If we have to build GO practice around the 
>>>>>>> weakest tools, then we should also do explicit annotation all the way 
>>>>>>> up to root for every term, to handle tools that don't use the true 
>>>>>>> path rule.  I'm NOT advocating that!!
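
For readers unfamiliar with the true path rule, the following minimal Python sketch shows what "explicit annotation all the way up to root" would mean in practice. It assumes a toy hierarchy and a gene-to-terms map of direct annotations; all names, including `ancestors` and `expand_to_root`, are hypothetical. A tool that applies the rule can derive the expanded set itself, which is why storing it explicitly is redundant.

```python
# Toy hierarchy, child -> list of parents, and direct annotations. Names are made up.
PARENTS = {
    "leaf": ["mid"],
    "mid": ["root"],
    "root": [],
}
DIRECT = {
    "geneA": {"leaf"},
    "geneB": {"mid"},
}


def ancestors(term, parents):
    """Collect every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand_to_root(direct, parents):
    """Add every ancestor of each directly annotated term to each gene."""
    expanded = {}
    for gene, terms in direct.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[gene] = full
    return expanded


expanded = expand_to_root(DIRECT, PARENTS)
print(sorted(expanded["geneA"]))
```

Here one direct annotation for geneA becomes three stored annotations, and the multiplier grows with the depth of the ontology.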
>>>>>> I agree that we shouldn't avoid doing the right thing because of the 
>>>>>> weakest tools. I think we should have a plan for how we can support 
>>>>>> tools, but I think we first need to agree roughly on what the right 
>>>>>> thing is.
>>>>> I think everyone agrees with this!
>>>>> Jim
>>>>>>> Jim
>>>>>>> =====================================
>>>>>>> Jim Hu
>>>>>>> Associate Professor
>>>>>>> Dept. of Biochemistry and Biophysics
>>>>>>> 2128 TAMU
>>>>>>> Texas A&M Univ.
>>>>>>> College Station, TX 77843-2128
>>>>>>> 979-862-4054
>>>> _______________________________________________
>>>> Go mailing list
>>>> Go at geneontology.org
>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>> 
>

