[Go] addition of localization-specific process terms?

Chris Mungall cjm at berkeleybop.org
Mon Mar 23 14:19:24 PDT 2009


On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:

>
> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>
>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>
>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>
>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>
>>>>> Because of all the arguments in favour mentioned by Karen
>>>>> and Chris, I thought it was always necessary and required for
>>>>> curators to make the more granular annotation in these cases. We
>>>>> decided long ago that proliferation of the ontology was not an
>>>>> issue when pitched against accurate capture of biology, and I
>>>>> wasn't aware that it was ever GO philosophy not to capture
>>>>> compartment-specific processes in this way.
>>>>
>>>> I wasn't involved in GO when this was decided, but as someone who  
>>>> does stuff on the software side as well as the annotation side, I  
>>>> think proliferation of the ontology should be an issue that is  
>>>> not dismissed so lightly.
>>>
>>> What are your concerns in particular?
>>>>
>>
>> My two concerns are the obvious ones, nothing particularly  
>> sophisticated:
>>
>> 1) performance, especially of web-based tools that have to display  
>> GO with short processing times.  IIRC, AmiGO has had this problem -  
>> traversing the ontology to find all the children and annotations to  
>> children is slow enough that Mike had to write a cron job to kill  
>> excess db queries that came from users getting impatient and  
>> reloading the page while the traversal was in progress.  As the  
>> ontology gets big, these traversals take longer.  Maybe there are  
>> more efficient algorithms to deal with this, maybe AJAX partially  
>> makes this tolerable, and maybe the problem is the same with
>> post-composition.  But it seems to me that at some point sheer size has
>> a performance hit.
>>
>> 2) User interface.  When I browse the ontology to look for the  
>> appropriate terms to do an annotation, there are nodes that would  
>> be unreadable if precomposition was being done consistently.   
>> Fortunately it isn't being done consistently at present.  For  
>> example, look at the children of the positive and negative  
>> regulation terms in the process ontology.  There are terms in there  
>> for mRNAs for specific genes (oskar and bicoid)!  That strikes me  
>> as being completely insane... if implemented for all regulated  
>> genes in all organisms, that node would have hundreds of thousands  
>> of children - it would be a large subset of UniProt/Genbank all at  
>> one level.  Or worse, because many genes would be present at  
>> multiple overpopulated nodes in GO.
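Jim's first concern is the cost of collecting every descendant of a term, plus the annotations to those descendants, while the user waits. One common mitigation is to precompute the transitive closure once so that each page load becomes a lookup rather than a graph walk. The sketch below assumes a toy in-memory is_a graph with simplified links, hypothetical gene names, and hypothetical helper names (`closure`, `rolled_up`); it is not how AmiGO or the GO database actually implements this.

```python
from collections import defaultdict, deque

# Toy is_a graph (child -> parents). Links are simplified for illustration
# and are not meant to reproduce the real GO hierarchy.
PARENTS = {
    "biological_process": [],
    "regulation of biological process": ["biological_process"],
    "regulation of transcription": ["regulation of biological process"],
    "negative regulation of transcription": ["regulation of transcription"],
    "positive regulation of transcription": ["regulation of transcription"],
}

# Hypothetical direct annotations: term -> gene products.
ANNOTATIONS = {
    "negative regulation of transcription": ["geneA"],
    "positive regulation of transcription": ["geneB", "geneC"],
}


def children_index(parents):
    """Invert child -> parents into parent -> children."""
    kids = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            kids[p].append(child)
    return kids


def descendants(term, kids):
    """Breadth-first walk collecting every term below `term`."""
    seen, queue = set(), deque([term])
    while queue:
        for child in kids.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def closure(parents):
    """Precompute term -> all descendants once (e.g. at load time)."""
    kids = children_index(parents)
    return {term: descendants(term, kids) for term in parents}


def rolled_up(term, clos, annotations):
    """Gene products annotated to `term` or any descendant: a dict lookup
    plus a union, with no graph walk at query time."""
    genes = set(annotations.get(term, []))
    for d in clos[term]:
        genes.update(annotations.get(d, []))
    return genes


if __name__ == "__main__":
    CLOSURE = closure(PARENTS)
    # prints ['geneA', 'geneB', 'geneC']
    print(sorted(rolled_up("regulation of biological process", CLOSURE, ANNOTATIONS)))
```

This moves the cost rather than removing it: the closure grows with the number of ancestor/descendant pairs, so heavy pre-composition would still show up as storage and rebuild time, just not as per-request latency.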

I previously addressed this from an end-user point of view. But as Jim
mentions in the SourceForge tracker item about binding, it's also important to
consider this from the curation point of view.

Jim's point is that increased pre-coordination in the ontology makes
it harder for curators, because it will take longer to home in on the
most appropriate term for an annotation.

Whilst I can see that obviously there is some correlation between  
ontology size and time to find a term, I'm wondering the extent to  
which this is a problem. I would have expected that most annotation
systems used at the MODs and UniProtKB would use some kind of term
completion rather than having the curator manually traverse down the graph.
Also, if the curators are expected to post-compose using col 16, then
they have *two* terms to find: for example, to annotate "PEP binding"
they would find the most specific term in GO *and* the relevant ChEBI
terms (and finding terms in ChEBI is probably harder than finding
terms in GO).
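The point about term completion can be made concrete. The sketch below assumes in-memory label lists and uses hypothetical helper names (`build_index`, `complete`); the labels are made-up stand-ins for GO and ChEBI, and this is not any MOD's actual curation tool. With a sorted index each lookup is a binary search plus a short scan, so finding a term grows only slowly with ontology size, while post-composition doubles the number of lookups.

```python
import bisect


def build_index(labels):
    """Sort (lowercased label, original label) pairs once; each lookup is
    then a binary search rather than a scan or a walk down the graph."""
    return sorted((label.lower(), label) for label in labels)


def complete(index, prefix, limit=10):
    """Return up to `limit` labels that start with `prefix`, ignoring case."""
    p = prefix.lower()
    i = bisect.bisect_left(index, (p, ""))
    matches = []
    while i < len(index) and index[i][0].startswith(p) and len(matches) < limit:
        matches.append(index[i][1])
        i += 1
    return matches


# Made-up stand-ins for the GO and ChEBI label sets; real systems would
# also index synonyms and match on words inside labels, not only prefixes.
GO_INDEX = build_index(["binding", "ATP binding", "nucleotide binding", "protein binding"])
CHEBI_INDEX = build_index(["phosphoenolpyruvate", "phosphoenolpyruvic acid", "pyruvate", "ATP"])

if __name__ == "__main__":
    # Post-composing "PEP binding" takes two lookups, one per ontology.
    print(complete(GO_INDEX, "bind"))            # ['binding']
    print(complete(CHEBI_INDEX, "phosphoenol"))  # ['phosphoenolpyruvate', 'phosphoenolpyruvic acid']
```

Doubling the number of labels adds roughly one comparison to the binary search, which is the sense in which completion blunts the effect of ontology size. The ChEBI lookup returning two near-synonyms hints at why that second step may be the harder one for curators.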

But I don't annotate so I'm not sure.

I would like to hear the opinion of some of the annotators here. Is  
excessive pre-coordination a concern for curation?

I think it would be good if, at the meeting, a representative curator
from each of the main annotation-producing groups were to comment on
the various situations in which pre-composed terms vs col 16 are
preferred.

> I would also add another concern that others often bring up:
>
> 3) Difficulty in maintaining the correct parentage in the ontology  
> (Karen brought this up in her email)
>
> However, I would respond to this and say that as we gain confidence  
> in using the cross-product definitions and the reasoner to automate  
> this procedure it becomes less of a concern (not yet eliminated, but  
> less of a concern).
>
> For example, there used to be massive errors in the regulation
> graph, but we now frequently run the reasoner over the regulation
> cross-products (xps) in batch, and as soon as OE2 (OBO-Edit 2) is
> released we can incorporate this directly into the ontology editing
> cycle. Thanks to Midori's efforts we are making a lot of progress on
> the more difficult BPxCC composite terms, and I feel we will soon be
> able to manage the hierarchy for these terms automatically, making
> pre-composition less of a worry here:
>
> 	http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component
>>
>> I shudder to think what the graph representations would look like.
>>
>>
>> <snip>
>>>> I find the argument that one can't do an AND with some tools to  
>>>> be more of an argument to improve the tools than an argument to  
>>>> do extensive precomposition.  If we have to build GO practice  
>>>> around the weakest tools, then we should also do explicit  
>>>> annotation all the way up to root for every term, to handle tools  
>>>> that don't use the true path rule.  I'm NOT advocating that!!
>>>
>>> I agree that we shouldn't avoid doing the right thing because of
>>> the weakest tools. I think we should have a plan for how we can
>>> support tools, but I think we first need to agree roughly on what
>>> the right thing is.
>>
>> I think everyone agrees with this!
>>
>> Jim
>>
>>
>>>
>>>>
>>>> Jim
>>>>
>>>> =====================================
>>>> Jim Hu
>>>> Associate Professor
>>>> Dept. of Biochemistry and Biophysics
>>>> 2128 TAMU
>>>> Texas A&M Univ.
>>>> College Station, TX 77843-2128
>>>> 979-862-4054
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
> _______________________________________________
> Go mailing list
> Go at geneontology.org
> http://fafner.stanford.edu/mailman/listinfo/go
>


