[Go] addition of localization specific process terms ?

Chris Mungall cjm at berkeleybop.org
Mon Mar 23 16:22:05 PDT 2009


Thanks Karen

I guess it makes sense to talk about col 16 (and 17 whilst we are  
there anyway) before the binding discussion?

On Mar 23, 2009, at 4:09 PM, Karen Christie wrote:

> Maybe we should talk about this topic at the GO meeting. While there  
> was lots of discussion, I never really got a sense of what I should  
> actually do now, in terms of when, or when not, to request new
> "pre-composed" terms.
>
> I guess I'll put this on the agenda.
>
> -Karen
>
>
> On Mon, 23 Mar 2009, Chris Mungall wrote:
>
>>
>> On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:
>>
>>> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>>>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>>>> Because of all of the arguments in favour mentioned by Karen  
>>>>>>> and Chris I thought it was always necessary and required for  
>>>>>>> curators to make the more granular annotation in these cases.  
>>>>>>> We decided long ago that proliferation of the ontology was not  
>>>>>>> an issue when pitched against accurate capture of biology,  
>>>>>>> and I wasn't aware that it was ever GO philosophy not to  
>>>>>>> capture compartment specific processes in this way.
>>>>>> I wasn't involved in GO when this was decided, but as someone  
>>>>>> who does stuff on the software side as well as the annotation  
>>>>>> side, I think proliferation of the ontology should be an issue  
>>>>>> that is not dismissed so lightly.
>>>>> What are your concerns in particular?
>>>> My two concerns are the obvious ones, nothing particularly  
>>>> sophisticated:
>>>> 1) performance, especially of web-based tools that have to  
>>>> display GO with short processing times.  IIRC, AmiGO has had this  
>>>> problem - traversing the ontology to find all the children and  
>>>> annotations to children is slow enough that Mike had to write a  
>>>> cron job to kill excess db queries that came from users getting  
>>>> impatient and reloading the page while the traversal was in  
>>>> progress.  As the ontology gets big, these traversals take  
>>>> longer.  Maybe there are more efficient algorithms to deal with  
>>>> this, maybe AJAX partially makes this tolerable, and maybe the  
>>>> problem is the same with post-composition.  But it seems to me  
>>>> that at some point sheer size has a performance hit.
>>>> 2) User interface.  When I browse the ontology to look for the  
>>>> appropriate terms to do an annotation, there are nodes that would  
>>>> be unreadable if precomposition was being done consistently.   
>>>> Fortunately it isn't being done consistently at present.  For  
>>>> example, look at the children of the positive and negative  
>>>> regulation terms in the process ontology.  There are terms in  
>>>> there for mRNAs for specific genes (oskar and bicoid)!  That  
>>>> strikes me as being completely insane... if implemented for all  
>>>> regulated genes in all organisms, that node would have hundreds  
>>>> of thousands of children - it would be a large subset of  
>>>> UniProt/GenBank all at one level. Or worse, because many genes would be  
>>>> present at multiple overpopulated nodes in GO.
>>
>> I previously addressed this from an end-user point of view. But as  
>> Jim mentions in the sf tracker item about binding, it's also  
>> important to consider this from the curation point of view.
>>
>> Jim's point is that increased pre-coordination in the ontology  
>> makes it harder for curators, because it will take longer to home  
>> in on the most appropriate term for an annotation.
>>
>> Whilst I can see that obviously there is some correlation between  
>> ontology size and time to find a term, I'm wondering about the extent to  
>> which this is a problem. I would have expected that most annotation  
>> systems used at the MODs and UniProtKB would utilize some kind of  
>> term completion rather than the curator manually traversing down  
>> the graph. Also, if the curators are expected to post-compose using  
>> col 16, then they have *two* terms to find: for example to annotate  
>> "PEP binding" they would find the most specific term in GO *and*  
>> the relevant CHEBI terms (and finding terms in CHEBI is probably  
>> harder than finding terms in GO).
>>
>> But I don't annotate so I'm not sure.
>>
>> I would like to hear the opinion of some of the annotators here. Is  
>> excessive pre-coordination a concern for curation?
>>
>> I think it would be good if at the meeting a representative curator  
>> from each of the main annotation producing groups were to comment  
>> on the various situations in which pre-composed terms vs col 16 are  
>> preferred.
>>
>>> I would also add another concern that others often bring up:
>>> 3) Difficulty in maintaining the correct parentage in the ontology  
>>> (Karen brought this up in her email)
>>> However, I would respond to this and say that as we gain  
>>> confidence in using the cross-product definitions and the reasoner  
>>> to automate this procedure it becomes less of a concern (not yet  
>>> eliminated, but less of a concern).
>>> For example, there used to be massive errors in the regulation  
>>> graph, but we now use the reasoner and the regulation xps in batch  
>>> frequently, and as soon as OE2 is released we can  
>>> incorporate this directly into the ontology editing cycle. Thanks  
>>> to Midori's efforts we are making a lot of progress on the more  
>>> difficult BPxCC composite terms, and I feel we will soon be able  
>>> to manage the hierarchy for these terms automatically, making
>>> pre-composition less of a worry here:
>>>
>>> 	http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component
>>>> I shudder to think what the graph representations would look like.
>>>> <snip>
>>>>>> I find the argument that one can't do an AND with some tools to  
>>>>>> be more of an argument to improve the tools than an argument to  
>>>>>> do extensive precomposition.  If we have to build GO practice  
>>>>>> around the weakest tools, then we should also do explicit  
>>>>>> annotation all the way up to root for every term, to handle  
>>>>>> tools that don't use the true path rule.  I'm NOT advocating  
>>>>>> that!!
>>>>> I agree that we shouldn't avoid doing the right thing because of  
>>>>> the weakest tools. I think we should have a plan for how we can  
>>>>> support tools, but I think we first need to agree roughly on  
>>>>> what the right thing is.
>>>> I think everyone agrees with this!
>>>> Jim
>>>>>> Jim
>>>>>> =====================================
>>>>>> Jim Hu
>>>>>> Associate Professor
>>>>>> Dept. of Biochemistry and Biophysics
>>>>>> 2128 TAMU
>>>>>> Texas A&M Univ.
>>>>>> College Station, TX 77843-2128
>>>>>> 979-862-4054
>>> _______________________________________________
>>> Go mailing list
>>> Go at geneontology.org
>>> http://fafner.stanford.edu/mailman/listinfo/go
>>
>

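Jim's first concern in the quoted thread (walking the graph to find all children, and annotations to children, on every page load) can be illustrated with a small sketch. It is not how AmiGO works; it only shows one common way to trade memory for time, by computing every term's descendants once so that a later lookup does not re-traverse the graph. The term labels, gene names, and the function names `build_descendants` and `genes_under` are all made up for illustration.

```python
from collections import defaultdict

# Toy is_a edges (child -> list of parents). Labels are illustrative only.
IS_A = {
    "regulation": [],
    "regulation of translation": ["regulation"],
    "negative regulation of translation": ["regulation of translation"],
    "negative regulation of oskar mRNA translation": [
        "negative regulation of translation"
    ],
}

# Hypothetical direct annotations (term -> genes annotated to exactly that term).
DIRECT = {
    "negative regulation of translation": {"geneB"},
    "negative regulation of oskar mRNA translation": {"geneA"},
}


def build_descendants(is_a):
    """Return {term: set of all descendants}, computed once for an acyclic graph."""
    children = defaultdict(set)
    terms = set(is_a)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
            terms.add(parent)

    closure = {}

    def visit(term):
        # Memoised depth-first walk: each term's descendants are computed once.
        if term not in closure:
            found = set()
            for child in children.get(term, ()):
                found.add(child)
                found |= visit(child)
            closure[term] = found
        return closure[term]

    for term in terms:
        visit(term)
    return closure


def genes_under(term, closure, direct):
    """Genes annotated to a term or to any of its descendants."""
    genes = set(direct.get(term, ()))
    for descendant in closure.get(term, ()):
        genes |= direct.get(descendant, set())
    return genes


CLOSURE = build_descendants(IS_A)
print(sorted(genes_under("regulation of translation", CLOSURE, DIRECT)))
```

The closure still grows with the ontology, so this moves the cost rather than removing it, which is consistent with Jim's remark that sheer size has a price at some point.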

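Chris's expectation that curators use "some kind of term completion" rather than walking down the graph can be sketched in a few lines. This is a guess at the general idea, not a description of any MOD or UniProtKB tool; the function name `complete`, the ranking rule, and the label list are assumptions.

```python
def complete(fragment, labels, limit=10):
    """Return labels containing the fragment, prefix matches first, shorter first."""
    needle = fragment.casefold()
    hits = [label for label in labels if needle in label.casefold()]
    # Sort key: prefix matches before other matches, then by length, then by name.
    hits.sort(
        key=lambda label: (not label.casefold().startswith(needle), len(label), label)
    )
    return hits[:limit]


# Illustrative labels only; a real tool would search the full ontology.
LABELS = ["binding", "protein binding", "PEP binding", "translation"]
print(complete("bind", LABELS))
```

With a lookup like this, the time to find a term depends on how well the curator can guess part of its name rather than on how many siblings it has, which is the point Chris raises against the browsing concern.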

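The pre-composed versus col 16 choice discussed in the thread can also be made concrete. The sketch below assumes that a pre-composed term carries a cross-product definition (a more general term plus a relation to some other entity), so that an annotation to it can be expanded into the same shape as a post-composed annotation. The field names, the relation name `has_input`, and the plain-text target `PEP` are placeholders, not the real column 16 syntax or real CHEBI identifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    # Hypothetical stand-in for a col 16 entry: (relation, target entity).
    extension: Optional[Tuple[str, str]] = None


# Hypothetical cross-product definitions:
# pre-composed label -> (general term, (relation, target)).
XP_DEFS = {
    "PEP binding": ("binding", ("has_input", "PEP")),
}


def normalise(annotation, xp_defs=XP_DEFS):
    """Expand a pre-composed annotation into general term + extension."""
    if annotation.extension is None and annotation.term in xp_defs:
        general, extension = xp_defs[annotation.term]
        return Annotation(annotation.gene, general, extension)
    return annotation


pre_composed = Annotation("gene1", "PEP binding")
post_composed = Annotation("gene1", "binding", ("has_input", "PEP"))
print(normalise(pre_composed) == normalise(post_composed))
```

Under these assumptions, a tool that normalises first can answer the same query whichever style the curator used, so the choice becomes mostly a question of curation effort, which is what Chris asks the annotators to comment on.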