[Go] addition of localization-specific process terms?

Jim Hu jimhu at tamu.edu
Mon Mar 23 16:28:35 PDT 2009


Debby and I will be participating remotely, so I hope our schedules
work out. :(

Jim

On Mar 23, 2009, at 6:22 PM, Chris Mungall wrote:

>
> Thanks Karen
>
> I guess it makes sense to talk about col 16 (and 17 whilst we are  
> there anyway) before the binding discussion?
>
> On Mar 23, 2009, at 4:09 PM, Karen Christie wrote:
>
>> Maybe we should talk about this topic at the GO meeting. While  
>> there was lots of discussion, I never really got a sense of what I  
>> should actually do now, in terms of when, or when not, to request  
>> new "pre-composed" terms.
>>
>> I guess I'll put this on the agenda.
>>
>> -Karen
>>
>>
>> On Mon, 23 Mar 2009, Chris Mungall wrote:
>>
>>>
>>> On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:
>>>
>>>> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>>>>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>>>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>>>>> Because of all of the arguments in favour mentioned by Karen
>>>>>>>> and Chris, I thought it was always necessary and required for
>>>>>>>> curators to make the more granular annotation in these cases.
>>>>>>>> We decided long ago that proliferation of the ontology was
>>>>>>>> not an issue when pitched against accurate capture of
>>>>>>>> biology, and I wasn't aware that it was ever GO philosophy
>>>>>>>> not to capture compartment-specific processes in this way.
>>>>>>> I wasn't involved in GO when this was decided, but as someone  
>>>>>>> who does stuff on the software side as well as the annotation  
>>>>>>> side, I think proliferation of the ontology should be an issue  
>>>>>>> that is not dismissed so lightly.
>>>>>> What are your concerns in particular?
>>>>> My two concerns are the obvious ones, nothing particularly  
>>>>> sophisticated:
>>>>> 1) performance, especially of web-based tools that have to  
>>>>> display GO with short processing times.  IIRC, AmiGO has had  
>>>>> this problem - traversing the ontology to find all the children  
>>>>> and annotations to children is slow enough that Mike had to  
>>>>> write a cron job to kill excess db queries that came from users  
>>>>> getting impatient and reloading the page while the traversal was  
>>>>> in progress.  As the ontology gets big, these traversals take  
>>>>> longer.  Maybe there are more efficient algorithms to deal with  
>>>>> this, maybe AJAX partially makes this tolerable, and maybe the  
>>>>> problem is the same with post-composition.  But it seems to me  
>>>>> that at some point sheer size has a performance hit.
>>>>> 2) User interface.  When I browse the ontology to look for the  
>>>>> appropriate terms to do an annotation, there are nodes that  
>>>>> would be unreadable if precomposition was being done  
>>>>> consistently.  Fortunately it isn't being done consistently at  
>>>>> present.  For example, look at the children of the positive and  
>>>>> negative regulation terms in the process ontology.  There are  
>>>>> terms in there for mRNAs for specific genes (oskar and bicoid)!   
>>>>> That strikes me as being completely insane... if implemented for  
>>>>> all regulated genes in all organisms, that node would have  
>>>>> hundreds of thousands of children - it would be a large subset  
>>>>> of UniProt/GenBank all at one level. Or worse, because many
>>>>> genes would be present at multiple overpopulated nodes in GO.
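To make the traversal cost in concern 1) concrete, here is a minimal sketch in Python. It assumes a toy is_a graph with placeholder term names (not real GO IDs) and compares walking the graph on every request with precomputing each term's descendants once. Nothing in the thread says how AmiGO actually does this, so treat it as an illustration only.

```python
from collections import defaultdict

# Toy is_a edges (child -> parents). Placeholder names, not real GO terms.
PARENTS = {
    "regulation": ["process"],
    "positive regulation": ["regulation"],
    "negative regulation": ["regulation"],
    "positive regulation of translation": ["positive regulation"],
}

def build_children(parents):
    """Invert child -> parents edges into parent -> children."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return children

def descendants(term, children):
    """Walk the graph on every call; cost grows with the size of the subgraph."""
    seen, stack = set(), [term]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def build_closure(parents):
    """Precompute term -> all descendants once, so lookups need no traversal."""
    children = build_children(parents)
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    return {t: descendants(t, children) for t in terms}

CLOSURE = build_closure(PARENTS)
print(sorted(CLOSURE["regulation"]))
```

The precomputed table trades query time for storage and rebuild time, and it also grows as terms are added, so it softens the size concern rather than removing it.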
>>>
>>> I previously addressed this from an end-user point of view. But as  
>>> Jim mentions in the sf tracker item about binding, it's also  
>>> important to consider this from the curation point of view.
>>>
>>> Jim's point is that increased pre-coordination in the ontology  
>>> makes it harder for curators, because it will take longer to home
>>> in on the most appropriate term for an annotation.
>>>
>>> Whilst I can see that there is obviously some correlation between
>>> ontology size and time to find a term, I'm wondering to what extent
>>> this is a problem. I would have expected that most
>>> annotation systems used at the MODs and UniProtKB would utilize  
>>> some kind of term completion rather than the curator manually  
>>> traversing down the graph. Also, if the curators are expected to  
>>> post-compose using col 16, then they have *two* terms to find: for
>>> example, to annotate "PEP binding" they would find the most
>>> specific term in GO *and* the relevant CHEBI terms (and finding
>>> terms in CHEBI is probably harder than finding terms in GO).
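As a rough picture of the term completion mentioned here, the sketch below (Python) matches a typed fragment against term labels. The label lists and the complete() helper are hypothetical and are not the contents of GO or CHEBI, nor any MOD's actual tool. It also shows why post-composition means two lookups, one per vocabulary.

```python
# Illustrative label lists only; not the actual contents of GO or CHEBI.
GO_LABELS = ["binding", "protein binding", "carboxylic acid binding"]
CHEBI_LABELS = ["pyruvate", "phosphoenolpyruvate"]

def complete(fragment, labels, limit=10):
    """Return up to `limit` labels containing every word of `fragment`,
    shortest label first."""
    words = fragment.lower().split()
    hits = [label for label in labels
            if all(w in label.lower() for w in words)]
    return sorted(hits, key=len)[:limit]

# Post-composition: the curator searches two vocabularies, not one.
go_hits = complete("acid bind", GO_LABELS)
chebi_hits = complete("phosphoenol", CHEBI_LABELS)
print(go_hits, chebi_hits)
```

With a pre-composed term the second search disappears, at the cost of a longer label list to search in the first place.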
>>>
>>> But I don't annotate so I'm not sure.
>>>
>>> I would like to hear the opinion of some of the annotators here.  
>>> Is excessive pre-coordination a concern for curation?
>>>
>>> I think it would be good if at the meeting a representative  
>>> curator from each of the main annotation producing groups were to  
>>> comment on the various situations in which pre-composed terms vs  
>>> col 16 are preferred.
>>>
>>>> I would also add another concern that others often bring up:
>>>> 3) Difficulty in maintaining the correct parentage in the  
>>>> ontology (Karen brought this up in her email)
>>>> However, I would respond to this and say that as we gain  
>>>> confidence in using the cross-product definitions and the  
>>>> reasoner to automate this procedure it becomes less of a concern  
>>>> (not yet eliminated, but less of a concern).
>>>> For example, there used to be massive errors in the regulation  
>>>> graph, but we now use the reasoner and the regulation xps in  
>>>> batch frequently, and as soon as OE2 is released we can
>>>> incorporate this directly into the ontology editing cycle. Thanks
>>>> to Midori's efforts we are making a lot of progress on the more
>>>> difficult BPxCC composite terms, and I feel we will soon be able
>>>> to manage the hierarchy for these terms automatically, making
>>>> pre-composition less of a worry here:
>>>>
>>>> 	http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component
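The idea of letting cross-product definitions drive parentage can be sketched as follows (Python). This is a toy subsumption check under stated assumptions: each composite term is defined by a genus and a differentia with a single fixed relation, simple terms have one is_a parent, and all names are placeholders. It is not the OE2 reasoner or the actual BPxCC definitions.

```python
# Asserted is_a edges among simple terms (child -> parent); placeholders.
IS_A = {
    "translation": "biosynthesis",
    "mitochondrion": "organelle",
}

# Cross-product definitions: composite term -> (genus, differentia),
# read as "a <genus> that occurs in <differentia>".
XP = {
    "biosynthesis in organelle": ("biosynthesis", "organelle"),
    "translation in organelle": ("translation", "organelle"),
    "translation in mitochondrion": ("translation", "mitochondrion"),
}

def ancestors_or_self(term):
    """Follow asserted is_a edges upward, including the term itself."""
    out = {term}
    while term in IS_A:
        term = IS_A[term]
        out.add(term)
    return out

def subsumes(general, specific):
    """True if both parts of `general` are at or above those of `specific`."""
    g_genus, g_diff = XP[general]
    s_genus, s_diff = XP[specific]
    return (g_genus in ancestors_or_self(s_genus)
            and g_diff in ancestors_or_self(s_diff))

def inferred_parents(term):
    """Composite terms that subsume `term`, keeping only the most specific."""
    candidates = {t for t in XP if t != term and subsumes(t, term)}
    return {c for c in candidates
            if not any(d != c and subsumes(c, d) for d in candidates)}

print(inferred_parents("translation in mitochondrion"))
```

Because the parents are computed from the definitions, adding a new composite term does not require a curator to place it by hand, which is the sense in which pre-composition becomes less of a maintenance worry.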
>>>>> I shudder to think what the graph representations would look like.
>>>>> <snip>
>>>>>>> I find the argument that one can't do an AND with some tools  
>>>>>>> to be more of an argument to improve the tools than an  
>>>>>>> argument to do extensive precomposition.  If we have to build  
>>>>>>> GO practice around the weakest tools, then we should also do  
>>>>>>> explicit annotation all the way up to root for every term, to  
>>>>>>> handle tools that don't use the true path rule.  I'm NOT  
>>>>>>> advocating that!!
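For readers less familiar with the true path rule referred to here: an annotation to a term implies annotation to all of its ancestors, so a tool can derive those rather than having curators assert them. A minimal sketch in Python, assuming a toy is_a graph with placeholder term names:

```python
# Toy is_a edges (child -> parents); placeholder names, not real GO terms.
PARENTS = {
    "mitochondrial translation": ["translation", "mitochondrial process"],
    "translation": ["biological process"],
    "mitochondrial process": ["biological process"],
}

def ancestors(term):
    """Collect every term reachable by following is_a edges upward."""
    seen, stack = set(), [term]
    while stack:
        for p in PARENTS.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def implied_terms(direct_terms):
    """Direct annotations plus everything the true path rule implies."""
    implied = set(direct_terms)
    for t in direct_terms:
        implied |= ancestors(t)
    return implied

print(sorted(implied_terms({"mitochondrial translation"})))
```

A tool that skips this step would need every ancestor annotated explicitly, which is the practice being rejected above.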
>>>>>> I agree that we shouldn't avoid doing the right thing because  
>>>>>> of the weakest tools. I think we should have a plan for how we  
>>>>>> can support tools, but I think we first need to agree roughly  
>>>>>> on what the right thing is.
>>>>> I think everyone agrees with this!
>>>>> Jim
>>>>>>> Jim
>>>>>>> =====================================
>>>>>>> Jim Hu
>>>>>>> Associate Professor
>>>>>>> Dept. of Biochemistry and Biophysics
>>>>>>> 2128 TAMU
>>>>>>> Texas A&M Univ.
>>>>>>> College Station, TX 77843-2128
>>>>>>> 979-862-4054
>>>> _______________________________________________
>>>> Go mailing list
>>>> Go at geneontology.org
>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>
>>
>

=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054

