[Go] addition of localization specific process terms ?

Gwinn Giglio, Michelle mgiglio at SOM.UMARYLAND.EDU
Tue Mar 24 10:10:15 PDT 2009



Hi,

I know this discussion is tabled until the meeting, but I want to make a
small comment on what Chris said earlier about the MODs and UniProt having
access to tools that make finding terms easier.  I agree that is
probably true, and the MODs' expertise with the GO will also make it
much easier for them to find terms.  But when we approach this discussion, we
need to remember the needs of annotators
beyond the MODs.  One of the GO's goals is to get more people doing GO
annotation, so we need to strike a careful balance between
complexity and usability.  Whenever possible, it would be great for the GO to
provide tools (browsing and annotation) that make all of this easy for
novice annotators no matter how complex the underlying systems are (though I
realize this is much easier said than done).  We have to remember that if
things get too complicated, people won't even try to use GO.

Michelle



On 3/23/09 7:37 PM, "David Hill" <dph at informatics.jax.org> wrote:

> It's on the agenda in ontology development, although we may want to move
> its timing.
> 
> David
> 
> Karen Christie wrote:
>> I actually meant the "when to instantiate localization specific
>> process terms" issue, though that is perhaps tied up in the col 16 and
>> 17 discussion too.
>> 
>> -Karen
>> 
>> 
>> On Mon, 23 Mar 2009, Chris Mungall wrote:
>> 
>>> 
>>> Thanks Karen
>>> 
>>> I guess it makes sense to talk about col 16 (and 17 whilst we are
>>> there anyway) before the binding discussion?
>>> 
>>> On Mar 23, 2009, at 4:09 PM, Karen Christie wrote:
>>> 
>>>> Maybe we should talk about this topic at the GO meeting. While there
>>>> was lots of discussion, I never really got a sense of what I should
>>>> actually do now, in terms of when, or when not, to request new
>>>> "pre-composed" terms.
>>>> 
>>>> I guess I'll put this on the agenda.
>>>> 
>>>> -Karen
>>>> 
>>>> 
>>>> On Mon, 23 Mar 2009, Chris Mungall wrote:
>>>> 
>>>>> 
>>>>> On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:
>>>>> 
>>>>>> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>>>>>>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>>>>>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>>>>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>>>>>>> Because of all of the arguments in favour mentioned by Karen
>>>>>>>>>> and Chris, I thought it was always necessary and required for
>>>>>>>>>> curators to make the more granular annotation in these cases.
>>>>>>>>>> We decided long ago that proliferation of the ontology was not
>>>>>>>>>> an issue when pitched against accurate capture of biology,
>>>>>>>>>> and I wasn't aware that it was ever GO philosophy not to
>>>>>>>>>> capture compartment specific processes in this way.
>>>>>>>>> I wasn't involved in GO when this was decided, but as someone
>>>>>>>>> who does stuff on the software side as well as the annotation
>>>>>>>>> side, I think proliferation of the ontology should be an issue
>>>>>>>>> that is not dismissed so lightly.
>>>>>>>> What are your concerns in particular?
>>>>>>> My two concerns are the obvious ones, nothing particularly
>>>>>>> sophisticated:
>>>>>>> 1) performance, especially of web-based tools that have to
>>>>>>> display GO with short processing times.  IIRC, AmiGO has had this
>>>>>>> problem - traversing the ontology to find all the children and
>>>>>>> annotations to children is slow enough that Mike had to write a
>>>>>>> cron job to kill excess db queries that came from users getting
>>>>>>> impatient and reloading the page while the traversal was in
>>>>>>> progress.  As the ontology gets big, these traversals take
>>>>>>> longer.  Maybe there are more efficient algorithms to deal with
>>>>>>> this, maybe AJAX partially makes this tolerable, and maybe the
>>>>>>> problem is the same with post-composition.  But it seems to me
>>>>>>> that at some point sheer size has a performance hit.
>>>>>>> 2) User interface.  When I browse the ontology to look for the
>>>>>>> appropriate terms to do an annotation, there are nodes that would
>>>>>>> be unreadable if precomposition was being done consistently.
>>>>>>> Fortunately it isn't being done consistently at present.  For
>>>>>>> example, look at the children of the positive and negative
>>>>>>> regulation terms in the process ontology.  There are terms in
>>>>>>> there for mRNAs for specific genes (oskar and bicoid)!  That
>>>>>>> strikes me as being completely insane... if implemented for all
>>>>>>> regulated genes in all organisms, that node would have hundreds
>>>>>>> of thousands of children - it would be a large subset of
>>>>>>> UniProt/Genbank all at one level. Or worse, because many genes
>>>>>>> would be present at multiple overpopulated nodes in GO.
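Aside (illustrative, not part of the original thread): the traversal cost
in point 1 can be made concrete with a small sketch. It assumes a toy is_a
graph with made-up term labels (none are real GO identifiers) and
hypothetical helper names `build_children`, `descendants`, and
`precompute_closure`. Computing the descendant sets once and caching them
is one common way to avoid repeating the walk on every page load; nothing
here reflects how AmiGO is actually implemented.

```python
from collections import defaultdict

# Toy ontology (assumption): child term -> list of is_a parents.
PARENTS = {
    "regulation": ["root"],
    "negative regulation": ["regulation"],
    "negative regulation of gene X": ["negative regulation"],
    "negative regulation of gene Y": ["negative regulation"],
}


def build_children(parents):
    """Invert the child -> parents map into parent -> set of children."""
    children = defaultdict(set)
    for child, parent_list in parents.items():
        for parent in parent_list:
            children[parent].add(child)
    return children


def descendants(term, children):
    """Collect every term below `term`, visiting each node at most once."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def precompute_closure(parents):
    """Build term -> all descendants once, so later lookups need no walk."""
    children = build_children(parents)
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    return {term: descendants(term, children) for term in terms}


CLOSURE = precompute_closure(PARENTS)
print(sorted(CLOSURE["regulation"]))
```

With the closure cached, finding every term under a node is a dictionary
lookup, at the price of recomputing the table whenever the ontology
changes. The table itself still grows with the ontology, which is the
size concern raised above.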
>>>>> 
>>>>> I previously addressed this from an end-user point of view. But as
>>>>> Jim mentions in the sf tracker item about binding, it's also
>>>>> important to consider this from the curation point of view.
>>>>> 
>>>>> Jim's point is that increased pre-coordination in the ontology
>>>>> makes it harder for curators, because it will take longer to home
>>>>> in on the most appropriate term for an annotation.
>>>>> 
>>>>> Whilst I can see that obviously there is some correlation between
>>>>> ontology size and time to find a term, I'm wondering about the extent
>>>>> to which this is a problem. I would have expected that most annotation
>>>>> systems used at the MODs and UniProtKB would utilize some kind of
>>>>> term completion rather than the curator manually traversing down
>>>>> the graph. Also, if the curators are expected to post-compose using
>>>>> col 16, then they have *two* terms to find: for example to annotate
>>>>> "PEP binding" they would find the most specific term in GO *and*
>>>>> the relevant CHEBI terms (and finding terms in CHEBI is probably
>>>>> harder than finding terms in GO).
>>>>> 
>>>>> But I don't annotate so I'm not sure.
>>>>> 
>>>>> I would like to hear the opinion of some of the annotators here. Is
>>>>> excessive pre-coordination a concern for curation?
>>>>> 
>>>>> I think it would be good if at the meeting a representative curator
>>>>> from each of the main annotation producing groups were to comment
>>>>> on the various situations in which pre-composed terms vs col 16 are
>>>>> preferred.
>>>>> 
>>>>>> I would also add another concern that others often bring up:
>>>>>> 3) Difficulty in maintaining the correct parentage in the ontology
>>>>>> (Karen brought this up in her email)
>>>>>> However, I would respond to this and say that as we gain
>>>>>> confidence in using the cross-product definitions and the reasoner
>>>>>> to automate this procedure it becomes less of a concern (not yet
>>>>>> eliminated, but less of a concern).
>>>>>> For example, there used to be massive errors in the regulation
>>>>>> graph, but we now use the reasoner and the regulation xps in batch
>>>>>> frequently, and as soon as OE2 is released we can incorporate
>>>>>> this directly into the ontology editing cycle. Thanks
>>>>>> to Midori's efforts we are making a lot of progress on the more
>>>>>> difficult BPxCC composite terms, and I feel we will soon be able
>>>>>> to manage the hierarchy for these terms automatically, making
>>>>>> pre-composition less of a worry here:
>>>>>> 
>>>>>> http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component
>>>>>> 
>>>>>>> I shudder to think what the graph representations would look like.
>>>>>>> <snip>
>>>>>>>>> I find the argument that one can't do an AND with some tools to
>>>>>>>>> be more of an argument to improve the tools than an argument to
>>>>>>>>> do extensive precomposition.  If we have to build GO practice
>>>>>>>>> around the weakest tools, then we should also do explicit
>>>>>>>>> annotation all the way up to root for every term, to handle
>>>>>>>>> tools that don't use the true path rule.  I'm NOT advocating
>>>>>>>>> that!!
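Aside (illustrative, not part of the original thread): the true path rule
says that an annotation to a term also holds for every ancestor of that
term. A tool that applies the rule can infer those ancestor annotations at
query time, which is why explicit annotation up to the root is
unnecessary. The sketch below assumes a toy is_a graph, made-up gene and
term labels, and hypothetical function names `ancestors` and
`genes_for_term`.

```python
# Toy ontology (assumption): child term -> list of is_a parents.
PARENTS = {
    "regulation": ["root"],
    "negative regulation": ["regulation"],
    "negative regulation of gene X": ["negative regulation"],
}

# Direct annotations only (assumption): (gene, most specific term).
ANNOTATIONS = [
    ("geneA", "negative regulation of gene X"),
    ("geneB", "regulation"),
]


def ancestors(term, parents):
    """Collect every term above `term` by following is_a edges upward."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for_term(term, annotations, parents):
    """Genes annotated to `term` directly or through any descendant term."""
    return {
        gene
        for gene, annotated in annotations
        if annotated == term or term in ancestors(annotated, parents)
    }


print(sorted(genes_for_term("regulation", ANNOTATIONS, PARENTS)))
```

Here geneA is stored once, against its most specific term, and still
shows up under "regulation" and "root" when the tool walks the graph.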
>>>>>>>> I agree that we shouldn't avoid doing the right thing because of
>>>>>>>> the weakest tools. I think we should have a plan for how we can
>>>>>>>> support tools, but I think we first need to agree roughly on
>>>>>>>> what the right thing is.
>>>>>>> I think everyone agrees with this!
>>>>>>> Jim
>>>>>>>>> Jim
>>>>>>>>> =====================================
>>>>>>>>> Jim Hu
>>>>>>>>> Associate Professor
>>>>>>>>> Dept. of Biochemistry and Biophysics
>>>>>>>>> 2128 TAMU
>>>>>>>>> Texas A&M Univ.
>>>>>>>>> College Station, TX 77843-2128
>>>>>>>>> 979-862-4054
>>>>>> _______________________________________________
>>>>>> Go mailing list
>>>>>> Go at geneontology.org
>>>>>> http://fafner.stanford.edu/mailman/listinfo/go
>>>>> 


