[Go] addition of localization specific process terms ?

Karen Christie kchris at genome.stanford.edu
Mon Mar 23 16:09:38 PDT 2009


Maybe we should talk about this topic at the GO meeting. While there was 
lots of discussion, I never really got a sense of what I should actually 
do now, in terms of when, or when not, to request new "pre-composed" 
terms.

I guess I'll put this on the agenda.

-Karen


On Mon, 23 Mar 2009, Chris Mungall wrote:

>
> On Mar 4, 2009, at 12:21 PM, Chris Mungall wrote:
>
>> 
>> On Mar 4, 2009, at 10:33 AM, Jim Hu wrote:
>> 
>>> On Mar 4, 2009, at 11:49 AM, Chris Mungall wrote:
>>> 
>>>> On Mar 4, 2009, at 7:59 AM, Jim Hu wrote:
>>>> 
>>>>> On Mar 4, 2009, at 2:38 AM, Valerie Wood wrote:
>>>>> 
>>>>>> Because of all of the arguments in favour mentioned by Karen and Chris, 
>>>>>> I thought it was always necessary and required for curators to make 
>>>>>> the more granular annotation in these cases. We decided long ago that 
>>>>>> proliferation of the ontology was not an issue when pitched against 
>>>>>> accurate capture of biology, and I wasn't aware that it was ever GO 
>>>>>> philosophy not to capture compartment-specific processes in this way.
>>>>> 
>>>>> I wasn't involved in GO when this was decided, but as someone who does 
>>>>> stuff on the software side as well as the annotation side, I think 
>>>>> proliferation of the ontology should be an issue that is not dismissed 
>>>>> so lightly.
>>>> 
>>>> What are your concerns in particular?
>>>>> 
>>> 
>>> My two concerns are the obvious ones, nothing particularly sophisticated:
>>> 
>>> 1) performance, especially of web-based tools that have to display GO with 
>>> short processing times.  IIRC, AmiGO has had this problem - traversing the 
>>> ontology to find all the children and annotations to children is slow 
>>> enough that Mike had to write a cron job to kill excess db queries that 
>>> came from users getting impatient and reloading the page while the 
>>> traversal was in progress.  As the ontology gets big, these traversals 
>>> take longer.  Maybe there are more efficient algorithms to deal with this, 
>>> maybe AJAX partially makes this tolerable, and maybe the problem is the 
>>> same with post-composition.  But it seems to me that at some point sheer 
>>> size has a performance hit.
>>> 
>>> 2) User interface.  When I browse the ontology to look for the appropriate 
>>> terms to do an annotation, there are nodes that would be unreadable if 
>>> precomposition was being done consistently.  Fortunately it isn't being 
>>> done consistently at present.  For example, look at the children of the 
>>> positive and negative regulation terms in the process ontology.  There are 
>>> terms in there for mRNAs for specific genes (oskar and bicoid)!  That 
>>> strikes me as being completely insane... if implemented for all regulated 
>>> genes in all organisms, that node would have hundreds of thousands of 
>>> children - it would be a large subset of UniProt/GenBank all at one level. 
>>> Or worse, because many genes would be present at multiple overpopulated 
>>> nodes in GO.
>
> I previously addressed this from an end-user point of view. But as Jim 
> mentions in the sf tracker item about binding, it's also important to 
> consider this from the curation point of view.
>
> Jim's point is that increased pre-coordination in the ontology makes it 
> harder for curators, because it will take longer to home in on the most 
> appropriate term for an annotation.
>
> Whilst I can see that there is obviously some correlation between ontology 
> size and time to find a term, I'm wondering to what extent this is a 
> problem. I would have expected that most annotation systems used at the MODs 
> and UniProtKB would utilize some kind of term completion rather than the 
> curator manually traversing down the graph. Also, if the curators are 
> expected to post-compose using col 16, then they have *two* terms to find: 
> for example, to annotate "PEP binding" they would find the most specific term 
> in GO *and* the relevant CHEBI terms (and finding terms in CHEBI is probably 
> harder than finding terms in GO).
>
> But I don't annotate so I'm not sure.
>
> I would like to hear the opinion of some of the annotators here. Is excessive 
> pre-coordination a concern for curation?
>
> I think it would be good if at the meeting a representative curator from each 
> of the main annotation producing groups were to comment on the various 
> situations in which pre-composed terms vs col 16 are preferred.
>
>> I would also add another concern that others often bring up:
>> 
>> 3) Difficulty in maintaining the correct parentage in the ontology (Karen 
>> brought this up in her email)
>> 
>> However, I would respond to this and say that as we gain confidence in 
>> using the cross-product definitions and the reasoner to automate this 
>> procedure it becomes less of a concern (not yet eliminated, but less of a 
>> concern).
>> 
>> For example, there used to be massive errors in the regulation graph, but 
>> we now use the reasoner and the regulation xps in batch frequently, and as 
>> soon as OE2 is released we can incorporate this directly into the 
>> ontology editing cycle. Thanks to Midori's efforts we are making a lot of 
>> progress on the more difficult BPxCC composite terms, and I feel we will 
>> soon be able to manage the hierarchy for these terms automatically, making 
>> pre-composition less of a worry here:
>>
>> 	http://wiki.geneontology.org/index.php/XP:biological_process_xp_cellular_component
>>> 
>>> I shudder to think what the graph representations would look like.
>>> 
>>> 
>>> <snip>
>>>>> I find the argument that one can't do an AND with some tools to be more 
>>>>> of an argument to improve the tools than an argument to do extensive 
>>>>> precomposition.  If we have to build GO practice around the weakest 
>>>>> tools, then we should also do explicit annotation all the way up to root 
>>>>> for every term, to handle tools that don't use the true path rule.  I'm 
>>>>> NOT advocating that!!
>>>> 
>>>> I agree that we shouldn't avoid doing the right thing because of the 
>>>> weakest tools. I think we should have a plan for how we can support 
>>>> tools, but I think we first need to agree roughly on what the right thing 
>>>> is.
>>> 
>>> I think everyone agrees with this!
>>> 
>>> Jim
>>> 
>>> 
>>>> 
>>>>> 
>>>>> Jim
>>>>> 
>>>>> =====================================
>>>>> Jim Hu
>>>>> Associate Professor
>>>>> Dept. of Biochemistry and Biophysics
>>>>> 2128 TAMU
>>>>> Texas A&M Univ.
>>>>> College Station, TX 77843-2128
>>>>> 979-862-4054
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> _______________________________________________
>> Go mailing list
>> Go at geneontology.org
>> http://fafner.stanford.edu/mailman/listinfo/go
>> 
>

