[Go] generic GO slim question

D'Eustachio, Peter Peter.D'Eustachio at nyumc.org
Wed Jun 17 07:35:32 PDT 2009


I don't think a slim is a union. My amateur understanding is that they were originally constructed to allow all of the GO annotations of proteins for a species to be grouped into a smaller number of buckets, so that one could, for example, make pie charts showing that the fraction of proteins involved in "metabolism" is larger in this organism than in that one, or that the genes overexpressed after some stress are disproportionately involved in "DNA repair". The buckets were chosen to be high-level enough so that a substantial number of proteins would end up in each (allowing better statistics on those comparisons) and also meaningful to human biologists. I don't know whether anyone ever took an initial slim and used computational strategies to tune it, to try to optimize the distribution of proteins over buckets.
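
To make the bucketing idea concrete, here is a minimal Python sketch of how annotations might be rolled up into slim buckets. Everything in it is an assumption made for illustration: the term names, the toy parent graph, the protein names, and the helper functions `ancestors` and `slim_counts` are hypothetical and are not taken from the real ontology or from any GO tool.

```python
from collections import Counter

# Toy is_a graph (child -> parents). Hypothetical; not the real GO structure.
PARENTS = {
    "DNA repair": ["DNA metabolism"],
    "DNA replication": ["DNA metabolism"],
    "DNA metabolism": ["metabolism"],
    "glycolysis": ["metabolism"],
    "metabolism": ["biological process"],
    "cytokinesis": ["cell cycle"],
    "cell cycle": ["biological process"],
    "biological process": [],
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def slim_counts(annotations, slim_terms):
    """Count proteins per slim bucket; each protein counts once per bucket."""
    counts = Counter()
    for terms in annotations.values():
        buckets = set()
        for term in terms:
            # A bucket is hit if it is the term itself or one of its ancestors.
            buckets |= ancestors(term) & slim_terms
        counts.update(buckets)
    return counts


# Hypothetical annotations: protein -> terms it is annotated to.
annotations = {
    "protA": ["DNA repair", "DNA replication"],
    "protB": ["glycolysis"],
    "protC": ["cytokinesis"],
}
slim = {"metabolism", "cell cycle"}
counts = slim_counts(annotations, slim)
print(counts)
```

In this toy data two proteins fall into "metabolism" and one into "cell cycle", which is the kind of per-bucket tally a pie chart would be drawn from. A protein annotated to several child terms of the same bucket is counted only once there.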



Peter



-----Original Message-----
From: go-bounces at genome.stanford.edu [mailto:go-bounces at genome.stanford.edu] On Behalf Of Jim Hu
Sent: Wednesday, June 17, 2009 10:23 AM
To: Judith Blake
Cc: GO mailing list
Subject: Re: [Go] generic GO slim question



Including interleaved terms - I am guessing this means parents of the terms used for the annotations - would it be correct to think of Slices/Subsets as unions and Slims as intersections?  If yes, then I think it could be automated, at least as a test to see if the result is something that matches expectations.  Was this not formalized when the slims were first created?
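
As a rough sketch of what such an automated test could look like, the Python below builds a slice as the union of all used terms plus their parents, and also computes one possible reading of "intersection" (the terms shared by every organism once parents are included). The toy parent graph, the organism names, and the functions `closure`, `build_slice`, and `shared_terms` are all hypothetical, and the intersection reading is an assumption rather than an agreed definition of a slim.

```python
# Toy is_a graph (child -> parents). Hypothetical; not the real GO structure.
PARENTS = {
    "DNA repair": ["DNA metabolism"],
    "DNA replication": ["DNA metabolism"],
    "DNA metabolism": ["metabolism"],
    "glycolysis": ["metabolism"],
    "metabolism": ["biological process"],
    "biological process": [],
}


def closure(terms, parents=PARENTS):
    """Return the given terms plus all of their ancestors (interleaved terms)."""
    seen = set()
    stack = list(terms)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def build_slice(terms_by_organism, parents=PARENTS):
    """Slice: union of every term used by any organism, closed over parents."""
    used = set().union(*terms_by_organism.values())
    return closure(used, parents)


def shared_terms(terms_by_organism, parents=PARENTS):
    """One reading of 'intersection': terms present in every organism's closure."""
    closures = [closure(terms, parents) for terms in terms_by_organism.values()]
    if not closures:
        return set()
    return set.intersection(*closures)


# Hypothetical per-organism sets of terms used in manual annotation.
terms_by_organism = {
    "org1": {"DNA repair", "glycolysis"},
    "org2": {"DNA replication"},
}
slice_terms = build_slice(terms_by_organism)
common = shared_terms(terms_by_organism)
print(sorted(slice_terms))
print(sorted(common))
```

With this toy input the slice contains all six terms, while the shared set contains only "DNA metabolism", "metabolism", and "biological process". Whether a set derived this way resembles a hand-built slim is exactly what would need checking against expectations.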



Jim



On Jun 17, 2009, at 8:56 AM, Judith Blake wrote:





Maybe we should distinguish 'slim' from 'slice'

Slim:  High level grouping terms

Slice:  Set of all terms (and interleaved terms?) that have been used in annotation for some subset of organisms.

????

Judy


On 6/17/09 6:43 AM, "Jane Lomax" <jane at ebi.ac.uk> wrote:

That's right - case 2 is more like the prokaryotic 'subset' we
currently have. It has over 9000 terms so not really a slim, more like a
slice. There's probably demand for both, but there is a maintenance
overhead - more so for the second category.

Jane

Valerie Wood wrote:
> As an example of (1.) fission yeast uses 3361 different terms in
> total, 3127 of these are used for manual annotation (I was looking at
> this today) so the 'slim' would be quite 'fat' in this case.
> Val
>
>
> Judith Blake wrote:
>
>> Hi Jim,
>>
>> I think you bring forward the two different approaches to slims....
>>
>>    1. High level terms, typically fewer than 20, that can be used to
>>       look at overall distribution of gene attributes of a genome set.
>>
>>
>>    2. Set of terms that have been used in a particular context... Used
>>       to annotate a prokaryotic protein...as in your example.
>>
>> Something to keep in mind.
>>
>> For me, the first case, with a few high level terms, is a 'slimming'
>> more apparently than the 2nd.
>>
>> Judy
>>
>>
>> On 6/16/09 12:59 PM, "Jim Hu" <jimhu at tamu.edu> wrote:
>>
>>     From what I can tell about the discussions of slims I've heard at
>>     GOC meetings, part of the problem is that maintaining them is an
>>     extra task that no one really has time to do. Which makes me
>>     wonder if slimming can be automated in some way. For example,
>>     anything that is used for a manual annotation of a prokaryote
>>     would go in the prokaryotic slim.
>>
>>     Jim
>>
>>
>>     On Jun 14, 2009, at 4:54 AM, Valerie Wood wrote:
>>
>>
>>         How was it decided which terms to include in the generic GO
>>         slim?
>>
>>         There have been discussions previously about what makes a
>>         useful and relevant generic GO slim (but no agreement).
>>         However, it seems that at the very least the terms should be
>>         i) general, and ii) high level terms which constitute major
>>         cellular processes (and therefore areas of research) should be
>>         included.
>>
>>         So, I was wondering why the following terms are in the slim (I
>>         have included the TOTAL number of annotations for all
>>         organisms in parentheses)
>>
>>         i) plastid translation [1]
>>         ii) lead ion binding [2]
>>         iii) cytoplasmic chromosome [28]
>>         iv) neurotransmitter transporter [55]
>>
>>         Conversely the following biologically important "general"
>>         terms (at least from a single-celled organism perspective),
>>         are absent from the generic GO slim
>>
>>         i) DNA replication [1685]
>>         ii) DNA repair [1934]
>>         iii) transmembrane transport [814]
>>         iv) ribosome biogenesis [1849]
>>         v) cytokinesis [1049]
>>         vi) cytoskeletal organization [2311]
>>         and others.
>>
>>         In addition, there is an obsolete molecular function term in
>>         the slim (chaperone regulator activity)
>>
>>         I wondered whether the contents of the slim need to be revised to make
>>         it more useful. I realise it isn't easy to make a slim which
>>         is good for all organisms. If this is the case perhaps we
>>         should consider abandoning the "generic generic" slim and
>>         define more useful individual generic slims for prokaryotes,
>>         eukaryotic unicellular, and multicellular orgs?
>>
>>         We might not agree about the utility of a "generic slim" but
>>         these are used a lot as they are the default slims used by
>>         AmiGO, and the Princeton generic GO term mapper.......They
>>         should provide a good overview of the known biology of any
>>         organism. They should probably provide a starting point for
>>         people who wish to refine to make their own slim and include
>>         more specific terms for their area of interest, and remove
>>         terms which are not useful. I am trying to write a tutorial
>>         which includes how to select terms for a slim to give complete
>>         coverage for their organism, and refine to make a more
>>         specific slim, but the generic slim doesn't seem to
>>         provide a very good example for a starting point.
>>
>>         Val
>>
>>
>>
>>
>>
>>
>
>
>


--
Dr Jane Lomax
GO Editorial Office
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridgeshire, UK
CB10 1SD

p: +44 1223 492516
f: +44 1223 494468





=====================================

Jim Hu

Associate Professor

Dept. of Biochemistry and Biophysics

2128 TAMU

Texas A&M Univ.

College Station, TX 77843-2128

979-862-4054







</PRE>
<PRE>