[Annotation] phenotype or GO-still struggling

Valerie Wood val at sanger.ac.uk
Fri Jul 11 07:48:26 PDT 2008


Hi Judy/ Harold,

In both of these examples (your heart development example in the PowerPoint, 
and Harold's ribosomal example), we would make these annotations using 
current practices (so I don't think we are being inconsistent here). I 
have a similar example to Harold's, where a subunit of 
RNA polymerase II plays a specialized role in cell separation. This is 
what the data show, and this is fine.

What Karen and I are saying is that not EVERY annotation which can be 
made from a phenotype deserves a process annotation in the context of 
all of the available information.

Some processes which are initially inferred from a particular 
phenotype turn out to be downstream effects in light of subsequent 
information. We feel that in these cases, where the effect is *known* to be 
an *indirect* effect of an upstream process, the process annotation 
based on this phenotype should be removed. It seems increasingly that it 
is not helpful for our communities using GO to make every annotation a 
phenotype suggests, if the phenotype is subsequently shown to be a result of an 
upstream process. This is the feedback I have got from my community, and 
it makes more sense for global analysis.

Sometimes observations initially attributed to cell division 
defects are actually known to be due to defects in DNA repair or 
replication: because replication is late and cytokinesis is too early, cell 
division is compromised. There are many more dependencies on rRNA 
processing and translation.

If it is NOT clear (i.e. not reported) that the phenotype is due to an upstream 
process, then the IMP process annotation from the phenotype would still be valid. This 
reflects a different level of knowledge, which can be captured by a curator 
when more information is available. The phenotypes in these cases are 
still captured as appropriate.
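
As a rough illustration only, the rule described above could be sketched as follows. This is not an existing GO tool or file format: the class, field and gene names are hypothetical, and the only assumption encoded is that a phenotype-based process annotation is dropped when the phenotype has been *reported* to be an indirect effect of an upstream process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhenotypeBasedAnnotation:
    """A GO process annotation inferred from a mutant phenotype (hypothetical model)."""
    gene: str
    process: str                    # process term inferred from the phenotype
    evidence: str = "IMP"           # inferred from mutant phenotype
    # Filled in only when the literature reports that the phenotype is an
    # indirect, downstream effect of this upstream process.
    reported_upstream_process: Optional[str] = None


def keep_process_annotation(ann: PhenotypeBasedAnnotation) -> bool:
    """Keep the annotation unless the phenotype is known to be indirect."""
    return ann.reported_upstream_process is None


# Hypothetical examples: the gene names are placeholders, not real curation records.
unclear = PhenotypeBasedAnnotation(gene="geneA", process="cell division")
indirect = PhenotypeBasedAnnotation(
    gene="geneB",
    process="cell division",
    reported_upstream_process="DNA replication",
)

for ann in (unclear, indirect):
    action = "keep" if keep_process_annotation(ann) else "remove"
    print(f"{ann.gene}: {action} {ann.evidence} annotation to '{ann.process}'")
```

In both cases the phenotype itself would still be recorded; only the process annotation changes.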

Probably we have more cases like this because yeast are better studied, 
and there are many dependencies in cell biology. SGD may have some 
better examples as they have more legacy data.

Val





Judith Blake wrote:
> Hi,
> I sent a response with ppt and it's waiting to be moderated
>
> J
>
> Harold Drabkin wrote:
>>
>> On the other hand, we have to be careful about applying what we think 
>> we know to ignore what a mutant phenotype is telling us, because 
>> things can be complicated. I just finished looking at one of the 
>> ribosomal proteins, Rpl10. There is very little mouse data, but from 
>> skimming some other references (human), it appears to have been originally 
>> identified in a screen for tumor suppressors. It is unclear why. It 
>> appears to be a protein that associates with the large subunit after 
>> the subunit is exported from the nucleus. However, there is some 
>> reference to its release from the 60S ribosomal subunit as a 
>> mechanism of transcript-specific translational control. This might 
>> have been reflected in the search for tumor suppressors. Yet another 
>> paper describes it as a zinc-binding transcription regulatory 
>> protein which can bind to c-Jun (this binding is dependent upon 
>> zinc ions and phosphorylation by protein kinase C). I haven't looked 
>> at those papers in detail, but there is something interesting going 
>> on (no one has done a KO in mouse that I can find, which might tell us 
>> a bit more), and I'm not at all sure one should rule out that it 
>> participates in processes other than the one obvious from its 
>> name. Just grist for the mill.
>>
>> h
>>
>>
>> Valerie Wood wrote:
>>> I agree completely with Karen/SGD and this has been the procedure I 
>>> have always followed.
>>> In the absence of any other information, a mutant phenotype is 
>>> frequently used to infer a specific process. Once more information 
>>> is available it often becomes clear that this is a downstream 
>>> (indirect) effect.
>>> For example, defects in ribosome biogenesis and general translation 
>>> will often have pleiotropic effects which are 
>>> indirect, as they will affect nearly every process downstream (for 
>>> example there are associated downstream effects in chromosome 
>>> segregation, cell division, and in multicellular organisms, 
>>> multiple developmental processes). This does not mean that a 
>>> biologist would expect to see the annotations to these processes 
>>> once the upstream process is known. If we did follow this logic, 
>>> then we would find that all genes involved in translation, ribosome 
>>> biogenesis and general replication would eventually become annotated 
>>> to most other processes.
>>>
>>> Another classic example from yeast is vacuolar targeting. Many 
>>> mutations cause defects in which proteins usually 
>>> localized to the vacuole become mislocalised, and these were initially 
>>> interpreted as defects in protein targeting. It has since become 
>>> clear that many of these defects are very far upstream of the 
>>> vacuolar targeting pathway, and the mislocalisation is just a downstream 
>>> consequence of things being misfolded, mistranscribed etc. 
>>> Subsequently these annotations have gradually been removed as better 
>>> information has become available.
>>>
>>> On the other hand, mutations in a gene may have phenotypic effects 
>>> which you DO want to capture as processes (for example the effects 
>>> of phenylalanine hydroxylase on skin pigmentation etc). However you 
>>> would not necessarily want to curate the effect of a gene involved 
>>> in all translation initiation in a developmental process from a high 
>>> throughput screen (once better information was available). In Doug's 
>>> example I would also follow Karen's suggestion and make the 
>>> annotation if this is possibly specific transcription for the 
>>> pathway (i.e. specific to a subset of genes), but if the defect is 
>>> definitely general transcription I would not make the annotation.
>>>
>>> Not capturing EVERY phenotype using biological process should not 
>>> be considered underannotation. The purpose of GO process annotations 
>>> is to capture processes, not phenotypes. Sometimes phenotypes are 
>>> direct indicators of the process a gene is involved in; sometimes 
>>> they are not.
>>> A major consequence of making these ubiquitous annotations is that 
>>> they can distort genome-wide analysis (not improve it), and this is 
>>> often the case when annotations come from high throughput screens 
>>> and early experiments. Over the past couple of years cerevisiae and 
>>> pombe have done a lot of 'tidying' of these legacy annotations, and 
>>> the genome-wide GO data is much improved and more useful as a result.
>>>
>>> This is also why annotations to orthologs made using ISS should 
>>> only be made by a curator on a gene by gene basis and not by an 
>>> automated process. A curator is able to assess all of the available 
>>> information to make an ISS annotation (from different organisms) and 
>>> distinguish between current annotations and legacy annotations.
>>>
>>> One way to distinguish these is whether the targets are generic (i.e. 
>>> every gene) or specific (a subset of genes). If the gene's targets 
>>> are a subset of genes then the annotation is probably valid.
>>>
>>> Val
>>>
>>> Karen Christie <kchris at genome.stanford.edu> wrote: 
>>>> I don't think the GOC has ever had a policy, or even a 
>>>> recommendation, that process annotations should be made from all 
>>>> mutant phenotypes, nor do I think that it should.
>>>>
>>>> For example, SGD is currently working on annotating phenotypes for 
>>>> Cell Division Cycle (CDC) mutants, i.e. mutations which cause a 
>>>> cell cycle arrest phenotype. Here are some of the ones I worked on 
>>>> yesterday:
>>>>
>>>>     CDC60   leucyl tRNA synthetase
>>>>     PRT1    Subunit of eIF3
>>>>     ALA1    alanyl-tRNA synthetase
>>>>     CDC65   mitochondrial tRNA-Glu
>>>>     SPT16   Subunit of FACT transcription elongation complex
>>>>
>>>> I don't think that anyone in the yeast community would expect or 
>>>> want to see any of these genes annotated to a GO process related to 
>>>> the cell cycle. There are lots of examples of where a mutant 
>>>> phenotype is due to some downstream effect and not due to the 
>>>> primary defect.
>>>>
>>>> So, at SGD, we try to focus on the primary process. Obviously, we 
>>>> don't always know, but once we do, we like to avoid making GO 
>>>> annotations for processes that are known to be downstream, rather 
>>>> than direct, results of the mutation.
>>>>
>>>> For Doug's specific example, if comparative data suggested that the 
>>>> gene was a specific regulatory transcription factor, I'd probably 
>>>> be inclined to go ahead and make specific process annotations. 
>>>> However, if comparative data suggested that it was related to a Pol 
>>>> II general transcription factor, I might not want to make a GO 
>>>> process annotation to such a specific process.
>>>>
>>>> At all of the Annotation Camps, we've always said that one should 
>>>> be careful when making annotations from mutant phenotypes. At both 
>>>> of the public ones, the question has come up of how much to 
>>>> annotate from mutant phenotypes. The answer we've given has been 
>>>> that if one only has a mutant phenotype to annotate from, then 
>>>> make the best annotations you can. However, be aware that as you 
>>>> learn more, you may find that some of the mutant phenotypes are 
>>>> indirect results rather than something the gene product is directly 
>>>> involved in, and that in these cases you may choose to remove 
>>>> process annotations based on these phenotypes.
>>>>
>>>> I think this is still good advice, that curator judgement should 
>>>> play a role in deciding whether a GO process annotation is merited 
>>>> from any particular mutant phenotype.
>>>>
>>>> -Karen
>>>>
>>>>
>>>> On Sun, 6 Jul 2008, Judith Blake wrote:
>>>>
>>>>   
>>>>> I can understand the duplication of effort, but since the GO and 
>>>>> phenotype annotations aren't co-mingled in GOdb, the SGD genes 
>>>>> would I think appear under-annotated if the effect of the gene on 
>>>>> phenotype is not curated in BP. For comparative genomics studies 
>>>>> using GO, this information would be missing even though it is 
>>>>> available in the literature.
>>>>>
>>>>> For mouse, the phenotype data is effectively 'dysfunction' data, 
>>>>> so the phenotype annotation reflects a different view from the GO 
>>>>> annotation.
>>>>>
>>>>> Judy
>>>>>
>>>>> Julie Park wrote:
>>>>>     
>>>>>> Hi Doug,
>>>>>>
>>>>>> SGD's practice on this is that if it is known that what is being 
>>>>>> observed is a secondary/downstream effect, then we only capture 
>>>>>> it via phenotypes and not as a GO process.  However, if the gene 
>>>>>> product in question is not well characterized or there is a 
>>>>>> conflict in the literature about whether it is a direct or 
>>>>>> indirect involvement then we would give it a GO annotation.
>>>>>>
>>>>>> We've made a decision to use GO to try and capture the primary 
>>>>>> role of a gene product as much as possible and to reduce the 
>>>>>> duplication of effort required to capture data both in GO and as 
>>>>>> phenotypes.
>>>>>>
>>>>>> Just our take on things.
>>>>>>
>>>>>> Regards,
>>>>>> -Julie
>>>>>>
>>>>>>
>>>>>> On Jul 3, 2008, at 3:16 PM, Doug howe wrote:
>>>>>>
>>>>>>       
>>>>>>> Hi David,
>>>>>>> It still seems like there is a line that has to be drawn somewhere.
>>>>>>> We've talked in the past about the scope of a process...when does it
>>>>>>> start and when does it end?  A gene that has as its primary role
>>>>>>> regulation of transcription (perhaps binds DNA etc. etc.) may have a
>>>>>>> secondary effect upon eye morphogenesis.  However, the process of eye
>>>>>>> morphogenesis does not start with the binding of such a gene to a
>>>>>>> regulatory sequence...it is a downstream consequence....and perhaps it
>>>>>>> is the gene whose expression is being regulated that is really involved
>>>>>>> in the downstream process.  It seems like there is a significant amount
>>>>>>> of redundant curation work to do if we always annotate both GO and
>>>>>>> phenotype using the same GO process terms.  I'm not strongly opposed to
>>>>>>> such annotations, I just want to revisit the discussion and see if
>>>>>>> anyone has other views on the issue.
>>>>>>> -Doug
>>>>>>>
>>>>>>> David Hill wrote:
>>>>>>>         
>>>>>>>> Doug,
>>>>>>>>
>>>>>>>> I do this all the time. I just finished systematically doing 
>>>>>>>> all the homeobox genes in mouse. Many of them are annotated to 
>>>>>>>> things like pattern specification. I think in the future, it 
>>>>>>>> will be very nice to know these are playing roles in regulating 
>>>>>>>> transcription but that regulation is fundamental in other 
>>>>>>>> processes as well.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> Doug howe wrote:
>>>>>>>>           
>>>>>>>>> I'm still struggling with the issue of whether to make a GO 
>>>>>>>>> annotation (processes in particular) or only a phenotype 
>>>>>>>>> annotation.  The zebrafish literature is replete with mutant 
>>>>>>>>> papers that describe phenotypes involving eyes, otic 
>>>>>>>>> vesicles, pharyngeal arches, organ development etc. 
>>>>>>>>> Often, the IEA annotations for a gene seem to indicate that 
>>>>>>>>> the gene product binds DNA, and may be some sort of 
>>>>>>>>> transcriptional regulator. Should such a gene be annotated 
>>>>>>>>> with GO terms like 'otic vesicle development', or 'eye 
>>>>>>>>> morphogenesis', or should that be left for phenotype annotations?
>>>>>>>>>
>>>>>>>>>               
>>>>>>> -- 
>>>>>>> Doug Howe, Ph.D.
>>>>>>> ZFIN Scientific Curator
>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Annotation mailing list
>>>>>>> Annotation at geneontology.org
>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>           


-- 
---------------------------------------------------------------------------
Valerie Wood			 Tel: 01223 496909
S. pombe Genome Project		 Fax: 01223 494919 		       
Wellcome Trust Sanger Institute	 email: val at sanger.ac.uk
Wellcome Trust Genome Campus	 http://www.genedb.org/genedb/pombe 
Hinxton, Cambridge, CB10 1HH	 http://www.sanger.ac.uk/Projects/S_pombe




