[Annotation] phenotype or GO-still struggling

Judith Blake jblake at informatics.jax.org
Tue Jul 15 12:05:28 PDT 2008


Frankly I don't understand why we are having this discussion when we 
agree that IMPs are valid annotations.  Of course one might re-evaluate 
IMPs with new data.  I don't understand why this is big news.

Judy

Karen Christie wrote:
> Hi,
>
> I completely agree with what Val said, including that in both Judy's 
> and Harold's examples, I agree that it is appropriate to make 
> annotations based on current knowledge.
>
> I also sometimes come across genes where there is evidence that 
> shows/suggests that they have multiple roles. Most recently REX2, which 
> is involved in 3'-end processing of various different 
> nuclear/nucleolar RNAs, and is also localized to the mitochondrion 
> where its role is not clear but it interacts genetically with TRZ1, 
> the tRNA 3'-end processing endonuclease. In cases like these, we make 
> all the annotations that are supported by experimental evidence, even 
> those that may be surprising.
>
> However, much more frequently, I come across cases where the original 
> idea about what a gene did, based solely on its mutant phenotype, is 
> later shown to be due to a downstream effect of the mutation or to an 
> artifact of the experimental system. In these types of cases, we 
> choose not to represent these mutant phenotypes as GO annotations. 
> Some examples of known cases for cerevisiae are below.
>
> So, as more is known about a given gene, it seems it is often 
> appropriate to reevaluate whether old annotations based on IMP are 
> still valid. Sometimes they are and other times they really seem 
> inappropriate in light of new knowledge. For multicellular organisms, 
> many of these developmental mutants may well turn out to be genes 
> specifically involved in that developmental process. I went to a talk 
> a couple months ago that said that mammals have 100-1000 fold more 
> specific regulatory transcription factors than yeast. A mutant in one 
> of these might be quite informative as to which processes it regulates.
>
> However, there will surely also be cases where something else 
> is occurring. For example, a human disease called SCID (Severe 
> Combined Immune Deficiency) is caused by deficiency of the enzyme 
> adenosine deaminase (ADA). However, I'm not sure one would want to say 
> that ADA is involved in immune cell development; it is generally 
> active through the body. Rather, when ADA is defective, a toxic 
> intermediate builds up and immune and other rapidly dividing cells are 
> most sensitive. As the specific effect of ADA mutations on immune 
> cells is a pathology, rather than a normal process, it seems outwith 
> the scope of GO to annotate the immune cell effect of ADA mutants.
>
> SGD has also started having large sets of high-throughput mutant 
> phenotypes data. We have found that many of these screens identify 
> large sets of genes with a given phenotype. However, based on the 
> knowledge of what many of these genes do, we have become rather leery 
> of making GO annotations wholesale from these large mutant phenotype 
> studies because the mutant phenotype doesn't seem to be a very 
> specific indicator of the process the gene is involved in. We're 
> seeing a lot of these now. We've basically decided that though we are 
> quite happy to put these into our phenotype curation wholesale, we are 
> not comfortable in making GO annotations based on these large scale 
> phenotype screens.
>
> -Karen
>
> Some specific examples for cerevisiae:
>
> 1. cell cycle arrest phenotypes: people looked for things with cell
> division cycle (cdc) arrest phenotypes in order to find cell cycle
> regulators. Some cdc mutants actually are cell cycle regulators. However,
> the collection of cdc mutants also includes:
> - tRNAs
> - tRNA synthetases
> - an Hsp90 co-chaperone
> - general transcription regulators, e.g. components of the Paf1
>         transcription regulatory complex, members of the CCR4/NOT complex
> - things involved in response to mating pheromone (which do cause G1
>         arrest, but which are not thought to be cell cycle regulators)
> - eIF4E cytoplasmic mRNA cap binding protein required for translation
>
> Inhibition of ribosome synthesis can also produce cell cycle arrest. A
> couple different U3 snoRNA associated complexes involved in the first
> stages of rRNA processing and small ribosomal subunit assembly have
> recently been characterized in yeast. U3 and many, if not most, of the
> proteins are conserved. Depleting most of the individual protein
> components of these complexes produces cell cycle arrest.
>
> So, while SGD would be quite happy to have a cell cycle arrest -phenotype-
> annotated for every gene, we don't really want to go on and make a GO
> process annotation to cell cycle for many of these genes.
>
> 2. splicing vs translation - A lot of things that turn out to be 
> involved in splicing of nuclear mRNAs were originally characterized as 
> being involved in translation. This turns out to be due to the unusual 
> distribution of introns in S. cerevisiae. Only about 270 genes, out of 
> 6000, contain introns, and these are predominantly found in ribosomal 
> protein genes. Thus splicing defects have a very immediate effect on 
> translation due to loss of production of ribosomal proteins.
>
> In light of the knowledge of why splicing mutants cause translation 
> defects, we don't want to make GO process annotations to 
> translation-related terms for splicing genes even if they do produce a 
> translation-specific phenotype.
>
> 3. AAR2 - This gene was originally thought to be specifically required 
> for splicing of MATa1 mRNA because mutant extracts appeared 
> specifically defective in splicing this mRNA. It turns out to be due 
> to the fact that MATa1 has 2 introns, while almost all other genes 
> only have 1, which meant that the assay system ran out of splicing 
> components when MATa1 was used, but not when any of the other test 
> pre-mRNAs were used. There was actually a specific GO term (GO:0006377 
> - MATa1 (A1) pre-mRNA splicing) based on the original mutant 
> characterization of this gene.
>
> It turns out that Aar2 is actually part of the general splicing factor U5
> snRNP, and thus required for splicing generally. GO:0006377 was obsoleted
> because a MATa1-specific splicing process does not occur.
>
>
>
> On Fri, 11 Jul 2008, Valerie Wood wrote:
>
>>
>> Hi Judy/ Harold,
>>
>> In both of these examples (your heart development in the power point, 
>> and Harold's ribosomal example), we would make these annotations 
>> using current practices (so I don't think we are being inconsistent 
>> here). I have a similar example to Harold's where a subunit of RNA 
>> polymerase II plays a specialized role in cell separation. This is 
>> what the data shows and this is fine.
>>
>> What Karen and I are saying is that not EVERY annotation which can be 
>> made from a phenotype deserves a process annotation in the context of 
>> all of the available information.
>>
>> Some phenotypes which initially appear to indicate a particular 
>> process turn out to be downstream effects based on subsequent 
>> information. We feel in these cases, where the effect is *known* to 
>> be an *indirect* effect of an upstream process, then the process 
>> annotation based on this phenotype should be removed. It seems 
>> increasingly that it is not helpful for our communities using GO to 
>> make every annotation for the phenotype, if they are subsequently 
>> shown to be a result of an upstream process. This is the feedback I 
>> have got from my community, and it makes more sense for global analysis.
>>
>> Sometimes the observations initially attributed to cell division 
>> defects are actually known to be due to defects in DNA repair or 
>> replication: because replication is late and cytokinesis too early, 
>> cell division is compromised. There are many more dependencies on 
>> rRNA processing and translation.
>>
>> If it is NOT clear (reported) that the phenotype is due to the 
>> upstream process, then the IMP process from phenotype would still be 
>> valid. This shows a different level of knowledge which can be captured 
>> by a curator when more information is available. The phenotypes in 
>> these cases are still captured as appropriate.
>>
>> Probably we have more cases like this because yeast are better 
>> studied, and there are many dependencies in cell biology. SGD may 
>> have some better examples as they have more legacy data.
>>
>> Val
>>
>>
>>
>>
>>
>> Judith Blake wrote:
>>> Hi,
>>> I sent a response with ppt and it's waiting to be moderated
>>>
>>> J
>>>
>>> Harold Drabkin wrote:
>>>>
>>>> On the other hand, we have to be careful about applying what we 
>>>> think we know to ignore what a mutant phenotype is telling you, 
>>>> because things can be complicated. I just finished looking at one 
>>>> of the ribosomal proteins, Rpl10. There is very little mouse data, 
>>>> but from skimming some other references (human), it appears to have 
>>>> been originally identified in a screen for tumor suppressors. It is 
>>>> unclear why. It appears to be a protein that associates with the 
>>>> large subunit after the subunit is exported from the nucleus. 
>>>> However, there is some reference to its release from the 60S 
>>>> ribosomal subunit as a mechanism of transcript-specific 
>>>> translational control. This might have been reflected in the search 
>>>> for tumor suppressors. Yet another paper describes it as a 
>>>> zinc-binding transcription regulatory protein which can bind to 
>>>> c-Jun (this binding is dependent upon zinc ions and 
>>>> phosphorylation by protein kinase C). I haven't looked at those 
>>>> papers in detail, but there is something interesting going on (no 
>>>> one has done a KO in mouse that I can find, which might tell us a 
>>>> bit more), and I'm not at all sure one should rule out that it 
>>>> participates in processes other than the one obvious from 
>>>> its name. Just grist for the mill.
>>>>
>>>> h
>>>>
>>>>
>>>> Valerie Wood wrote:
>>>>> I agree completely with Karen/SGD and this has been the procedure 
>>>>> I have always followed.
>>>>> In the absence of any other information, a mutant phenotype is 
>>>>> frequently used to infer a specific process. Once more 
>>>>> information is available it often becomes clear that this is a 
>>>>> downstream (indirect) effect.
>>>>> For example, defects in ribosome biogenesis and general 
>>>>> translation will often have pleiotropic effects which are 
>>>>> indirect, as they will affect nearly every process downstream (for 
>>>>> example there are associated downstream effects in chromosome 
>>>>> segregation, cell division, and in multicellular organisms, 
>>>>> multiple developmental processes). This does not mean that a 
>>>>> biologist would expect to see the annotations to these processes 
>>>>> once the upstream process is known. If we did follow this logic, 
>>>>> then we would find that all genes involved in translation, 
>>>>> ribosome biogenesis and general replication would eventually 
>>>>> become annotated to most other processes.
>>>>>
>>>>> Another classic example from yeast is vacuolar targeting. Many 
>>>>> mutations result in proteins usually localized to the vacuole 
>>>>> becoming mislocalised, and these were initially 
>>>>> interpreted as a defect in protein targeting. It has since become 
>>>>> clear that many of these defects are very far upstream of the 
>>>>> vacuolar targeting pathway, and this is just a downstream 
>>>>> consequence of things being misfolded, mistranscribed etc. 
>>>>> Subsequently these annotations have gradually been removed as 
>>>>> better information has become available.
>>>>>
>>>>> On the other hand, mutations in a gene may have phenotypic effects 
>>>>> which you DO want to capture as processes (for example the effects 
>>>>> of phenylalanine hydroxylase on skin pigmentation etc). However 
>>>>> you would not necessarily want to curate the effect of a gene 
>>>>> involved in all translation initiation in a developmental process 
>>>>> from a high throughput screen (once better information was 
>>>>> available). In Doug's example I would also follow Karen's 
>>>>> suggestion and make the annotation if this is possibly specific 
>>>>> transcription for the pathway (i.e. specific to a subset of genes), 
>>>>> but if the defect is definitely general transcription I would not 
>>>>> make the annotation.
>>>>>
>>>>> Not capturing EVERY phenotype using biological process should not 
>>>>> be considered underannotation. The purpose of GO process 
>>>>> annotations is to capture processes not phenotypes. Sometimes 
>>>>> phenotypes are direct indicators of the process a gene is involved 
>>>>> in; sometimes they are not.
>>>>> A major consequence of making these ubiquitous annotations is that 
>>>>> they can distort genome-wide analysis (not improve it), and this is 
>>>>> often the case when annotations come from high throughput screens 
>>>>> and early experiments. Over the past couple of years cerevisiae 
>>>>> and pombe have done a lot of 'tidying' of these legacy 
>>>>> annotations, and the genome-wide GO data is much improved and 
>>>>> more useful as a result.
>>>>>
>>>>> This is also why annotations to orthologs made using ISS should 
>>>>> only be made by a curator on a gene by gene basis and not by an 
>>>>> automated process. A curator is able to assess all of the 
>>>>> available information to make an ISS annotation (from different 
>>>>> organisms) and distinguish between current annotations and legacy 
>>>>> annotations.
>>>>>
>>>>> One way to distinguish these is whether the targets are generic 
>>>>> (i.e. every gene) or specific (a subset of genes). If the gene's 
>>>>> targets are a subset of genes then the annotation is probably 
>>>>> valid.
>>>>>
>>>>> Val
>>>>>
>>>>> Karen Christie <kchris at genome.stanford.edu> wrote:
>>>>>> I don't think the GOC has ever had a policy, or even a 
>>>>>> recommendation, that process annotations should be made from all 
>>>>>> mutant phenotypes, nor do I think that it should.
>>>>>>
>>>>>> For example, SGD is currently working on annotating phenotypes 
>>>>>> for Cell Division Cycle (CDC) mutants, i.e. mutations which cause 
>>>>>> a cell cycle arrest phenotype. Here are some of the ones I worked 
>>>>>> on yesterday:
>>>>>>
>>>>>>     CDC60   leucyl tRNA synthetase
>>>>>>     PRT1    Subunit of eIF3
>>>>>>     ALA1    alanyl-tRNA synthetase
>>>>>>     CDC65   mitochondrial tRNA-Glu
>>>>>>     SPT16   Subunit of FACT transcription elongation complex
>>>>>>
>>>>>> I don't think that anyone in the yeast community would expect or 
>>>>>> want to see any of these genes annotated to a GO process related 
>>>>>> to the cell cycle. There are lots of examples of where a mutant 
>>>>>> phenotype is due to some downstream effect and not due to the 
>>>>>> primary defect.
>>>>>>
>>>>>> So, at SGD, we try to focus on the primary process. Obviously, we 
>>>>>> don't always know, but once we do, we like to avoid making GO 
>>>>>> annotations for processes that are known to be downstream, rather 
>>>>>> than direct, results of the mutation.
>>>>>>
>>>>>> For Doug's specific example, if comparative data suggested that 
>>>>>> the gene was a specific regulatory transcription factor, I'd 
>>>>>> probably be inclined to go ahead and make specific process 
>>>>>> annotations. However, if comparative data suggested that it was 
>>>>>> related to a Pol II general transcription factor, I might not 
>>>>>> want to make a GO process annotation to such a specific process.
>>>>>>
>>>>>> At all of the Annotation Camps, we've always said that one should 
>>>>>> be careful when making annotations from mutant phenotypes. At 
>>>>>> both of the public ones, the question has come up of how much to 
>>>>>> annotate from mutant phenotypes. The answer we've given has been 
>>>>>> that if one only has a mutant phenotype to annotate from, then 
>>>>>> make the best annotations you can. However, be aware that as you 
>>>>>> learn more, you may find that some of the mutant phenotypes are 
>>>>>> indirect results rather than something the gene product is 
>>>>>> directly involved in, and that in these cases you may choose to 
>>>>>> remove process annotations based on these phenotypes.
>>>>>>
>>>>>> I think this is still good advice, that curator judgement should 
>>>>>> play a role in deciding whether a GO process annotation is 
>>>>>> merited from any particular mutant phenotype.
>>>>>>
>>>>>> -Karen
>>>>>>
>>>>>>
>>>>>> On Sun, 6 Jul 2008, Judith Blake wrote:
>>>>>>
>>>>>>
>>>>>>> I can understand the duplication of effort, but since the GO and 
>>>>>>> phenotype annotations aren't co-mingled in GOdb, the SGD genes 
>>>>>>> would I think appear under-annotated if the effect of the gene 
>>>>>>> on phenotype is not curated in BP. For comparative genomics 
>>>>>>> studies using GO, this information would be missing even though 
>>>>>>> it is available in the literature.
>>>>>>>
>>>>>>> For mouse, the phenotype data is effectively 'dysfunction' data, 
>>>>>>> so the phenotype annotation reflects a different view from the 
>>>>>>> GO annotation.
>>>>>>>
>>>>>>> Judy
>>>>>>>
>>>>>>> Julie Park wrote:
>>>>>>>
>>>>>>>> Hi Doug,
>>>>>>>>
>>>>>>>> SGD's practice on this is that if it is known that what is 
>>>>>>>> being observed is a secondary/downstream effect, then we only 
>>>>>>>> capture it via phenotypes and not as a GO process.  However, if 
>>>>>>>> the gene product in question is not well characterized or there 
>>>>>>>> is a conflict in the literature about whether it is a direct or 
>>>>>>>> indirect involvement then we would give it a GO annotation.
>>>>>>>>
>>>>>>>> We've made a decision to use GO to try and capture the primary 
>>>>>>>> role of a gene product as much as possible and to reduce the 
>>>>>>>> duplication of effort required to capture data both in GO and 
>>>>>>>> as phenotypes.
>>>>>>>>
>>>>>>>> Just our take on things.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> -Julie
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 3, 2008, at 3:16 PM, Doug Howe wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>> It still seems like there is a line that has to be drawn somewhere.
>>>>>>>>> We've talked in the past about the scope of a process...when does it
>>>>>>>>> start and when does it end?  A gene that has as its primary role
>>>>>>>>> regulation of transcription (perhaps binds DNA etc. etc.) may have a
>>>>>>>>> secondary effect upon eye morphogenesis.  However, the process of eye
>>>>>>>>> morphogenesis does not start with the binding of such a gene to a
>>>>>>>>> regulatory sequence...it is a downstream consequence....and perhaps it
>>>>>>>>> is the gene whose expression is being regulated that is really involved
>>>>>>>>> in the downstream process.  It seems like there is a significant amount
>>>>>>>>> of redundant curation work to do if we always annotate both GO and
>>>>>>>>> phenotype using the same GO process terms.  I'm not strongly opposed to
>>>>>>>>> such annotations, I just want to revisit the discussion and see if
>>>>>>>>> anyone has other views on the issue.
>>>>>>>>> -Doug
>>>>>>>>>
>>>>>>>>> David Hill wrote:
>>>>>>>>>
>>>>>>>>>> Doug,
>>>>>>>>>>
>>>>>>>>>> I do this all the time. I just finished systematically doing 
>>>>>>>>>> all the homeobox genes in mouse. Many of them are annotated 
>>>>>>>>>> to things like pattern specification. I think in the future, 
>>>>>>>>>> it will be very nice to know these are playing roles in 
>>>>>>>>>> regulating transcription but that regulation is fundamental 
>>>>>>>>>> in other processes as well.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> Doug Howe wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm still struggling with the issue of whether to make a GO 
>>>>>>>>>>> annotation (processes in particular) or only phenotype 
>>>>>>>>>>> annotation. The zebrafish literature is replete with mutant 
>>>>>>>>>>> papers that often describe phenotypes involving eyes, otic 
>>>>>>>>>>> vesicles, pharyngeal arches, organ development etc. 
>>>>>>>>>>> Often, the IEA annotations for a gene seem to indicate that 
>>>>>>>>>>> the gene is binding DNA, and may be some sort of 
>>>>>>>>>>> transcriptional regulator. Should such a gene be annotated 
>>>>>>>>>>> with GO terms like 'otic vesicle development', or 'eye 
>>>>>>>>>>> morphogenesis', or should that be left for phenotype 
>>>>>>>>>>> annotations?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Doug Howe, Ph.D.
>>>>>>>>> ZFIN Scientific Curator
>>>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Annotation mailing list
>>>>>>>>> Annotation at geneontology.org
>>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>>>
>>
>> -- 
>> --------------------------------------------------------------------------- 
>>
>> Valerie Wood                     Tel: 01223 496909
>> S. pombe Genome Project          Fax: 01223 494919
>> Wellcome Trust Sanger Institute  email: val at sanger.ac.uk
>> Wellcome Trust Genome Campus     http://www.genedb.org/genedb/pombe
>> Hinxton, Cambridge, CB10 1HH     http://www.sanger.ac.uk/Projects/S_pombe



