[Annotation] phenotype or GO-still struggling

Karen Christie kchris at genome.stanford.edu
Tue Jul 15 12:18:22 PDT 2008


There's a big difference between saying some GO annotations with IMP 
evidence are valid and that all IMP evidence should be annotated in GO.

-Karen

On Tue, 15 Jul 2008, Judith Blake wrote:

> Frankly I don't understand why we are having this discussion when we agree 
> that IMPs are valid annotations.  Of course one might re-evaluate IMPs with 
> new data.  I don't understand why this is big news.
>
> Judy
>
> Karen Christie wrote:
>> Hi,
>> 
>> I completely agree with what Val said, including that in both Judy's and 
>> Harold's examples, I agree that it is appropriate to make annotations based 
>> on current knowledge.
>> 
>> I also sometimes come across genes where there is evidence that 
>> shows/suggests that it has multiple roles. Most recently REX2, which is 
>> involved in 3'-end processing of various different nuclear/nucleolar RNAs, 
>> and is also localized to the mitochondrion where its role is not clear but 
>> it interacts genetically with TRZ1, the tRNA 3'-end processing 
>> endonuclease. In cases like these, we make all the annotations that are 
>> supported by experimental evidence, even those that may be surprising.
>> 
>> However, much more frequently, I come across cases where the original idea 
>> about what a gene did, based solely on its mutant phenotype, is later shown 
>> to be due to a downstream effect of the mutation or to an artifact of the 
>> experimental system. In these types of cases, we choose not to represent 
>> these mutant phenotypes as GO annotations. Some examples of known cases for 
>> cerevisiae are below.
>> 
>> So, as more is known about a given gene, it seems it is often appropriate 
>> to reevaluate whether old annotations based on IMP are still valid. 
>> Sometimes they are and other times they really seem inappropriate in light 
>> of new knowledge. For multicellular organisms, many of these developmental 
>> mutants may well turn out to be genes specifically involved in that 
>> developmental process. I went to a talk a couple months ago that said that 
>> mammals have 100-1000 fold more specific regulatory transcription factors 
>> than yeast. A mutant in one of these might be quite informative as to which 
>> processes it regulates.
>> 
>> However, there will surely also be cases where something else is 
>> occurring. For example, a human disease called SCID (Severe Combined Immune 
>> Deficiency) is caused by deficiency of the enzyme adenosine deaminase 
>> (ADA). However, I'm not sure one would want to say that ADA is involved in 
>> immune cell development; it is generally active throughout the body. Rather, 
>> when ADA is defective, a toxic intermediate builds up and immune and other 
>> rapidly dividing cells are most sensitive. As the specific effect of ADA 
>> mutations on immune cells is a pathology, rather than a normal process, it 
>> seems outwith the scope of GO to annotate the immune cell effect of ADA 
>> mutants.
>> 
>> SGD has also started having large sets of high-throughput mutant phenotypes 
>> data. We have found that many of these screens identify large sets of genes 
>> with a given phenotype. However, based on the knowledge of what many of 
>> these genes do, we have become rather leery of making GO annotations 
>> wholesale from these large mutant phenotype studies because the mutant 
>> phenotype doesn't seem to be a very specific indicator of the process the 
>> gene is involved in. We're seeing a lot of these now. We've basically 
>> decided that though we are quite happy to put these into our phenotype 
>> curation wholesale, we are not comfortable in making GO annotations based 
>> on these large scale phenotype screens.
>> 
>> -Karen
>> 
>> Some specific examples for cerevisiae:
>> 
>> 1. cell cycle arrest phenotypes: people looked for things with cell
>> division cycle (cdc) arrest phenotypes in order to find cell cycle
>> regulators. Some cdc mutants actually are cell cycle regulators. However,
>> the collection of cdc mutants also includes:
>> - tRNAs
>> - tRNA synthetases
>> - an Hsp90 co-chaperone
>> - general transcription regulators, e.g. components of the Paf1
>>         transcription regulatory complex, members of the CCR4/NOT complex
>> - things involved in response to mating pheromone (which do cause G1
>>         arrest, but which are not thought to be cell cycle regulators)
>> - eIF4E cytoplasmic mRNA cap binding protein required for translation
>> 
>> Inhibition of ribosome synthesis can also produce cell cycle arrest. A
>> couple different U3 snoRNA associated complexes involved in the first
>> stages of rRNA processing and small ribosomal subunit assembly have
>> recently been characterized in yeast. U3 and many, if not most, of the
>> proteins are conserved. Depleting most of the individual protein
>> components of these complexes produces cell cycle arrest.
>> 
>> So, while SGD would be quite happy to have a cell cycle arrest -phenotype-
>> annotated for every gene, we don't really want to go on and make a GO
>> process annotation to cell cycle for many of these genes.
>> 
>> 2. splicing vs translation - A lot of things that turn out to be involved 
>> in splicing of nuclear mRNAs were originally characterized as being 
>> involved in translation. This turns out to be due to the unusual 
>> distribution of introns in S. cerevisiae. Only about 270 genes, out of 
>> 6000, contain introns, and these are predominantly found in ribosomal 
>> protein genes. Thus splicing defects have a very immediate effect on 
>> translation due to loss of production of ribosomal proteins.
>> 
>> In light of the knowledge of why splicing mutants cause translation 
>> defects, we don't want to make GO process annotations to 
>> translation-related terms for splicing genes even if they do produce a 
>> translation-specific phenotype.
>> 
>> 3. AAR2 - This gene was originally thought to be specifically required for 
>> splicing of MATa1 mRNA because mutant extracts appeared specifically 
>> defective in splicing this mRNA. It turns out to be due to the fact that 
>> MATa1 has 2 introns, while almost all other genes only have 1, which meant 
>> that the assay system ran out of splicing components when MATa1 was used, 
>> but not when any of the other test pre-mRNAs were used. There was actually 
>> a specific GO term (GO:0006377 - MATa1 (A1) pre-mRNA splicing) based on the 
>> original mutant characterization of this gene.
>> 
>> It turns out that Aar2 is actually part of general splicing factor U5
>> snRNP, and thus required for splicing generally. GO:0006377 was obsoleted
>> because a MATa1-specific splicing process does not occur.
>> 
>> 
>> 
>> On Fri, 11 Jul 2008, Valerie Wood wrote:
>> 
>>> 
>>> Hi Judy/ Harold,
>>> 
>>> In both of these examples (your heart development in the PowerPoint, and 
>>> Harold's ribosomal example), we would make these annotations using current 
>>> practices (so I don't think we are being inconsistent here). I have a 
>>> similar example to Harold's where a subunit of RNA polymerase II plays a 
>>> specialized role in cell separation. This is what the data shows and this 
>>> is fine.
>>> 
>>> What Karen and I are saying is that not EVERY annotation which can be made 
>>> from a phenotype deserves a process annotation in the context of all of 
>>> the available information.
>>> 
>>> Some phenotypes which initially appear to indicate a particular process 
>>> turn out to be downstream effects based on subsequent information. We feel 
>>> that in these cases, where the effect is *known* to be an *indirect* effect 
>>> of an upstream process, the process annotation based on this phenotype 
>>> should be removed. Increasingly, it seems that it is not helpful for our 
>>> communities using GO to make every annotation for the phenotype, if the 
>>> phenotypes are subsequently shown to be a result of an upstream process. 
>>> This is the feedback I have got from my community, and it makes more sense 
>>> for global analysis.
>>> 
>>> Sometimes the observations initially attributed to cell division defects 
>>> are actually known to be due to defects in DNA repair or replication: 
>>> because replication is late and cytokinesis too early, cell division is 
>>> compromised. There are many more dependencies on rRNA processing and 
>>> translation.
>>> 
>>> If it is NOT clear (reported) that the phenotype is due to the upstream 
>>> process, then the IMP process from phenotype would still be valid. This 
>>> shows a different level of knowledge which can be captured by a curator 
>>> when more information is available. The phenotypes in these cases are 
>>> still captured as appropriate.
>>> 
>>> Probably we have more cases like this because yeast are better studied, 
>>> and there are many dependencies in cell biology. SGD may have some better 
>>> examples as they have more legacy data.
>>> 
>>> Val
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Judith Blake wrote:
>>>> Hi,
>>>> I sent a response with ppt and it's waiting to be moderated
>>>> 
>>>> J
>>>> 
>>>> Harold Drabkin wrote:
>>>>> 
>>>>> On the other hand, we have to be careful about applying what we think we 
>>>>> know to ignore what a mutant phenotype is telling you, because things 
>>>>> can be complicated. I just finished looking at one of the ribosomal 
>>>>> proteins, Rpl10. There is very little mouse data, but from skimming 
>>>>> some other references (human), it appears to have been originally 
>>>>> identified in a screen for tumor suppressors. It is unclear why. It 
>>>>> appears to be a protein that associates with the large subunit after the 
>>>>> subunit is exported from the nucleus. However, there is some reference to 
>>>>> its release from the 60S ribosomal subunit as a mechanism of 
>>>>> transcript-specific translational control. This might have been 
>>>>> reflected in the search for tumor suppressors. Yet another paper 
>>>>> describes it as a zinc-binding transcription regulatory protein which 
>>>>> can bind to c-Jun (this binding is dependent upon zinc ions and 
>>>>> phosphorylation by protein kinase C). I haven't looked at those papers in 
>>>>> detail, but there is something interesting going on (no one has done a 
>>>>> KO in mouse that I can find which might tell us a bit more), and I'm not 
>>>>> at all sure one should rule out that it participates in processes 
>>>>> other than the one obvious from its name. Just grist for the mill.
>>>>> 
>>>>> h
>>>>> 
>>>>> 
>>>>> Valerie Wood wrote:
>>>>>> I agree completely with Karen/SGD and this has been the procedure I 
>>>>>> have always followed.
>>>>>> In the absence of any other information, a mutant phenotype is 
>>>>>> frequently used to infer a specific process. Once more information is 
>>>>>> available it often becomes clear that this is a downstream (indirect) 
>>>>>> effect.
>>>>>> For example, defects in ribosome biogenesis and general translation 
>>>>>> will often have pleiotropic effects which are indirect, as 
>>>>>> they will affect nearly every process downstream (for example there are 
>>>>>> associated downstream effects in chromosome segregation, cell division, 
>>>>>> and in multicellular organisms, multiple developmental processes). 
>>>>>> This does not mean that a biologist would expect to see the annotations 
>>>>>> to these processes once the upstream process is known. If we did follow 
>>>>>> this logic, then we would find that all genes involved in translation, 
>>>>>> ribosome biogenesis and general replication would eventually become 
>>>>>> annotated to most other processes.
>>>>>> 
>>>>>> Another classic example from yeast is vacuolar targeting. Many mutations 
>>>>>> cause proteins usually localized to the vacuole to become 
>>>>>> mislocalised, and these were initially interpreted as defects in 
>>>>>> protein targeting. It has since become clear that many of 
>>>>>> these defects are very far upstream of the vacuolar targeting pathway, 
>>>>>> and the mislocalisation is just a downstream consequence of things being 
>>>>>> misfolded, mistranscribed etc. Subsequently these annotations have 
>>>>>> gradually been removed as better information has become available.
>>>>>> 
>>>>>> On the other hand, mutations in a gene may have phenotypic effects 
>>>>>> which you DO want to capture as processes (for example the effects of 
>>>>>> phenylalanine hydroxylase on skin pigmentation etc). However, you would 
>>>>>> not necessarily want to curate the effect of a gene involved in all 
>>>>>> translation initiation in a developmental process from a high throughput 
>>>>>> screen (once better information was available). In Doug's example I 
>>>>>> would also follow Karen's suggestion and make the annotation if this is 
>>>>>> possibly specific transcription for the pathway (i.e. specific to a 
>>>>>> subset of genes), but if the defect is definitely general transcription 
>>>>>> I would not make the annotation.
>>>>>> 
>>>>>> Not capturing EVERY phenotype using biological process should not be 
>>>>>> considered underannotation. The purpose of GO process annotations is to 
>>>>>> capture processes, not phenotypes. Sometimes phenotypes are direct 
>>>>>> indicators of the process a gene is involved in; sometimes they are not.
>>>>>> A major consequence of making these ubiquitous annotations is that they 
>>>>>> can distort genome-wide analysis (not improve it), and this is often the 
>>>>>> case when annotations come from high throughput screens and early 
>>>>>> experiments. Over the past couple of years cerevisiae and pombe have 
>>>>>> done a lot of 'tidying' of these legacy annotations, and the 
>>>>>> genome-wide GO data is much improved and more useful as a result.
>>>>>> 
>>>>>> This is also why annotations to orthologs made using ISS should only 
>>>>>> be made by a curator on a gene by gene basis and not by an automated 
>>>>>> process. A curator is able to assess all of the available information 
>>>>>> to make an ISS annotation (from different organisms) and distinguish 
>>>>>> between current annotations and legacy annotations.
>>>>>> 
>>>>>> One way to distinguish these is whether the targets are generic (i.e. 
>>>>>> every gene) or specific (a subset of genes). If the gene's targets are 
>>>>>> a subset of genes then the annotation is probably valid.
>>>>>> 
>>>>>> Val
>>>>>> 
>>>>>> Karen Christie <kchris at genome.stanford.edu> wrote:
>>>>>>> I don't think the GOC has ever had a policy, or even a recommendation, 
>>>>>>> that process annotations should be made from all mutant phenotypes, 
>>>>>>> nor do I think that it should.
>>>>>>> 
>>>>>>> For example, SGD is currently working on annotating phenotypes for 
>>>>>>> Cell Division Cycle (CDC) mutants, i.e. mutations which cause a cell 
>>>>>>> cycle arrest phenotype. Here are some of the ones I worked on 
>>>>>>> yesterday:
>>>>>>>
>>>>>>>     CDC60   leucyl tRNA synthetase
>>>>>>>     PRT1    Subunit of eIF3
>>>>>>>     ALA1    alanyl-tRNA synthetase
>>>>>>>     CDC65   mitochondrial tRNA-Glu
>>>>>>>     SPT16   Subunit of FACT transcription elongation complex
>>>>>>> 
>>>>>>> I don't think that anyone in the yeast community would expect or want 
>>>>>>> to see any of these genes annotated to a GO process related to the 
>>>>>>> cell cycle. There are lots of examples of where a mutant phenotype is 
>>>>>>> due to some downstream effect and not due to the primary defect.
>>>>>>> 
>>>>>>> So, at SGD, we try to focus on the primary process. Obviously, we 
>>>>>>> don't always know, but once we do, we like to avoid making GO 
>>>>>>> annotations for processes that are known to be downstream, rather than 
>>>>>>> direct, results of the mutation.
>>>>>>> 
>>>>>>> For Doug's specific example, if comparative data suggested that the 
>>>>>>> gene was a specific regulatory transcription factor, I'd probably be 
>>>>>>> inclined to go ahead and make specific process annotations. However, 
>>>>>>> if comparative data suggested that it was related to a Pol II general 
>>>>>>> transcription factor, I might not want to make a GO process annotation 
>>>>>>> to such a specific process.
>>>>>>> 
>>>>>>> At all of the Annotation Camps, we've always said that one should be 
>>>>>>> careful when making annotations from mutant phenotypes. At both of the 
>>>>>>> public ones, the question has come up of how much to annotate from 
>>>>>>> mutant phenotypes. The answer we've given has been that if one only 
>>>>>>> has a mutant phenotype to annotate from, then make the best 
>>>>>>> annotations you can. However, be aware that as you learn more, you may 
>>>>>>> find that some of the mutant phenotypes are indirect results rather 
>>>>>>> than something the gene product is directly involved in, and that in 
>>>>>>> these cases you may choose to remove process annotations based on 
>>>>>>> these phenotypes.
>>>>>>> 
>>>>>>> I think this is still good advice, that curator judgement should play 
>>>>>>> a role in deciding whether a GO process annotation is merited from any 
>>>>>>> particular mutant phenotype.
>>>>>>> 
>>>>>>> -Karen
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, 6 Jul 2008, Judith Blake wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> I can understand the duplication of effort, but since the GO and 
>>>>>>>> phenotype annotations aren't co-mingled in GOdb, the SGD genes would 
>>>>>>>> I think appear under-annotated if the effect of the gene on phenotype 
>>>>>>>> is not curated in BP. For comparative genomics studies using GO, this 
>>>>>>>> would be missing, yet available in the literature, information.
>>>>>>>> 
>>>>>>>> For mouse, the phenotype data is effectively 'dysfunction' data, so 
>>>>>>>> the phenotype annotation reflects a different view from the GO 
>>>>>>>> annotation.
>>>>>>>> 
>>>>>>>> Judy
>>>>>>>> 
>>>>>>>> Julie Park wrote:
>>>>>>>> 
>>>>>>>>> Hi Doug,
>>>>>>>>> 
>>>>>>>>> SGD's practice on this is that if it is known that what is being 
>>>>>>>>> observed is a secondary/downstream effect, then we only capture it 
>>>>>>>>> via phenotypes and not as a GO process.  However, if the gene 
>>>>>>>>> product in question is not well characterized or there is a conflict 
>>>>>>>>> in the literature about whether it is a direct or indirect 
>>>>>>>>> involvement then we would give it a GO annotation.
>>>>>>>>> 
>>>>>>>>> We've made a decision to use GO to try and capture the primary role 
>>>>>>>>> of a gene product as much as possible and to reduce the duplication 
>>>>>>>>> of effort required to capture data both in GO and as phenotypes.
>>>>>>>>> 
>>>>>>>>> Just our take on things.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> -Julie
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Jul 3, 2008, at 3:16 PM, Doug Howe wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Hi David,
>>>>>>>>>> It still seems like there is a line that has to be drawn somewhere.
>>>>>>>>>> We've talked in the past about the scope of a process...when does it
>>>>>>>>>> start and when does it end?  A gene that has as its primary role
>>>>>>>>>> regulation of transcription (perhaps binds DNA etc. etc.) may have a
>>>>>>>>>> secondary effect upon eye morphogenesis.  However, the process of eye
>>>>>>>>>> morphogenesis does not start with the binding of such a gene to a
>>>>>>>>>> regulatory sequence...it is a downstream consequence....and perhaps it
>>>>>>>>>> is the gene whose expression is being regulated that is really involved
>>>>>>>>>> in the downstream process.  It seems like there is a significant amount
>>>>>>>>>> of redundant curation work to do if we always annotate both GO and
>>>>>>>>>> phenotype using the same GO process terms.  I'm not strongly opposed to
>>>>>>>>>> such annotations, I just want to revisit the discussion and see if
>>>>>>>>>> anyone has other views on the issue.
>>>>>>>>>> -Doug
>>>>>>>>>> 
>>>>>>>>>> David Hill wrote:
>>>>>>>>>> 
>>>>>>>>>>> Doug,
>>>>>>>>>>> 
>>>>>>>>>>> I do this all the time. I just finished systematically doing all 
>>>>>>>>>>> the homeobox genes in mouse. Many of them are annotated to things 
>>>>>>>>>>> like pattern specification. I think in the future, it will be very 
>>>>>>>>>>> nice to know these are playing roles in regulating transcription 
>>>>>>>>>>> but that regulation is fundamental in other processes as well.
>>>>>>>>>>> 
>>>>>>>>>>> David
>>>>>>>>>>> 
>>>>>>>>>>> Doug Howe wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I'm still struggling with the issue of whether to make a GO 
>>>>>>>>>>>> annotation (processes in particular) or only phenotype 
>>>>>>>>>>>> annotation. The zebrafish literature is replete with mutant 
>>>>>>>>>>>> papers that often describe phenotypes involving eyes, otic 
>>>>>>>>>>>> vesicles, or pharyngeal arches, organ development etc.   Often, 
>>>>>>>>>>>> the IEA annotations for a gene seem to indicate that the gene is 
>>>>>>>>>>>> binding DNA, and may be some sort of transcriptional regulator. 
>>>>>>>>>>>> Should such a gene be annotated with GO terms like 'otic vesicle 
>>>>>>>>>>>> development', or 'eye morphogenesis', or should that be left for 
>>>>>>>>>>>> phenotype annotations?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> -- 
>>>>>>>>>> Doug Howe, Ph.D.
>>>>>>>>>> ZFIN Scientific Curator
>>>>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Annotation mailing list
>>>>>>>>>> Annotation at geneontology.org
>>>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> --------------------------------------------------------------------------- 
>>> Valerie Wood                      Tel: 01223 496909
>>> S. pombe Genome Project           Fax: 01223 494919
>>> Wellcome Trust Sanger Institute   email: val at sanger.ac.uk
>>> Wellcome Trust Genome Campus      http://www.genedb.org/genedb/pombe
>>> Hinxton, Cambridge, CB10 1HH      http://www.sanger.ac.uk/Projects/S_pombe
>>> 
>>> 
>>> 
>>> 
>
>

