[Annotation] phenotype or GO-still struggling
Karen Christie
kchris at genome.stanford.edu
Tue Jul 15 12:18:22 PDT 2008
There's a big difference between saying some GO annotations with IMP
evidence are valid and that all IMP evidence should be annotated in GO.
-Karen
On Tue, 15 Jul 2008, Judith Blake wrote:
> Frankly I don't understand why we are having this discussion when we agree
> that IMPs are valid annotations. Of course one might re-evaluate IMPs with
> new data. I don't understand why this is big news.
>
> Judy
>
> Karen Christie wrote:
>> Hi,
>>
>> I completely agree with what Val said, including that in both Judy's and
>> Harold's examples, I agree that it is appropriate to make annotations based
>> on current knowledge.
>>
>> I also sometimes come across genes where there is evidence that
>> shows/suggests that they have multiple roles. Most recently REX2, which is
>> involved in 3'-end processing of various different nuclear/nucleolar RNAs,
>> and is also localized to the mitochondrion where its role is not clear but
>> it interacts genetically with TRZ1, the tRNA 3'-end processing
>> endonuclease. In cases like these, we make all the annotations that are
>> supported by experimental evidence, even those that may be surprising.
>>
>> However, much more frequently, I come across cases where the original idea
>> about what a gene did, based solely on its mutant phenotype, is later shown
>> to be due to a downstream effect of the mutation or to an artifact of the
>> experimental system. In these types of cases, we choose not to represent
>> these mutant phenotypes as GO annotations. Some examples of known cases for
>> cerevisiae are below.
>>
>> So, as more is known about a given gene, it seems it is often appropriate
>> to reevaluate whether old annotations based on IMP are still valid.
>> Sometimes they are and other times they really seem inappropriate in light
>> of new knowledge. For multicellular organisms, many of these developmental
>> mutants may well turn out to be genes specifically involved in that
>> developmental process. I went to a talk a couple months ago that said that
>> mammals have 100-1000 fold more specific regulatory transcription factors
>> than yeast. A mutant in one of these might be quite informative as to which
>> processes it regulates.
>>
>> However, there will surely also be cases where something else is
>> occurring. For example, a human disease called SCID (Severe Combined Immune
>> Deficiency) is caused by deficiency of the enzyme adenosine deaminase
>> (ADA). However, I'm not sure one would want to say that ADA is involved in
>> immune cell development; it is generally active throughout the body. Rather,
>> when ADA is defective, a toxic intermediate builds up and immune and other
>> rapidly dividing cells are most sensitive. As the specific effect of ADA
>> mutations on immune cells is a pathology, rather than a normal process, it
>> seems outwith the scope of GO to annotate the immune cell effect of ADA
>> mutants.
>>
>> SGD has also started having large sets of high-throughput mutant phenotypes
>> data. We have found that many of these screens identify large sets of genes
>> with a given phenotype. However, based on the knowledge of what many of
>> these genes do, we have become rather leery of making GO annotations
>> wholesale from these large mutant phenotype studies because the mutant
>> phenotype doesn't seem to be a very specific indicator of the process the
>> gene is involved in. We're seeing a lot of these now. We've basically
>> decided that though we are quite happy to put these into our phenotype
>> curation wholesale, we are not comfortable in making GO annotations based
>> on these large scale phenotype screens.
>>
>> -Karen
>>
>> Some specific examples for cerevisiae:
>>
>> 1. cell cycle arrest phenotypes: people looked for things with cell
>> division cycle (cdc) arrest phenotypes in order to find cell cycle
>> regulators. Some cdc mutants actually are cell cycle regulators. However,
>> the collection of cdc mutants also includes:
>> - tRNAs
>> - tRNA synthetases
>> - an Hsp90 co-chaperone
>> - general transcription regulators, e.g. components of the Paf1
>> transcription regulatory complex, members of the CCR4/NOT complex
>> - things involved in response to mating pheromone (which do cause G1
>> arrest, but which are not thought to be cell cycle regulators)
>> - eIF4E cytoplasmic mRNA cap binding protein required for translation
>>
>> Inhibition of ribosome synthesis can also produce cell cycle arrest. A
>> couple different U3 snoRNA associated complexes involved in the first
>> stages of rRNA processing and small ribosomal subunit assembly have
>> recently been characterized in yeast. U3 and many, if not most, of the
>> proteins are conserved. Depleting most of the individual protein
>> components of these complexes produces cell cycle arrest.
>>
>> So, while SGD would be quite happy to have a cell cycle arrest -phenotype-
>> annotated for every gene, we don't really want to go on and make a GO
>> process annotation to cell cycle for many of these genes.
>>
>> 2. splicing vs translation - A lot of things that turn out to be involved
>> in splicing of nuclear mRNAs were originally characterized as being
>> involved in translation. This turns out to be due to the unusual
>> distribution of introns in S. cerevisiae. Only about 270 genes, out of
>> 6000, contain introns, and these are predominantly found in ribosomal protein
>> genes. Thus splicing defects have a very immediate effect on translation
>> due to loss of production of ribosomal proteins.
>>
>> In light of the knowledge of why splicing mutants cause translation
>> defects, we don't want to make GO process annotations to
>> translation-related terms for splicing genes even if they do produce a
>> translation-specific phenotype.
>>
>> 3. AAR2 - This gene was originally thought to be specifically required for
>> splicing of MATa1 mRNA because mutant extracts appeared specifically
>> defective in splicing this mRNA. It turns out to be due to the fact that
>> MATa1 has 2 introns, while almost all other genes only have 1, which meant
>> that the assay system ran out of splicing components when MATa1 was used,
>> but not when any of the other test pre-mRNAs were used. There was actually
>> a specific GO term (GO:0006377 - MATa1 (A1) pre-mRNA splicing) based on the
>> original mutant characterization of this gene.
>>
>> It turns out that Aar2 is actually part of general splicing factor U5
>> snRNP, and thus required for splicing generally. GO:0006377 was obsoleted
>> because a MATa1-specific splicing process does not occur.
>>
>>
>>
>> On Fri, 11 Jul 2008, Valerie Wood wrote:
>>
>>>
>>> Hi Judy/ Harold,
>>>
>>> In both of these examples (your heart development in the power point, and
>>> Harold's ribosomal example), we would make these annotations using current
>>> practices (so I don't think we are being inconsistent here). I have a
>>> similar example to Harold's where a subunit of RNA polymerase II plays a
>>> specialized role in cell separation. This is what the data shows and this
>>> is fine.
>>>
>>> What Karen and I are saying is that not EVERY annotation which can be made
>>> from a phenotype deserves a process annotation in the context of all of
>>> the available information.
>>>
>>> Some processes which initially appear to be due to a particular phenotype
>>> turn out to be downstream effects based on subsequent information. We feel
>>> in these cases, where the effect is *known* to be an *indirect* effect of an
>>> upstream process, then the process annotation based on this phenotype
>>> should be removed. It seems increasingly that it is not helpful for our
>>> communities using GO to make every annotation for the phenotype, if they
>>> are subsequently shown to be a result of an upstream process. This is the
>>> feedback I have got from my community, and makes more sense of global
>>> analysis.
>>>
>>> Sometimes the observations initially attributed to cell division defects
>>> are actually known to be due to defects in DNA repair or replication:
>>> because replication is late and cytokinesis too early, cell division is
>>> compromised. There are many more dependencies on rRNA processing and
>>> translation.
>>>
>>> If it is NOT clear (reported) that the phenotype is due to the upstream
>>> process, then the IMP process from phenotype would still be valid. This
>>> shows a different level of knowledge which can be captured by a curator
>>> when more information is available. The phenotypes in these cases are
>>> still captured as appropriate.
>>>
>>> Probably we have more cases like this because yeast are better studied,
>>> and there are many dependencies in cell biology. SGD may have some better
>>> examples as they have more legacy data.
>>>
>>> Val
>>>
>>>
>>>
>>>
>>>
>>> Judith Blake wrote:
>>>> Hi,
>>>> I sent a response with ppt and it's waiting to be moderated
>>>>
>>>> J
>>>>
>>>> Harold Drabkin wrote:
>>>>>
>>>>> On the other hand, we have to be careful about applying what we think we
>>>>> know to ignore what a mutant phenotype is telling you, because things
>>>>> can be complicated. I just finished looking at one of the ribosomal
>>>>> proteins, Rpl10. There is very little mouse data, but from skimming
>>>>> some other references (human), it appears to be originally identified in
>>>>> a screen for tumor suppressors. It is unclear why. It appears to be a
>>>>> protein that associates with the large subunit after the subunit is
>>>>> exported from the nucleus. However, there is some reference to its
>>>>> release from the 60S ribosomal subunit as a mechanism of
>>>>> transcript-specific translational control. This might have been
>>>>> reflected in the search for tumor suppressors. Yet another paper
>>>>> describes it as a zinc-binding transcription regulatory protein which
>>>>> can bind to c-Jun (this binding is dependent upon zinc ions and
>>>>> phosphorylation by protein kinase C). Haven't looked at those papers in
>>>>> detail, but there is something interesting going on (no one has done a
>>>>> KO in mouse that I can find which might tell us a bit more), and I'm not
>>>>> at all sure one should rule out that it participates in processes
>>>>> other than the one obvious from its name. Just grist for the mill.
>>>>>
>>>>> h
>>>>>
>>>>>
>>>>> Valerie Wood wrote:
>>>>>> I agree completely with Karen/SGD and this has been the procedure I
>>>>>> have always followed.
>>>>>> In the absence of any other information, a mutant phenotype is
>>>>>> frequently used to infer a specific process. Once more information is
>>>>>> available it often becomes clear that this is a downstream (indirect)
>>>>>> effect.
>>>>>> For example, defects in ribosome biogenesis and general translation
>>>>>> will often have pleiotropic effects which are indirect, as they will
>>>>>> affect nearly every process downstream (for example there are
>>>>>> associated downstream effects in chromosome segregation, cell division,
>>>>>> and in multicellular organisms, multiple developmental processes).
>>>>>> This does not mean that a biologist would expect to see the annotations
>>>>>> to these processes once the upstream process is known. If we did follow
>>>>>> this logic, then we would find that all genes involved in translation,
>>>>>> ribosome biogenesis and general replication would eventually become
>>>>>> annotated to most other processes.
>>>>>>
>>>>>> Another classic example from yeast is vacuolar targeting. Many mutants
>>>>>> result in defects which result in proteins usually localized to the
>>>>>> vacuole becoming mislocalised and were initially interpreted as a
>>>>>> defect in protein targeting. It has since become clear that many of
>>>>>> these defects are very far upstream of the vacuolar targeting pathway,
>>>>>> and this is just a downstream consequence of things being misfolded,
>>>>>> mistranscribed, etc. Subsequently these annotations have gradually been
>>>>>> removed as better information has become available.
>>>>>>
>>>>>> On the other hand, mutations in a gene may have phenotypic effects
>>>>>> which you DO want to capture as processes (for example the effects of
>>>>>> phenylalanine hydroxylase on skin pigmentation etc). However you would
>>>>>> not necessarily want to curate the effect of a gene involved in all
>>>>>> translation initiation in a developmental process from a high throughput
>>>>>> screen (once better information was available). In Doug's example I
>>>>>> would also follow Karen's suggestion and make the annotation if this is
>>>>>> possibly specific transcription for the pathway (i.e. specific to a
>>>>>> subset of genes), but if the defect is definitely general transcription
>>>>>> I would not make the annotation.
>>>>>>
>>>>>> Not capturing EVERY phenotype using biological process should not be
>>>>>> considered underannotation. The purpose of GO process annotations is to
>>>>>> capture processes, not phenotypes. Sometimes phenotypes are direct
>>>>>> indicators of the process a gene is involved in; sometimes they are not.
>>>>>> A major consequence of making these ubiquitous annotations is that they can
>>>>>> distort genome wide analysis (not improve it), and this is often the
>>>>>> case when annotations come from high throughput screens and early
>>>>>> experiments. Over the past couple of years cerevisiae and pombe have
>>>>>> done a lot of 'tidying' of these legacy annotations, and the
>>>>>> genome-wide GO data is much improved and useful as a result.
>>>>>>
>>>>>> This is also why annotations to orthologs made using ISS should only
>>>>>> be made by a curator on a gene by gene basis and not by an automated
>>>>>> process. A curator is able to assess all of the available information
>>>>>> to make an ISS annotation (from different organisms) and distinguish
>>>>>> between current annotations and legacy annotations.
>>>>>>
>>>>>> One way to distinguish these is whether the targets are generic (i.e.
>>>>>> every gene) or specific (a subset of genes). If the gene's targets are
>>>>>> a subset of genes then the annotation is probably valid.
>>>>>>
>>>>>> Val
>>>>>>
>>>>>> Karen Christie <kchris at genome.stanford.edu> wrote:
>>>>>>> I don't think the GOC has ever had a policy, or even a recommendation,
>>>>>>> that process annotations should be made from all mutant phenotypes,
>>>>>>> nor do I think that it should.
>>>>>>>
>>>>>>> For example, SGD is currently working on annotating phenotypes for
>>>>>>> Cell Division Cycle (CDC) mutants, i.e. mutations which cause a cell
>>>>>>> cycle arrest phenotype. Here are some of the ones I worked on
>>>>>>> yesterday:
>>>>>>>
>>>>>>> CDC60 leucyl tRNA synthetase
>>>>>>> PRT1 Subunit of eIF3
>>>>>>> ALA1 alanyl-tRNA synthetase
>>>>>>> CDC65 mitochondrial tRNA-Glu
>>>>>>> SPT16 Subunit of FACT transcription elongation complex
>>>>>>>
>>>>>>> I don't think that anyone in the yeast community would expect or want
>>>>>>> to see any of these genes annotated to a GO process related to the
>>>>>>> cell cycle. There are lots of examples of where a mutant phenotype is
>>>>>>> due to some downstream effect and not due to the primary defect.
>>>>>>>
>>>>>>> So, at SGD, we try to focus on the primary process. Obviously, we
>>>>>>> don't always know, but once we do, we like to avoid making GO
>>>>>>> annotations for processes that are known to be downstream, rather than
>>>>>>> direct, results of the mutation.
>>>>>>>
>>>>>>> For Doug's specific example, if comparative data suggested that the
>>>>>>> gene was a specific regulatory transcription factor, I'd probably be
>>>>>>> inclined to go ahead and make specific process annotations. However,
>>>>>>> if comparative data suggested that it was related to a Pol II general
>>>>>>> transcription factor, I might not want to make a GO process annotation
>>>>>>> to such a specific process.
>>>>>>>
>>>>>>> At all of the Annotation Camps, we've always said that one should be
>>>>>>> careful when making annotations from mutant phenotypes. At both of the
>>>>>>> public ones, the question has come up of how much to annotate from
>>>>>>> mutant phenotypes. The answer we've given has been that if one only
>>>>>>> has a mutant phenotype to annotate from, then make the best
>>>>>>> annotations you can. However, be aware that as you learn more, you may
>>>>>>> find that some of the mutant phenotypes are indirect results rather
>>>>>>> than something the gene product is directly involved in, and that in
>>>>>>> these cases you may choose to remove process annotations based on
>>>>>>> these phenotypes.
>>>>>>>
>>>>>>> I think this is still good advice, that curator judgement should play
>>>>>>> a role in deciding whether a GO process annotation is merited from any
>>>>>>> particular mutant phenotype.
>>>>>>>
>>>>>>> -Karen
>>>>>>>
>>>>>>>
>>>>>>> On Sun, 6 Jul 2008, Judith Blake wrote:
>>>>>>>
>>>>>>>
>>>>>>>> I can understand the duplication of effort, but since the GO and
>>>>>>>> phenotype annotations aren't co-mingled in GOdb, the SGD genes would
>>>>>>>> I think appear under-annotated if the effect of the gene on phenotype
>>>>>>>> is not curated in BP. For comparative genomics studies using GO, this
>>>>>>>> would be missing, yet available in the literature, information.
>>>>>>>>
>>>>>>>> For mouse, the phenotype data is effectively 'dysfunction' data, so
>>>>>>>> the phenotype annotation reflects a different view from the GO
>>>>>>>> annotation.
>>>>>>>>
>>>>>>>> Judy
>>>>>>>>
>>>>>>>> Julie Park wrote:
>>>>>>>>
>>>>>>>>> Hi Doug,
>>>>>>>>>
>>>>>>>>> SGD's practice on this is that if it is known that what is being
>>>>>>>>> observed is a secondary/downstream effect, then we only capture it
>>>>>>>>> via phenotypes and not as a GO process. However, if the gene
>>>>>>>>> product in question is not well characterized or there is a conflict
>>>>>>>>> in the literature about whether it is a direct or indirect
>>>>>>>>> involvement then we would give it a GO annotation.
>>>>>>>>>
>>>>>>>>> We've made a decision to use GO to try and capture the primary role
>>>>>>>>> of a gene product as much as possible and to reduce the duplication
>>>>>>>>> of effort required to capture data both in GO and as phenotypes.
>>>>>>>>>
>>>>>>>>> Just our take on things.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> -Julie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jul 3, 2008, at 3:16 PM, Doug howe wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>> It still seems like there is a line that has to be drawn somewhere.
>>>>>>>>>> We've talked in the past about the scope of a process...when does it
>>>>>>>>>> start and when does it end? A gene that has as its primary role
>>>>>>>>>> regulation of transcription (perhaps binds DNA etc. etc.) may have a
>>>>>>>>>> secondary effect upon eye morphogenesis. However, the process of eye
>>>>>>>>>> morphogenesis does not start with the binding of such a gene to a
>>>>>>>>>> regulatory sequence...it is a downstream consequence....and perhaps it
>>>>>>>>>> is the gene whose expression is being regulated that is really involved
>>>>>>>>>> in the downstream process. It seems like there is a significant amount
>>>>>>>>>> of redundant curation work to do if we always annotate both GO and
>>>>>>>>>> phenotype using the same GO process terms. I'm not strongly opposed to
>>>>>>>>>> such annotations, I just want to revisit the discussion and see if
>>>>>>>>>> anyone has other views on the issue.
>>>>>>>>>> -Doug
>>>>>>>>>>
>>>>>>>>>> David Hill wrote:
>>>>>>>>>>
>>>>>>>>>>> Doug,
>>>>>>>>>>>
>>>>>>>>>>> I do this all the time. I just finished systematically doing all
>>>>>>>>>>> the homeobox genes in mouse. Many of them are annotated to things
>>>>>>>>>>> like pattern specification. I think in the future, it will be very
>>>>>>>>>>> nice to know these are playing roles in regulating transcription
>>>>>>>>>>> but that regulation is fundamental in other processes as well.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> Doug howe wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm still struggling with the issue of whether to make a GO
>>>>>>>>>>>> annotation (processes in particular) or only phenotype
>>>>>>>>>>>> annotation. The zebrafish literature is replete with mutant
>>>>>>>>>>>> papers that often describe phenotypes involving eyes, otic
>>>>>>>>>>>> vesicles, or pharyngeal arches, organ development etc. Often,
>>>>>>>>>>>> the IEA annotations for a gene seems to indicate that the gene is
>>>>>>>>>>>> binding DNA, and may be some sort of transcriptional regulator.
>>>>>>>>>>>> Should such a gene be annotated with GO terms like 'otic vesicle
>>>>>>>>>>>> development', or 'eye morphogenesis', or should that be left for
>>>>>>>>>>>> phenotype annotations?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Doug Howe, Ph.D.
>>>>>>>>>> ZFIN Scientific Curator
>>>>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Annotation mailing list
>>>>>>>>>> Annotation at geneontology.org
>>>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>>>>
>>>
>>> --
>>>
>>> ---------------------------------------------------------------------------
>>> Valerie Wood                    Tel: 01223 496909
>>> S. pombe Genome Project         Fax: 01223 494919
>>> Wellcome Trust Sanger Institute email: val at sanger.ac.uk
>>> Wellcome Trust Genome Campus    http://www.genedb.org/genedb/pombe
>>> Hinxton, Cambridge, CB10 1HH    http://www.sanger.ac.uk/Projects/S_pombe
>>>
>>>
>>>
>>>
>
>