[Annotation] phenotype or GO-still struggling

Karen Christie kchris at genome.stanford.edu
Tue Jul 15 11:42:18 PDT 2008


Hi,

I completely agree with what Val said, including that in both Judy's and 
Harold's examples it is appropriate to make annotations based on current 
knowledge.

I also sometimes come across genes where there is evidence that 
shows/suggests that they have multiple roles. Most recently REX2, which is 
involved in 3'-end processing of various different nuclear/nucleolar RNAs, 
and is also localized to the mitochondrion where its role is not clear but 
it interacts genetically with TRZ1, the tRNA 3'-end processing 
endonuclease. In cases like these, we make all the annotations that are 
supported by experimental evidence, even those that may be surprising.

However, much more frequently, I come across cases where the original idea 
about what a gene did, based solely on its mutant phenotype, is later 
shown to be due to a downstream effect of the mutation or to an artifact 
of the experimental system. In these types of cases, we choose not to 
represent these mutant phenotypes as GO annotations. Some examples of 
known cases for cerevisiae are below.

So, as more is known about a given gene, it seems it is often appropriate 
to reevaluate whether old annotations based on IMP are still valid. 
Sometimes they are and other times they really seem inappropriate in light 
of new knowledge. For multicellular organisms, many of these developmental 
mutants may well turn out to be genes specifically involved in that 
developmental process. I went to a talk a couple months ago that said that 
mammals have 100-1000 fold more specific regulatory transcription factors 
than yeast. A mutant in one of these might be quite informative as to 
which processes it regulates.

However, there will surely also be cases where something else is 
occurring. For example, a human disease called SCID (Severe Combined 
Immune Deficiency) is caused by deficiency of the enzyme adenosine 
deaminase (ADA). However, I'm not sure one would want to say that ADA is 
involved in immune cell development; it is generally active through the 
body. Rather, when ADA is defective, a toxic intermediate builds up and 
immune and other rapidly dividing cells are most sensitive. As the 
specific effect of ADA mutations on immune cells is a pathology, rather 
than a normal process, it seems outwith the scope of GO to annotate the 
immune cell effect of ADA mutants.

SGD has also started having large sets of high-throughput mutant 
phenotypes data. We have found that many of these screens identify large 
sets of genes with a given phenotype. However, based on the knowledge of 
what many of these genes do, we have become rather leery of making GO 
annotations wholesale from these large mutant phenotype studies because 
the mutant phenotype doesn't seem to be a very specific indicator of the 
process the gene is involved in. We're seeing a lot of these now. We've 
basically decided that though we are quite happy to put these into our 
phenotype curation wholesale, we are not comfortable in making GO 
annotations based on these large scale phenotype screens.

-Karen

Some specific examples for cerevisiae:

1. cell cycle arrest phenotypes: people looked for things with cell
division cycle (cdc) arrest phenotypes in order to find cell cycle
regulators. Some cdc mutants actually are cell cycle regulators. However,
the collection of cdc mutants also includes:
- tRNAs
- tRNA synthetases
- an Hsp90 co-chaperone
- general transcription regulators, e.g. components of the Paf1
         transcription regulatory complex, members of the CCR4/NOT complex
- things involved in response to mating pheromone (which do cause G1
         arrest, but which are not thought to be cell cycle regulators)
- eIF4E cytoplasmic mRNA cap binding protein required for translation

Inhibition of ribosome synthesis can also produce cell cycle arrest. A
couple different U3 snoRNA associated complexes involved in the first
stages of rRNA processing and small ribosomal subunit assembly have
recently been characterized in yeast. U3 and many, if not most, of the
proteins are conserved. Depleting for most of the individual protein
components of these complexes produces cell cycle arrest.

So, while SGD would be quite happy to have a cell cycle arrest -phenotype-
annotated for every gene, we don't really want to go on and make a GO
process annotation to cell cycle for many of these genes.

2. splicing vs translation - A lot of things that turn out to be involved 
in splicing of nuclear mRNAs were originally characterized as being 
involved in translation. This turns out to be due to the unusual 
distribution of introns in S. cerevisiae. Only about 270 genes, out of 
6000, contain introns, and these are disproportionately found in ribosomal 
protein genes. Thus splicing defects have a very immediate effect on translation 
due to loss of production of ribosomal proteins.

In light of the knowledge of why splicing mutants cause translation 
defects, we don't want to make GO process annotations to 
translation-related terms for splicing genes even if they do produce a 
translation-specific phenotype.

3. AAR2 - This gene was originally thought to be specifically required for 
splicing of MATa1 mRNA because mutant extracts appeared specifically 
defective in splicing this mRNA. It turns out to be due to the fact that 
MATa1 has 2 introns, while almost all other genes only have 1, which meant 
that the assay system ran out of splicing components when MATa1 was used, 
but not when any of the other test pre-mRNAs were used. There was actually 
a specific GO term (GO:0006377 - MATa1 (A1) pre-mRNA splicing) based on 
the original mutant characterization of this gene.

It turns out that Aar2 is actually part of the general splicing factor U5
snRNP, and thus required for splicing generally. GO:0006377 was obsoleted
because a MATa1-specific splicing process does not occur.
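
As a rough illustration only, the curator judgement described in this
message can be sketched in Python. This is a minimal sketch, not SGD's
actual procedure or software: the Cause categories, the
PhenotypeObservation class, the function name, and the placeholder gene
YFG1 are all hypothetical, and real curation weighs far more context than
a single flag.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    """Hypothetical categories for what is known about a mutant phenotype."""
    UNKNOWN = "unknown"          # only the phenotype is available
    CONFLICTING = "conflicting"  # literature disagrees on direct vs. indirect
    DIRECT = "direct"            # gene product acts in the process itself
    DOWNSTREAM = "downstream"    # known indirect effect of an upstream defect
    ARTIFACT = "artifact"        # known artifact of the experimental system


@dataclass
class PhenotypeObservation:
    gene: str
    phenotype: str
    cause: Cause
    high_throughput: bool = False  # came from a large-scale phenotype screen


def merits_go_process_annotation(obs: PhenotypeObservation) -> bool:
    """Return True if a GO process annotation from this phenotype seems justified.

    Assumption: this only covers the GO side of the decision. Whether the
    phenotype itself is curated is a separate question not modelled here.
    """
    # Large screens are not used wholesale as a basis for GO annotations.
    if obs.high_throughput:
        return False
    # Known downstream effects and known artifacts are not represented in GO;
    # with nothing better to go on, the phenotype is the best evidence there is.
    return obs.cause in {Cause.UNKNOWN, Cause.CONFLICTING, Cause.DIRECT}


if __name__ == "__main__":
    examples = [
        PhenotypeObservation("CDC60", "cell cycle arrest", Cause.DOWNSTREAM),
        PhenotypeObservation("AAR2", "MATa1-specific splicing defect", Cause.ARTIFACT),
        # YFG1 is a placeholder for an otherwise uncharacterized gene.
        PhenotypeObservation("YFG1", "cell cycle arrest", Cause.UNKNOWN),
    ]
    for obs in examples:
        print(obs.gene, obs.phenotype, merits_go_process_annotation(obs))
```

The point of keeping the cause as an explicit field is that the answer for
a given gene can change as more is learned: moving an observation from
UNKNOWN to DOWNSTREAM is what would prompt removal of an older IMP-based
process annotation.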



On Fri, 11 Jul 2008, Valerie Wood wrote:

>
> Hi Judy/ Harold,
>
> In both of these examples (your heart development in the power point, and 
> Harold's ribosomal example), we would make these annotations using current 
> practices (so I don't think we are being inconsistent here). I have a similar 
> example to Harold's where a subunit of RNA polymerase II plays a specialized 
> role in cell separation. This is what the data shows and this is fine.
>
> What Karen and I are saying is that not EVERY annotation which can be made 
> from a phenotype deserves a process annotation in the context of all of the 
> available information.
>
> Some phenotypes which initially appear to indicate involvement in a 
> particular process turn out to be downstream effects based on subsequent 
> information. We feel in these cases, where the effect is *known* to be an 
> *indirect* effect of an upstream process, then the process annotation based 
> on this phenotype should be removed. It seems increasingly that it is not 
> helpful for our communities using GO to make every annotation from a 
> phenotype, if the phenotypes are subsequently shown to be a result of an 
> upstream process. This is the feedback I have got from my community, and it 
> makes more sense for global analysis.
>
> Sometimes the observations initially attributed to cell division defects are 
> actually known to be due to defects in DNA repair or replication: because 
> replication is late and cytokinesis too early, cell division is compromised. 
> There are many more dependencies on rRNA processing and translation.
>
> If it is NOT clear (reported) that the phenotype is due to the upstream 
> process, then the IMP process from phenotype would still be valid. This shows 
> a different level of knowledge which can be captured by a curator when more 
> information is available. The phenotypes in these cases are still captured as 
> appropriate.
>
> Probably we have more cases like this because yeast are better studied, and 
> there are many dependencies in cell biology. SGD may have some better 
> examples as they have more legacy data.
>
> Val
>
>
>
>
>
> Judith Blake wrote:
>> Hi,
>> I sent a response with ppt and it's waiting to be moderated
>> 
>> J
>> 
>> Harold Drabkin wrote:
>>> 
>>> On the other hand, we have to be careful about applying what we think we 
>>> know to ignore what a mutant phenotype is telling us, because things 
>>> can be complicated. I just finished looking at one of the ribosomal 
>>> proteins, Rpl10. There is very little mouse data, but from skimming some 
>>> other references (human), it appears to have been originally identified in 
>>> a screen for tumor suppressors. It is unclear why. It appears to be a 
>>> protein that associates with the large subunit after the subunit is 
>>> exported from the nucleus. However, there is some reference to its 
>>> release from the 60S ribosomal subunit as a mechanism of 
>>> transcript-specific translational control. This might have been reflected 
>>> in the search for tumor suppressors. Yet another paper describes it as a 
>>> zinc-binding transcription regulatory protein which can bind to c-Jun 
>>> (this binding is dependent upon zinc ions and phosphorylation by protein 
>>> kinase C). I haven't looked at those papers in detail, but there is 
>>> something interesting going on (no one has done a KO in mouse that I can 
>>> find which might tell us a bit more), and I'm not at all sure one should 
>>> rule out that it participates in processes other than the one 
>>> obvious from its name. Just grist for the mill.
>>> 
>>> h
>>> 
>>> 
>>> Valerie Wood wrote:
>>>> I agree completely with Karen/SGD and this has been the procedure I have 
>>>> always followed.
>>>> In the absence of any other information, a mutant phenotype is frequently 
>>>> used to infer a specific process. Once more information is available it 
>>>> often becomes clear that this is a downstream (indirect) effect.
>>>> For example, defects in ribosome biogenesis and general 
>>>> translation will often have pleiotropic effects which are indirect, as 
>>>> they will affect nearly every process downstream (for example there are 
>>>> associated downstream effects in chromosome segregation, cell division, 
>>>> and in multicellular organisms, multiple developmental processes). This 
>>>> does not mean that a biologist would expect to see the annotations to 
>>>> these processes once the upstream process is known. If we did follow this 
>>>> logic, then we would find that all genes involved in translation, 
>>>> ribosome biogenesis and general replication would eventually become 
>>>> annotated to most other processes.
>>>> 
>>>> Another classic example from yeast is vacuolar targeting. Many mutants 
>>>> have defects which result in proteins usually localized to the 
>>>> vacuole becoming mislocalised, and these were initially interpreted as 
>>>> defects in protein targeting. It has since become clear that many of these 
>>>> defects are very far upstream of the vacuolar targeting pathway, and the 
>>>> mislocalisation is just a downstream consequence of things being 
>>>> misfolded, mistranscribed etc. Subsequently these annotations have 
>>>> gradually been removed as better information has become available.
>>>> 
>>>> On the other hand, mutations in a gene may have phenotypic effects which 
>>>> you DO want to capture as processes (for example the effects of 
>>>> phenylalanine hydroxylase on skin pigmentation etc). However you would 
>>>> not necessarily want to curate the effect of a gene involved in all 
>>>> translation initiation in a developmental process from a high throughput 
>>>> screen (once better information was available). In Doug's example I would 
>>>> also follow Karen's suggestion and make the annotation if this is 
>>>> possibly specific transcription for the pathway (i.e. specific to a subset 
>>>> of genes), but if the defect is definitely general transcription I would 
>>>> not make the annotation.
>>>> 
>>>> Not capturing EVERY phenotype using biological process should not be 
>>>> considered underannotation. The purpose of GO process annotations is to 
>>>> capture processes, not phenotypes. Sometimes phenotypes are direct 
>>>> indicators of the process a gene is involved in; sometimes they are not.
>>>> A major consequence of making these ubiquitous annotations is that they 
>>>> can distort genome wide analysis (not improve it), and this is often the 
>>>> case when annotations come from high throughput screens and early 
>>>> experiments. Over the past couple of years cerevisiae and pombe have done 
>>>> a lot of 'tidying' of these legacy annotations, and the genome-wide GO 
>>>> data is much improved and more useful as a result.
>>>> 
>>>> This is also why annotations  to orthologs made using ISS should only be 
>>>> made by a curator on a gene by gene basis and not by an automated 
>>>> process. A curator is able to assess all of the available information to 
>>>> make an ISS annotation (from different organisms) and distinguish between 
>>>> current annotations and legacy annotations.
>>>> 
>>>> One way to distinguish these is whether the targets are generic (i.e. 
>>>> every gene) or specific (a subset of genes). If the gene's targets are a 
>>>> subset of genes then the annotation is probably valid.
>>>> 
>>>> Val
>>>> 
>>>> Karen Christie <kchris at genome.stanford.edu> wrote: 
>>>>> I don't think the GOC has ever had a policy, or even a recommendation, 
>>>>> that process annotations should be made from all mutant phenotypes, nor 
>>>>> do I think that it should.
>>>>> 
>>>>> For example, SGD is currently working on annotating phenotypes for Cell 
>>>>> Division Cycle (CDC) mutants, i.e. mutations which cause a cell cycle 
>>>>> arrest phenotype. Here are some of the ones I worked on yesterday:
>>>>>
>>>>>     CDC60   leucyl tRNA synthetase
>>>>>     PRT1    Subunit of eIF3
>>>>>     ALA1    alanyl-tRNA synthetase
>>>>>     CDC65   mitochondrial tRNA-Glu
>>>>>     SPT16   Subunit of FACT transcription elongation complex
>>>>> 
>>>>> I don't think that anyone in the yeast community would expect or want to 
>>>>> see any of these genes annotated to a GO process related to the cell 
>>>>> cycle. There are lots of examples of where a mutant phenotype is due to 
>>>>> some downstream effect and not due to the primary defect.
>>>>> 
>>>>> So, at SGD, we try to focus on the primary process. Obviously, we don't 
>>>>> always know, but once we do, we like to avoid making GO annotations for 
>>>>> processes that are known to be downstream, rather than direct, results 
>>>>> of the mutation.
>>>>> 
>>>>> For Doug's specific example, if comparative data suggested that the gene 
>>>>> was a specific regulatory transcription factor, I'd probably be inclined 
>>>>> to go ahead and make specific process annotations. However, if 
>>>>> comparative data suggested that it was related to a Pol II general 
>>>>> transcription factor, I might not want to make a GO process annotation 
>>>>> to such a specific process.
>>>>> 
>>>>> At all of the Annotation Camps, we've always said that one should be 
>>>>> careful when making annotations from mutant phenotypes. At both of the 
>>>>> public ones, the question has come up of how much to annotate from 
>>>>> mutant phenotypes. The answer we've given has been that if one only has 
>>>>> a mutant phenotype to annotate from, then make the best annotations you 
>>>>> can. However, be aware that as you learn more, you may find that some of 
>>>>> the mutant phenotypes are indirect results rather than something the 
>>>>> gene product is directly involved in, and that in these cases you may 
>>>>> choose to remove process annotations based on these phenotypes.
>>>>> 
>>>>> I think this is still good advice, that curator judgement should play a 
>>>>> role in deciding whether a GO process annotation is merited from any 
>>>>> particular mutant phenotype.
>>>>> 
>>>>> -Karen
>>>>> 
>>>>> 
>>>>> On Sun, 6 Jul 2008, Judith Blake wrote:
>>>>>
>>>>> 
>>>>>> I can understand the duplication of effort, but since the GO and 
>>>>>> phenotype annotations aren't co-mingled in GOdb, the SGD genes would I 
>>>>>> think appear under-annotated if the effect of the gene on phenotype is 
>>>>>> not curated in BP. For comparative genomics studies using GO, this 
>>>>>> would be missing, yet available in the literature, information.
>>>>>> 
>>>>>> For mouse, the phenotype data is effectively 'dysfunction' data, so the 
>>>>>> phenotype annotation reflects a different view from the GO annotation.
>>>>>> 
>>>>>> Judy
>>>>>> 
>>>>>> Julie Park wrote:
>>>>>> 
>>>>>>> Hi Doug,
>>>>>>> 
>>>>>>> SGD's practice on this is that if it is known that what is being 
>>>>>>> observed is a secondary/downstream effect, then we only capture it via 
>>>>>>> phenotypes and not as a GO process.  However, if the gene product in 
>>>>>>> question is not well characterized or there is a conflict in the 
>>>>>>> literature about whether it is a direct or indirect involvement then 
>>>>>>> we would give it a GO annotation.
>>>>>>> 
>>>>>>> We've made a decision to use GO to try and capture the primary role of 
>>>>>>> a gene product as much as possible and to reduce the duplication of 
>>>>>>> effort required to capture data both in GO and as phenotypes.
>>>>>>> 
>>>>>>> Just our take on things.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> -Julie
>>>>>>> 
>>>>>>> 
>>>>>>> On Jul 3, 2008, at 3:16 PM, Doug Howe wrote:
>>>>>>>
>>>>>>> 
>>>>>>>> Hi David,
>>>>>>>> It still seems like there is a line that has to be drawn somewhere.
>>>>>>>> We've talked in the past about the scope of a process...when does it
>>>>>>>> start and when does it end?  A gene that has as its primary role
>>>>>>>> regulation of transcription (perhaps binds DNA etc. etc.) may have a
>>>>>>>> secondary effect upon eye morphogenesis.  However, the process of eye
>>>>>>>> morphogenesis does not start with the binding of such a gene to a
>>>>>>>> regulatory sequence...it is a downstream consequence....and perhaps it
>>>>>>>> is the gene whose expression is being regulated that is really involved
>>>>>>>> in the downstream process.  It seems like there is a significant amount
>>>>>>>> of redundant curation work to do if we always annotate both GO and
>>>>>>>> phenotype using the same GO process terms.  I'm not strongly opposed to
>>>>>>>> such annotations, I just want to revisit the discussion and see if
>>>>>>>> anyone has other views on the issue.
>>>>>>>> -Doug
>>>>>>>> 
>>>>>>>> David Hill wrote:
>>>>>>>> 
>>>>>>>>> Doug,
>>>>>>>>> 
>>>>>>>>> I do this all the time. I just finished systematically doing all the 
>>>>>>>>> homeobox genes in mouse. Many of them are annotated to things like 
>>>>>>>>> pattern specification. I think in the future, it will be very nice 
>>>>>>>>> to know these are playing roles in regulating transcription but that 
>>>>>>>>> regulation is fundamental in other processes as well.
>>>>>>>>> 
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> Doug Howe wrote:
>>>>>>>>> 
>>>>>>>>>> I'm still struggling with the issue of whether to make a GO 
>>>>>>>>>> annotation (processes in particular) or only phenotype annotation. 
>>>>>>>>>> The zebrafish literature is replete with mutant papers that often 
>>>>>>>>>> describe phenotypes involving eyes, otic vesicles, or pharyngeal 
>>>>>>>>>> arches, organ development etc.   Often, the IEA annotations for a 
>>>>>>>>>> gene seem to indicate that the gene is binding DNA, and may be 
>>>>>>>>>> some sort of transcriptional regulator. Should such a gene be 
>>>>>>>>>> annotated with GO terms like 'otic vesicle development', or 'eye 
>>>>>>>>>> morphogenesis', or should that be left for phenotype annotations?
>>>>>>>>>>
>>>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Doug Howe, Ph.D.
>>>>>>>> ZFIN Scientific Curator
>>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> Annotation mailing list
>>>>>>>> Annotation at geneontology.org
>>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>> 
>>>>>>> 
>>>>>>
>>>>>> 
>>>>> 
>>>>>
>>>>> 
>>>>
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
>
>
> -- 
> ---------------------------------------------------------------------------
> Valerie Wood			 Tel: 01223 496909
> S. pombe Genome Project		 Fax: 01223 494919 
> Wellcome Trust Sanger Institute	 email: val at sanger.ac.uk
> Wellcome Trust Genome Campus	 http://www.genedb.org/genedb/pombe
> Hinxton, Cambridge, CB10 1HH	 http://www.sanger.ac.uk/Projects/S_pombe
>
>
>
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a 
> charity registered in England with number 1021457 and a company registered in 
> England with number 2742969, whose registered office is 215 Euston Road, 
> London, NW1 2BE.
>

