[Annotation] phenotype or GO-still struggling
Valerie Wood
val at sanger.ac.uk
Fri Jul 11 07:48:26 PDT 2008
Hi Judy/ Harold,
In both of these examples (your heart development example in the PowerPoint,
and Harold's ribosomal example), we would make these annotations using
current practices (so I don't think we are being inconsistent here). I
have a similar example to Harold's, where a subunit of
RNA polymerase II plays a specialized role in cell separation. This is
what the data show, and this is fine.
What Karen and I are saying is that not EVERY annotation which can be
made from a phenotype deserves a process annotation in the context of
all of the available information.
Some phenotypes which initially appear to point to a particular
process turn out to be downstream effects based on subsequent
information. We feel that in these cases, where the effect is *known* to
be an *indirect* effect of an upstream process, the process annotation
based on this phenotype should be removed. It seems increasingly that it
is not helpful for our communities using GO to make every annotation for
the phenotype if it is subsequently shown to be a result of an
upstream process. This is the feedback I have got from my community, and
it makes more sense for global analysis.
Sometimes observations initially attributed to cell division
defects are actually known to be due to defects in DNA repair or
replication: because replication is late and cytokinesis occurs too
early, cell division is compromised. There are many more dependencies
on rRNA processing and translation.
If it is NOT clear (reported) that the phenotype is due to the upstream
process, then the IMP process from phenotype would still be valid. This
shows a different level of knowledge which can be captured by a curator
when more information is available. The phenotypes in these cases are
still captured as appropriate.
Probably we have more cases like this because yeast are better studied,
and there are many dependencies in cell biology. SGD may have some
better examples as they have more legacy data.
Val
Judith Blake wrote:
> Hi,
> I sent a response with ppt and it's waiting to be moderated
>
> J
>
> Harold Drabkin wrote:
>>
>> On the other hand, we have to be careful about applying what we think
>> we know to ignore what a mutant phenotype is telling you, because
>> things can be complicated. I just finished looking at one of the
>> ribosomal proteins, Rpl10. There is very little mouse data, but from
>> skimming some other references (human), it appears to have been
>> originally identified in a screen for tumor suppressors. It is unclear
>> why. It appears to be a protein that associates with the large subunit
>> after the subunit is exported from the nucleus. However, there is some
>> reference to its release from the 60S ribosomal subunit as a
>> mechanism of transcript-specific translational control. This might
>> have been reflected in the search for tumor suppressors. Yet another
>> paper describes it as a zinc-binding transcription regulatory
>> protein which can bind to c-Jun (this binding is dependent upon
>> zinc ions and phosphorylation by protein kinase C). I haven't looked
>> at those papers in detail, but there is something interesting going
>> on (no one has done a KO in mouse that I can find, which might tell us
>> a bit more), and I'm not at all sure one should rule out that it
>> participates in processes other than the one obvious from its
>> name. Just grist for the mill.
>>
>> h
>>
>>
>> Valerie Wood wrote:
>>> I agree completely with Karen/SGD and this has been the procedure I
>>> have always followed.
>>> In the absence of any other information, a mutant phenotype is
>>> frequently used to infer a specific process. Once more information
>>> is available it often becomes clear that this is a downstream
>>> (indirect) effect.
>>> For example, defects in ribosome biogenesis and general
>>> translation will often have pleiotropic effects which are
>>> indirect, as they will affect nearly every process downstream (for
>>> example there are associated downstream effects in chromosome
>>> segregation, cell division, and in multicellular organisms,
>>> multiple developmental processes). This does not mean that a
>>> biologist would expect to see the annotations to these processes
>>> once the upstream process is known. If we did follow this logic,
>>> then we would find that all genes involved in translation, ribosome
>>> biogenesis and general replication would eventually become annotated
>>> to most other processes.
>>>
>>> Another classic example from yeast is vacuolar targeting. Many
>>> mutants have defects which result in proteins usually
>>> localized to the vacuole becoming mislocalised, and these were
>>> initially interpreted as defects in protein targeting. It has since
>>> become clear that many of these defects are very far upstream of the
>>> vacuolar targeting pathway, and this is just a downstream
>>> consequence of things being misfolded, mistranscribed etc.
>>> Subsequently these annotations have gradually been removed as better
>>> information has become available.
>>>
>>> On the other hand, mutations in a gene may have phenotypic effects
>>> which you DO want to capture as processes (for example the effects
>>> of phenylalanine hydroxylase on skin pigmentation etc). However, you
>>> would not necessarily want to curate the effect of a gene involved
>>> in all translation initiation in a developmental process from a high
>>> throughput screen (once better information was available). In Doug's
>>> example I would also follow Karen's suggestion and make the
>>> annotation if this is possibly transcription specific to the
>>> pathway (i.e. specific to a subset of genes), but if the defect is
>>> definitely general transcription I would not make the annotation.
>>>
>>> Not capturing EVERY phenotype using biological process should not
>>> be considered underannotation. The purpose of GO process annotations
>>> is to capture processes, not phenotypes. Sometimes phenotypes are
>>> direct indicators of the process a gene is involved in; sometimes
>>> they are not.
>>> A major consequence of making these ubiquitous annotations is that
>>> they can distort genome-wide analysis (not improve it), and this is
>>> often the case when annotations come from high throughput screens
>>> and early experiments. Over the past couple of years cerevisiae and
>>> pombe have done a lot of 'tidying' of these legacy annotations, and
>>> the genome-wide GO data is much improved and useful as a result.
>>>
>>> This is also why annotations to orthologs made using ISS should
>>> only be made by a curator on a gene by gene basis and not by an
>>> automated process. A curator is able to assess all of the available
>>> information to make an ISS annotation (from different organisms) and
>>> distinguish between current annotations and legacy annotations.
>>>
>>> One way to distinguish these is whether the targets are generic (i.e.
>>> every gene) or specific (a subset of genes). If the gene's targets
>>> are a subset of genes then the annotation is probably valid.
>>>
>>> Val
>>>
>>> Karen Christie <kchris at genome.stanford.edu> wrote:
>>>> I don't think the GOC has ever had a policy, or even a
>>>> recommendation, that process annotations should be made from all
>>>> mutant phenotypes, nor do I think that it should.
>>>>
>>>> For example, SGD is currently working on annotating phenotypes for
>>>> Cell Division Cycle (CDC) mutants, i.e. mutations which cause a
>>>> cell cycle arrest phenotype. Here are some of the ones I worked on
>>>> yesterday:
>>>>
>>>> CDC60   leucyl-tRNA synthetase
>>>> PRT1    Subunit of eIF3
>>>> ALA1    alanyl-tRNA synthetase
>>>> CDC65   mitochondrial tRNA-Glu
>>>> SPT16   Subunit of FACT transcription elongation complex
>>>>
>>>> I don't think that anyone in the yeast community would expect or
>>>> want to see any of these genes annotated to a GO process related to
>>>> the cell cycle. There are lots of examples of where a mutant
>>>> phenotype is due to some downstream effect and not due to the
>>>> primary defect.
>>>>
>>>> So, at SGD, we try to focus on the primary process. Obviously, we
>>>> don't always know, but once we do, we like to avoid making GO
>>>> annotations for processes that are known to be downstream, rather
>>>> than direct, results of the mutation.
>>>>
>>>> For Doug's specific example, if comparative data suggested that the
>>>> gene was a specific regulatory transcription factor, I'd probably
>>>> be inclined to go ahead and make specific process annotations.
>>>> However, if comparative data suggested that it was related to a Pol
>>>> II general transcription factor, I might not want to make a GO
>>>> process annotation to such a specific process.
>>>>
>>>> At all of the Annotation Camps, we've always said that one should
>>>> be careful when making annotations from mutant phenotypes. At both
>>>> of the public ones, the question has come up of how much to
>>>> annotate from mutant phenotypes. The answer we've given has been
>>>> that if one only has a mutant phenotype to annotate from, then
>>>> make the best annotations you can. However, be aware that as you
>>>> learn more, you may find that some of the mutant phenotypes are
>>>> indirect results rather than something the gene product is directly
>>>> involved in, and that in these cases you may choose to remove
>>>> process annotations based on these phenotypes.
>>>>
>>>> I think this is still good advice, that curator judgement should
>>>> play a role in deciding whether a GO process annotation is merited
>>>> from any particular mutant phenotype.
>>>>
>>>> -Karen
>>>>
>>>>
>>>> On Sun, 6 Jul 2008, Judith Blake wrote:
>>>>
>>>>
>>>>> I can understand the duplication of effort, but since the GO and
>>>>> phenotype annotations aren't co-mingled in GOdb, the SGD genes
>>>>> would I think appear under-annotated if the effect of the gene on
>>>>> phenotype is not curated in BP. For comparative genomics studies
>>>>> using GO, this information would be missing even though it is
>>>>> available in the literature.
>>>>>
>>>>> For mouse, the phenotype data is effectively 'dysfunction' data,
>>>>> so the phenotype annotation reflects a different view from the GO
>>>>> annotation.
>>>>>
>>>>> Judy
>>>>>
>>>>> Julie Park wrote:
>>>>>
>>>>>> Hi Doug,
>>>>>>
>>>>>> SGD's practice on this is that if it is known that what is being
>>>>>> observed is a secondary/downstream effect, then we only capture
>>>>>> it via phenotypes and not as a GO process. However, if the gene
>>>>>> product in question is not well characterized or there is a
>>>>>> conflict in the literature about whether it is a direct or
>>>>>> indirect involvement then we would give it a GO annotation.
>>>>>>
>>>>>> We've made a decision to use GO to try and capture the primary
>>>>>> role of a gene product as much as possible and to reduce the
>>>>>> duplication of effort required to capture data both in GO and as
>>>>>> phenotypes.
>>>>>>
>>>>>> Just our take on things.
>>>>>>
>>>>>> Regards,
>>>>>> -Julie
>>>>>>
>>>>>>
>>>>>> On Jul 3, 2008, at 3:16 PM, Doug Howe wrote:
>>>>>>
>>>>>>
>>>>>>> Hi David,
>>>>>>> It still seems like there is a line that has to be drawn somewhere.
>>>>>>> We've talked in the past about the scope of a process...when does
>>>>>>> it start and when does it end? A gene that has as its primary role
>>>>>>> regulation of transcription (perhaps binds DNA etc. etc.) may have
>>>>>>> a secondary effect upon eye morphogenesis. However, the process of
>>>>>>> eye morphogenesis does not start with the binding of such a gene
>>>>>>> to a regulatory sequence...it is a downstream consequence....and
>>>>>>> perhaps it is the gene whose expression is being regulated that is
>>>>>>> really involved in the downstream process. It seems like there is
>>>>>>> a significant amount of redundant curation work to do if we always
>>>>>>> annotate both GO and phenotype using the same GO process terms.
>>>>>>> I'm not strongly opposed to such annotations, I just want to
>>>>>>> revisit the discussion and see if anyone has other views on the
>>>>>>> issue.
>>>>>>> -Doug
>>>>>>>
>>>>>>> David Hill wrote:
>>>>>>>
>>>>>>>> Doug,
>>>>>>>>
>>>>>>>> I do this all the time. I just finished systematically doing
>>>>>>>> all the homeobox genes in mouse. Many of them are annotated to
>>>>>>>> things like pattern specification. I think in the future, it
>>>>>>>> will be very nice to know that these are playing roles in
>>>>>>>> regulating transcription, but also that this regulation is
>>>>>>>> fundamental to other processes as well.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> Doug Howe wrote:
>>>>>>>>
>>>>>>>>> I'm still struggling with the issue of whether to make a GO
>>>>>>>>> annotation (processes in particular) or only phenotype
>>>>>>>>> annotation. The zebrafish literature is replete with mutant
>>>>>>>>> papers that often describe phenotypes involving eyes, otic
>>>>>>>>> vesicles, or pharyngeal arches, organ development etc.
>>>>>>>>> Often, the IEA annotations for a gene seem to indicate that
>>>>>>>>> the gene is binding DNA, and may be some sort of
>>>>>>>>> transcriptional regulator. Should such a gene be annotated
>>>>>>>>> with GO terms like 'otic vesicle development', or 'eye
>>>>>>>>> morphogenesis', or should that be left for phenotype annotations?
>>>>>>>>>
>>>>>>>>>
>>>>>>> --
>>>>>>> Doug Howe, Ph.D.
>>>>>>> ZFIN Scientific Curator
>>>>>>> Zebrafish Nomenclature Coordinator
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Annotation mailing list
>>>>>>> Annotation at geneontology.org
>>>>>>> http://fafner.stanford.edu/mailman/listinfo/annotation
>>>>>>>
--
---------------------------------------------------------------------------
Valerie Wood Tel: 01223 496909
S. pombe Genome Project Fax: 01223 494919
Wellcome Trust Sanger Institute email: val at sanger.ac.uk
Wellcome Trust Genome Campus http://www.genedb.org/genedb/pombe
Hinxton, Cambridge, CB10 1HH http://www.sanger.ac.uk/Projects/S_pombe
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.