[go] finishing up Evidence Code Issues
Karen Christie
kchris at genome.Stanford.EDU
Mon Sep 10 16:15:37 PDT 2007
Hi,
Since I'm due on September 20th and will be going on maternity leave
shortly before the GO meeting, Mike asked me to send these remaining items
to finish up the Evidence Code documentation directly to the list to at
least get the discussion started. Some issues may need to be discussed at
the GO meeting as well.
This email will contain some responses to Midori's last email and a few
other email comments to resolve some minor comments on the current draft
of the new Evidence Code documentation. I will send separate emails to
deal with each of these specific issues:
1. Restriction that all unknowns MUST use ND
2. IMP vs IGI for single gene mutations, regardless of gene being annotated
3. How to put program or method names in the with column for ISS
4. Scope of the RCA evidence code
For both issues 2 and 4, I think that the recommendations I've made
will help make it possible to create a decision tree/flowchart that is
fairly simple and clear. I'll send a very rough draft of a flowchart
separately as well.
Note that for both #s 3 and 4, I have put some supplemental info into
html docs in my personal space. I did not spend much time doing html
formatting for these docs, on the thought that people might prefer to
move them to the GOC wiki. However, as the Evidence Code Committee was
not designated as a Working Group, I have no idea where to put them
within the wiki structure. If a spot is designated for them, they can
be moved to the wiki.
-Karen
Responses and comments on things in red on this page:
http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml
1. GO_REF documentation
> We should have documentation that explains GO_REF's and links to it
> when we refer to them.
Midori (15 Jun 2007):
Links can go to the existing GO References page:
http://www.geneontology.org/cgi-bin/references.cgi
I can write up a description (which will be brief; there's not an
enormous amount to say) and give it to Amelia to be added to the blurb
at the top of this page. The plain text file from which the web page
is generated contains a brief description of the format, which could
be HTMLified and also added to the blurb if it would be useful.
Karen (9 Sept 2007):
Please do. It would also be good if the page for the GO_REFs is made
easier to find in general in our documentation.
2. ChEBI IDs in with field?
> Do we allow things like ChEBI IDs in the with field?
Midori (15 Jun 2007):
I would say yes.
Karen (9 Sept 2007):
Perhaps we should make this a quick agenda item for the next GO
meeting, so that people can ratify this face to face, unless we get an
overwhelming response via email to proceed with allowing this new ID
for the with field.
3. IMP examples
> any more positive examples for IMP?, e.g. phenotypic similarity
Midori (15 Jun 2007):
Dredged up from email from January 2002 ...
Erich Schwarz needed to know which code to use for "other mutations
sharing a complex mutant phenotype syndrome with [a well-characterized
mutant]." My comment at the time was: "The situation you've described
is IMP, not IGI, because (if I understand correctly) you're looking at
one mutation at a time. Comparing the phenotype of one mutation to
that of another helps you interpret the meaning, but is not a kind of
genetic interaction."
I think this still holds. Erich provided some details of an example,
which I can forward if you want.
Karen (9 Sept 2007):
We can certainly include it, the more examples the better in my
opinion, but don't send it to me. I'll be going on maternity leave
soon and don't want to be responsible for this getting added.
4. use of with field for NAS
> The Evidence Code Committee discussed the idea of making GO
> annotations from Reactome entries. ... What does the full group feel
> about the idea of allowing the ID for a database record, when such
> exist, in the with field?
Midori (15 Jun 2007):
I'm all for including annotations based on Reactome entries -- they
have a well-developed curation system that deeply involves expert
biologists, so the statements in their records are very reliable.
I am not in favor of putting the Reactome ID in the with field for
these annotations, however, because the Reactome entry does not modify
or supplement the evidence; rather, the entry provides the
evidence. GO would effectively be using a Reactome record as a source
of information about a gene product, so it would make much more sense
to put the Reactome ID in the reference field.
For the more general database record case, it may be that I don't
sufficiently understand what might go in a GO_REF (or equivalent), so
I don't understand the rationale for allowing 'with' for NAS.
For the case where the author infers one thing from another, using a
GO ID in 'with' makes more sense, but I think it's not really
necessary because the author (presumably) hasn't actually made any GO
annotations, and hasn't stated observations or conclusions in terms
of, well, GO terms. (Perhaps this will change some day!) Also, note
that we have expressly disallowed the use of 'with' for NAS, so the
script would have to be changed if the use of with-for-NAS is agreed.
Karen (9 Sept 2007):
Regarding the idea of allowing Reactome IDs in the with field, the
thought was that it provided the specific information about which
record in Reactome made the statement, but the idea was
controversial even just with the Evidence Code Committee.
Regarding the idea of allowing GOids for NAS, I think you bring up a
good point that this may not make sense since the author has typically
not expressed their statement in terms of a GOid from which an inference
was made. Allowing this may just be more confusing than helpful,
especially since deciding which GOid to put in the with field will
almost always be a curator judgement.
However, I wasn't one of the proponents of this idea, so those who
are may wish to defend it.
In any case, rather than adding yet another usage of the with column
that is potentially confusing to users, I could personally just go
with not allowing use of the with column at all for NAS.
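To make the rule under discussion concrete, here is a minimal Python sketch of a check that flags any NAS annotation with a non-empty with field. The actual checking script is not shown in this thread, so the function name, the set of codes, and the message wording are all hypothetical, and the Reactome ID in the usage line is a placeholder, not a real identifier.

```python
# Hypothetical sketch: not the real GO checking script.
# Assumption: NAS is the only code listed here as disallowing 'with';
# other codes may have their own rules that this sketch ignores.
CODES_DISALLOWING_WITH = {"NAS"}


def with_field_errors(evidence_code, with_field):
    """Return a list of error messages for one annotation's with field."""
    errors = []
    if evidence_code in CODES_DISALLOWING_WITH and with_field.strip():
        errors.append(
            f"{evidence_code} annotations must leave the with field empty "
            f"(found {with_field!r})"
        )
    return errors


# Placeholder ID for illustration only.
print(with_field_errors("NAS", "Reactome:example_ID"))
```

If with-for-NAS were agreed, removing "NAS" from the set would be the only change needed in a check shaped like this.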
5. Representation of examples for with/from:
Susan (14 Jun 2007):
IPI examples
Looks good but there is something odd about the IPI example,
assuming I am looking at the latest version ok.
Firstly, the paper is about mouse proteins not Drosophila so could we
change FB to MGI please. Also, I am confused as to why there are three
lines shown - MGI just lists the middle one:
FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI UniProt:protein_2_ID ...
FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI UniProt:protein_2_ID|UniProt:protein_3_ID ...
FB:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI FB:gene_2_ID
So unless I'm missing something I suggest we lose the extra lines and
have either:
MGI:1349216 Abcd3 GO:0005515 PMID:10551832 IPI UniProt:P33897|UniProt:Q61285
OR
MGI:gene_1_ID Abcd3 GO:0005515 PMID:10551832 IPI UniProt:protein_2_ID|UniProt:protein_3_ID
I'd prefer to include the real identifiers so it isn't a mix of 'real'
and 'example'.
Similarly there seems to be a mix of FB and SGD db identifiers in the
IGI examples. A possible alternative for IGI is:
In PMID:9043060, flies simultaneously mutant for three genes: klingon
(klg), sevenless (sev) and Son of sevenless (Sos) are used to show that
klingon participates in R7 photoreceptor fate commitment. This leads to
the annotation:
FB:FBgn0017590 klg GO:0045466 PMID:9043060 IGI FB:FBgn0003366|FB:FBgn0001965
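For anyone reformatting these examples, the layout of the lines above can be sketched in Python as follows. This assumes the simplified, whitespace-separated layout used in this email (object ID, symbol, GO ID, reference, evidence code, with), not the full tab-delimited annotation file; the function and field names are made up for illustration.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    db_object: str   # e.g. "FB:FBgn0017590"
    symbol: str
    go_id: str
    reference: str
    evidence: str
    with_from: list  # list of (database, identifier) pairs


def parse_example_line(line):
    """Parse one simplified example line as written in this email."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 whitespace-separated fields")
    db_object, symbol, go_id, reference, evidence = fields[:5]
    # The with field is optional; multiple entries are separated by '|'.
    with_field = fields[5] if len(fields) > 5 else ""
    with_from = []
    for entry in with_field.split("|"):
        if not entry:
            continue
        database, _, identifier = entry.partition(":")
        with_from.append((database, identifier))
    return Annotation(db_object, symbol, go_id, reference, evidence, with_from)


example = parse_example_line(
    "FB:FBgn0017590 klg GO:0045466 PMID:9043060 IGI FB:FBgn0003366|FB:FBgn0001965"
)
print(example.evidence, example.with_from)
```

Splitting each with entry into a database prefix and an identifier makes it easy to spot mixed prefixes within one example, such as FB alongside SGD.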
Karen (9 Sept 2007):
I'm all for real examples, but I don't have time to dig them up for
every evidence code. Perhaps we could distribute this task around, so
that we have multiple real examples for each evidence code. It would
be good to have at least one example with one entry in the with
column, as well as one with multiple entries. It would also be good if
they showed various IDs in the with field.
This would be a reasonable task if there was one person for each
evidence code to find some real examples, and then hopefully it would
be easy for Amelia to put them in the right format if she was given
all the specific info that should be in the table.
6. ISS & with col:
> Note that there should be good evidence that the gene product(s)
> placed in the with/from column actually has the activity, process,
> etc. being annotated.
Midori (15 Jun 2007):
Do we want to specifically say the "good evidence" should be
*experimental* evidence? Would be consistent with the Ref Genome
requirement, and good practice generally ...
Karen (9 Sept 2007):
We do have to remember that this Evidence Code document is not just
for the use of the Reference Genomes. While we did agree that ISS should
not be made from pairwise BLAST unless the gene to be placed in the
with column has been experimentally characterized, the ISS code covers
more situations than just that. The with field may also contain Pfams,
Prosite, TIGRFAMS, CBS, COG, PANTHER, and we also have to determine
how to include method names here for stuff like tRNAscan and my
specific question about snoRNAs. Michelle Gwinn may wish to comment
on this too.
Typos, other trivial fixes:
-------------------------------------------------------
1. IGI
> Should we add a statement in the paragraph above to IGI, similar to
> the one in IMP, about care in making annotations from gain of
> function mutations ...?
Midori (15 Jun 2007):
Sounds reasonable to me.
Karen (9 Sept 2007):
OK, added to first paragraph of IGI.
2. Last paragraph of Introduction:
Midori (15 Jun 2007):
Change "effect" to "affect" in "... will also effect the quality of the resulting annotation."
Karen (9 Sept 2007):
done
3. IDA & IMP:
Midori (15 Jun 2007):
Does "over-expression" really need to be hyphenated? I've seen it
unhyphenated more frequently; also, there's one unhyphenated
occurrence in the document.
Karen (9 Sept 2007):
changed to unhyphenated
4. IGI examples:
Midori (15 Jun 2007):
The statements "For this type of experiment, use the IGI Code" could
be deleted -- they're redundant with the fact that the description
appears in a list headed "where the IGI code should be used."
Karen (9 Sept 2007):
done