[go] Putting method/program names into the with field for ISS
Karen Christie
kchris at genome.Stanford.EDU
Mon Sep 10 16:20:08 PDT 2007
Putting method/program names into the with field for ISS
--------------------------------------------------------
I've reviewed several papers where ISS is the appropriate code, but
for which only a method could be placed into the with field. Thus, I
have some comments on how we might want to do this. I'll start with a
little background.
At the last GO meeting, we agreed to "Always use a WITH column for IEA
and ISS, containing a program name if necessary. For example, make a
ref to tRNAscan." However, we did not work out how to implement
this.
As phrased in the minutes, it sounds like the idea is just to put the
name of the method in the with column. If that's all that is required
then it's fairly simple to find an appropriate text string from a
paper to put in the with column. However, I'm kind of assuming that we
don't want to allow uncontrolled text strings in the with column mixed
in with things of the format namespace:ID.
Currently, to put something in the with column, it must have a
namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names
or methods, there are a couple of problems with trying to put them into
this type of format. One is that some of the methods to which researchers
refer are not given an official name. The second, which applies to all
the papers I've read so far, is that none of them have a namespace.
If we need to format these in a way that is compatible with the
namespace:ID format, then GO could generate a 'database' of collected
methods. An entry in the GO.xrf_abbs file like the one below could
define a namespace for such a collection.
abbreviation: GO_CM
database: Gene Ontology Database collected methods
object: Accession (for collected method)
example_id: GO_CM:0000001
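As a rough illustration of what the namespace:ID requirement implies
for the with column, the sketch below splits a value at the first colon
and checks the namespace against a set of known abbreviations. The
function name, the contents of the set, and the choice to treat GO_CM
as already registered are assumptions made for illustration; this is
not an existing GO tool.

```python
# Hypothetical set of namespaces; in practice these would be read from
# the GO.xrf_abbs file. GO_CM is the namespace proposed in this message.
KNOWN_NAMESPACES = {"Swiss-Prot", "PMID", "GO_CM"}


def split_with_value(value):
    """Return (namespace, local_id) if value looks like namespace:ID, else None."""
    namespace, sep, local_id = value.partition(":")
    # Reject values with no colon, an empty namespace, or an empty ID.
    if not sep or not namespace or not local_id:
        return None
    # Reject namespaces that are not in the known set.
    if namespace not in KNOWN_NAMESPACES:
        return None
    return namespace, local_id
```

With a check like this, an uncontrolled string such as "tRNAscan" would
be rejected, while GO_CM:0000001 would be accepted once the namespace
is defined.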
Then for the second part, we'd have to start a collection of these
various methods, probably just a file somewhat like the GO.xrf_abbs
file. For this, there are a couple issues to deal with:
1) The authors of methods don't always give them a clear name.
2) There isn't always a single source reference. For programmatic
methods, there is often a single source reference. However, for the
consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't
be comfortable designating a single reference as the source. In these
cases, I'd be happier if we could associate a number of relevant refs
to the 'method'. In other cases, an algorithm is mentioned by name,
but no reference is cited.
However, with those issues in mind, perhaps collecting this
information would work.
- accession: accession ID given by GO
- method name: the name given to a program by the authors, when
available, or a descriptive name based on the paper
- developed in reference: the ID, e.g. PMID:xxxxx, for the reference
describing the development of a method, when applicable; this field would
not be required. It can be filled with 'Not Applicable' for cases like 'box
C/D snoRNA consensus' where there isn't a specific program that was
developed. I don't know how we want to deal with cases like
'TMpredict' where they cited a reference that appears irrelevant or
'Kyte-Doolittle algorithm' where I didn't see a citation for the
algorithm.
- other references: Useful for cases like 'box C/D snoRNA consensus'
where there isn't a specific program that was developed, but where you
can cite 1 or more references which describe what the consensus is.
- method classification: maybe this tag isn't necessary, but I thought
it might be useful, particularly if we ever get to a situation where
we have this in a database where you can search on this field.
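To make the proposed fields concrete, here is a minimal sketch of how
one entry in such a file might be read into a record. It assumes a
"tag: value" layout, one tag per line, with multiple references
separated by semicolons; the class name, function name, and the
decision to store both "Not Applicable" and "?" as a missing value are
my assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectedMethod:
    accession: str
    method_name: str
    # None covers both "Not Applicable" and an unknown ("?") reference;
    # a real format might want to keep those two cases distinct.
    developed_in: Optional[str] = None
    other_references: List[str] = field(default_factory=list)
    classification: Optional[str] = None


def parse_method(stanza):
    """Parse one 'tag: value' stanza (assumed layout) into a record."""
    tags = {}
    for line in stanza.strip().splitlines():
        # Split at the first colon only, so values like PMID:123 survive.
        tag, sep, value = line.partition(":")
        if sep:
            tags[tag.strip()] = value.strip()
    developed = tags.get("developed in reference")
    if developed in (None, "", "?", "Not Applicable"):
        developed = None
    others = [
        ref.strip()
        for ref in tags.get("other references", "").split(";")
        if ref.strip()
    ]
    return CollectedMethod(
        accession=tags["accession"],
        method_name=tags["method name"],
        developed_in=developed,
        other_references=others,
        classification=tags.get("method classification"),
    )
```

Lines without a colon, such as parenthetical comments, are ignored by
this sketch, and a stanza missing the accession or method name raises a
KeyError.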
Below is what I would fill in for each field for the references listed
at:
http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html
The comments in parentheses are just comments to correlate the info below
with the Example papers, and would not be included in the proposed file.
accession: GO_CM:0000001
method name: box C/D snoRNA probabilistic model
developed in reference: PMID:10024243
method classification: box C/D snoRNA gene prediction
(would be used for example #1)
accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372
method classification: box C/D snoRNA gene prediction
(would be used for example #s 2 & 3)
accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction
(would be used for example #4)
accession: GO_CM:0000004
method name: box H/ACA snoRNA consensus
developed in reference: Not Applicable
other references: PMID:12007400
method classification: box H/ACA snoRNA gene prediction
(would be used for example #5)
accession: GO_CM:0000005
method name: TMpredict
developed in reference: ?
(paper #6 cites a reference, but it seems incorrect;
I did not find an appropriate citation via PubMed)
method classification: protein hydrophobicity
(would be used for example #6)
accession: GO_CM:0000006
method name: Kyte-Doolittle algorithm
developed in reference: ? (paper #7 does not cite a reference)
method classification: protein hydrophobicity
(would be used for example #7)
accession: GO_CM:0000007
method name: tRNAscan
developed in reference: PMID:1870126
other references: PMID:
method classification: tRNA gene prediction
(The Lowe & Eddy tRNAscan-SE ref referred to this program as
"tRNAscan 1.3 by Fichant and Burks (12)" and cited this
paper. However, this paper doesn't appear to name the
algorithm at all.)
accession: GO_CM:0000008
method name: Pavesi et al. tRNA prediction algorithm
developed in reference: PMID:8165140
method classification: tRNA gene prediction
(they don't name their algorithm, so this name is
derived from what they say, in conjunction with how
it was referred to in the Lowe & Eddy paper on
tRNAscan-SE.)
accession: GO_CM:0000009
method name: tRNAscan-SE
developed in reference: PMID:9023104
method classification: tRNA gene prediction
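
If the method classification field were kept, searching on it could be
as simple as the sketch below. The accessions, names, and
classifications are copied from the example entries in this message;
the list layout and the function name are hypothetical.

```python
# (accession, method name, method classification), from the entries above.
METHODS = [
    ("GO_CM:0000001", "box C/D snoRNA probabilistic model", "box C/D snoRNA gene prediction"),
    ("GO_CM:0000002", "box C/D snoRNA consensus", "box C/D snoRNA gene prediction"),
    ("GO_CM:0000003", "snoGPS", "box H/ACA snoRNA gene prediction"),
    ("GO_CM:0000004", "box H/ACA snoRNA consensus", "box H/ACA snoRNA gene prediction"),
    ("GO_CM:0000005", "TMpredict", "protein hydrophobicity"),
    ("GO_CM:0000006", "Kyte-Doolittle algorithm", "protein hydrophobicity"),
    ("GO_CM:0000007", "tRNAscan", "tRNA gene prediction"),
    ("GO_CM:0000008", "Pavesi et al. tRNA prediction algorithm", "tRNA gene prediction"),
    ("GO_CM:0000009", "tRNAscan-SE", "tRNA gene prediction"),
]


def methods_in_class(classification):
    """Return the method names whose classification matches exactly."""
    return [name for _accession, name, cls in METHODS if cls == classification]
```

For example, methods_in_class("tRNA gene prediction") would return the
three tRNA entries in accession order.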