[go] Putting method/program names into the with field for ISS
Benjamin Hitz
hitz at genome.Stanford.EDU
Mon Sep 10 16:34:28 PDT 2007
Maybe this could exactly be the distinction between ISS and RCA.
If you can specify a WITH value which corresponds to a single
sequence, structure, or "family" (read as HMM or other statistical
model) then it's ISS. Otherwise it's RCA (if curated, obv.)
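
A minimal sketch of that rule in Python, for illustration only. The kind labels and the function name are hypothetical (GO defines no such vocabulary), and the case of an uncurated annotation with no single target is left undecided because the rule above does not cover it.

```python
from typing import Optional

# Hypothetical labels for what a WITH value can point at.
SINGLE_TARGET_KINDS = {"sequence", "structure", "family"}


def suggest_evidence_code(with_kind: Optional[str], curated: bool) -> Optional[str]:
    """Apply the proposed ISS/RCA split.

    with_kind is the kind of object the WITH value identifies, or None
    when only a method or program name is available.
    """
    if with_kind in SINGLE_TARGET_KINDS:
        # A single sequence, structure, or family (HMM or other model).
        return "ISS"
    if curated:
        # No single target, but a curator reviewed the analysis.
        return "RCA"
    # Assumption: the rule says nothing about this case.
    return None
```
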
Ben
On Sep 10, 2007, at 4:20 PM, Karen Christie wrote:
> Putting method/program names into the with field for ISS
> --------------------------------------------------------
>
> I've reviewed several papers where ISS is the appropriate code, but
> for which only a method could be placed into the with field. Thus, I
> have some comments on how we might want to do this. I'll start with a
> little background.
>
> At the last GO meeting, we agreed to "Always use a WITH column for IEA
> and ISS, containing a program name if necessary. For example, make a
> ref to tRNAscan." However, we did not work out how to implement doing
> this.
>
> As phrased in the minutes, it sounds like the idea is just to put the
> name of the method in the with column. If that's all that is required
> then it's fairly simple to find an appropriate text string from a
> paper to put in the with column. However, I'm kind of assuming that we
> don't want to allow uncontrolled text strings in the with column mixed
> in with things of the format namespace:ID.
>
> Currently, to put something in the with column, it must have a
> namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names
> or methods, there are a couple of problems with trying to put them into
> this type of format. One is that some of the methods to which
> researchers refer are not given an official name. The second, which
> applies to all the papers I've read so far, is that none of them have
> a namespace.
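
To make the format constraint concrete, here is a rough Python check that separates namespace:ID entries from free-text method names. The regular expression is an assumption about what counts as a namespace (a letter followed by letters, digits, underscores, dots, or hyphens); it is not an official GO rule.

```python
import re

# Assumed shape of a with-column entry: namespace, colon, non-blank ID.
WITH_ENTRY = re.compile(r"^(?P<namespace>[A-Za-z][\w.-]*):(?P<local_id>\S+)$")


def split_with_entry(entry):
    """Return (namespace, local_id), or None for uncontrolled text."""
    match = WITH_ENTRY.match(entry.strip())
    if match is None:
        return None
    return match.group("namespace"), match.group("local_id")
```

Under this check, `Swiss-Prot:P51587` splits cleanly, while a bare method name such as `tRNAscan` or `Kyte-Doolittle algorithm` is rejected because it has no namespace.
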
>
> If we need to format these in a way that is compatible with the
> namespace:ID format, then GO could generate a 'database' of collected
> methods. An entry in the GO.xrf_abbs file like the one below could
> define a namespace for such a collection.
>
> abbreviation: GO_CM
> database: Gene Ontology Database collected methods
> object: Accession (for collected method)
> example_id: GO_CM:0000001
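
A small sketch of how accessions in such a namespace might be minted, assuming IDs are sequential and zero-padded to seven digits as in the example_id above. The function names are hypothetical.

```python
PREFIX = "GO_CM"
WIDTH = 7  # assumed from the example_id GO_CM:0000001


def format_accession(number):
    """Render a positive integer as a GO_CM accession."""
    if number < 1:
        raise ValueError("accession numbers start at 1")
    return f"{PREFIX}:{number:0{WIDTH}d}"


def next_accession(existing):
    """Return the accession following the highest one already in use."""
    highest = 0
    for accession in existing:
        prefix, _, digits = accession.partition(":")
        if prefix == PREFIX and digits.isdigit():
            highest = max(highest, int(digits))
    return format_accession(highest + 1)
```
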
>
> Then for the second part, we'd have to start a collection of these
> various methods, probably just a file somewhat like the GO.xrf_abbs
> file. For this, there are a couple issues to deal with:
>
> 1) The authors of methods don't always give them a clear name.
>
> 2) There isn't always a single source reference. For programmatic
> methods, there is often a single source reference. However, for the
> consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't
> be comfortable designating a single reference as the source. In these
> cases, I'd be happier if we could associate a number of relevant refs
> to the 'method'. In other cases, an algorithm is mentioned by name,
> but no reference is cited.
>
> However, with those issues in mind, perhaps collecting this
> information would work.
>
> - accession: accession ID given by GO
>
> - method name: the name given to a program by the authors, when
> available, or a descriptive name based on the paper
>
> - developed in reference: the ID, e.g. PMID:xxxxx, for the reference
> describing the development of a method, when applicable; it would not
> be required. Can be filled with 'Not Applicable' for cases like 'box
> C/D snoRNA consensus' where there isn't a specific program that was
> developed. I don't know how we want to deal with cases like
> 'TMpredict' where they cited a reference that appears irrelevant or
> 'Kyte-Doolittle algorithm' where I didn't see a citation for the
> algorithm.
>
> - other references: Useful for cases like 'box C/D snoRNA consensus'
> where there isn't a specific program that was developed, but where you
> can cite 1 or more references which describe what the consensus is.
>
> - method classification: maybe this tag isn't necessary, but I thought
> it might be useful, particularly if we ever get to a situation where
> we have this in a database where you can search on this field.
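
The fields listed above could be modelled as a simple record type. This is only a sketch: the class and attribute names are hypothetical, and writing 'Not Applicable' when no development reference exists follows the suggestion in the field description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectedMethod:
    accession: str                      # e.g. GO_CM:0000001
    method_name: str                    # author's name or a descriptive one
    developed_in: Optional[str] = None  # e.g. a PMID, when one exists
    other_references: List[str] = field(default_factory=list)
    classification: Optional[str] = None

    def to_stanza(self) -> str:
        """Render the record in the tag: value layout of the proposal."""
        lines = [
            f"accession: {self.accession}",
            f"method name: {self.method_name}",
            f"developed in reference: {self.developed_in or 'Not Applicable'}",
        ]
        if self.other_references:
            lines.append("other references: " + "; ".join(self.other_references))
        if self.classification:
            lines.append(f"method classification: {self.classification}")
        return "\n".join(lines)
```
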
>
> Below is what I would fill in for each field for the references listed
> at: http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html
>
> The comments in parentheses are just comments to correlate the info
> below with the Example papers, and would not be included in the
> proposed file.
>
> accession: GO_CM:0000001
> method name: box C/D snoRNA probabilistic model
> developed in reference: PMID:10024243
> method classification: box C/D snoRNA gene prediction
> (would be used for example #1)
>
> accession: GO_CM:0000002
> method name: box C/D snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:8674114; PMID:16484372
> method classification: box C/D snoRNA gene prediction
> (would be used for example #s 2 & 3)
>
> accession: GO_CM:0000003
> method name: snoGPS
> developed in reference: PMID:15306656
> method classification: box H/ACA snoRNA gene prediction
> (would be used for example #4)
>
> accession: GO_CM:0000004
> method name: box H/ACA snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:12007400
> method classification: box H/ACA snoRNA gene prediction
> (would be used for example #5)
>
> accession: GO_CM:0000005
> method name: TMpredict
> developed in reference: ?
> (paper #6 cites a reference, but it seems incorrect;
> did not find an appropriate citation via PubMed)
> method classification: protein hydrophobicity
> (would be used for example #6)
>
> accession: GO_CM:0000006
> method name: Kyte-Doolittle algorithm
> developed in reference: ? (paper #7 does not cite a reference)
> method classification: protein hydrophobicity
> (would be used for example #7)
>
> accession: GO_CM:0000007
> method name: tRNAscan
> developed in reference: PMID:1870126
> other references: PMID:
> method classification: tRNA gene prediction
> (The Lowe & Eddy tRNAscan-SE ref referred to this program as
> "tRNAscan 1.3 by Fichant and Burks (12)" and cited this
> paper. However, this paper doesn't appear to name the
> algorithm at all.)
>
> accession: GO_CM:0000008
> method name: Pavesi et al. tRNA prediction algorithm
> developed in reference: PMID:8165140
> method classification: tRNA gene prediction
> (they don't name their algorithm, so this name is
> derived from what they say, in conjunction with how
> it was referred to in the Lowe & Eddy paper on
> tRNAscan-SE.)
>
> accession: GO_CM:0000009
> method name: tRNAscan-SE
> developed in reference: PMID:9023104
> method classification: tRNA gene prediction
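
If the collection were kept as a flat file of blank-line-separated records like those above (without the parenthetical comments), a with-column accession could be resolved along these lines. This is a sketch under that assumption; the function names are hypothetical and no such file exists yet.

```python
import re

SAMPLE = """\
accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372
method classification: box C/D snoRNA gene prediction

accession: GO_CM:0000009
method name: tRNAscan-SE
developed in reference: PMID:9023104
method classification: tRNA gene prediction
"""


def parse_collection(text):
    """Map each accession to a dict of its tag: value pairs."""
    methods = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            # Split at the first colon only, so values may contain colons.
            tag, sep, value = line.partition(":")
            if sep:
                record[tag.strip()] = value.strip()
        if "accession" in record:
            methods[record["accession"]] = record
    return methods


def references_for(record):
    """Collect every PMID attached to a record, development ref first."""
    refs = []
    developed = record.get("developed in reference", "")
    if developed.startswith("PMID:"):
        refs.append(developed)
    for ref in record.get("other references", "").split(";"):
        ref = ref.strip()
        if ref:
            refs.append(ref)
    return refs
```
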
>
--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO
Consortium
Stanford University ** hitz at genome.stanford.edu