[go] Putting method/program names into the with field for ISS

Benjamin Hitz hitz at genome.Stanford.EDU
Mon Sep 10 16:34:28 PDT 2007


Maybe this could be exactly the distinction between ISS and RCA.

If you can specify a WITH value which corresponds to a single  
sequence, structure, or "family" (read as HMM or other statistical  
model) then it's ISS.  Otherwise it's RCA (if curated, obv.)
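
As a rough sketch of that rule in Python (the function name and its boolean
input are hypothetical, and it assumes the annotation is already curated):

```python
def iss_or_rca(with_is_single_target: bool) -> str:
    """Pick an evidence code for a curated annotation.

    with_is_single_target: True when the WITH value names one sequence,
    structure, or "family" (an HMM or other statistical model).
    """
    # A single identifiable target means ISS; anything else falls back to RCA.
    return "ISS" if with_is_single_target else "RCA"
```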

Ben


On Sep 10, 2007, at 4:20 PM, Karen Christie wrote:

> Putting method/program names into the with field for ISS
> --------------------------------------------------------
>
> I've reviewed several papers where ISS is the appropriate code, but
> for which only a method could be placed into the with field. Thus, I
> have some comments on how we might want to do this. I'll start with a
> little background.
>
> At the last GO meeting, we agreed to "Always use a WITH column for IEA
> and ISS, containing a program name if necessary. For example, make a
> ref to tRNAscan." However, we did not work out how to implement doing
> this.
>
> As phrased in the minutes, it sounds like the idea is just to put the
> name of the method in the with column. If that's all that is required
> then it's fairly simple to find an appropriate text string from a
> paper to put in the with column. However, I'm kind of assuming that we
> don't want to allow uncontrolled text strings in the with column mixed
> in with things of the format namespace:ID.
>
> Currently, to put something in the with column, it must have a
> namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names
> or methods, there are a couple problems with trying to put them into
> this type of format. One is that some of the methods to which researchers
> refer are not given an official name. The second, which applies to all
> the papers I've read so far, is that none of them have a namespace.
>
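
To make the namespace:ID constraint concrete, here is a minimal Python sketch
of a check that would reject free-text method names. The pattern is my
assumption about what a well-formed value looks like (a namespace, a colon, a
non-empty ID with no whitespace); the real rules for the with column may be
stricter.

```python
import re

# Assumed shape: namespace starts with a letter and may contain letters,
# digits, '_' or '-'; then a colon; then an ID with no whitespace.
NAMESPACE_ID = re.compile(r"[A-Za-z][A-Za-z0-9_-]*:\S+")


def is_namespace_id(value: str) -> bool:
    """Return True if the whole value looks like namespace:ID."""
    return NAMESPACE_ID.fullmatch(value) is not None
```

With this check, Swiss-Prot:P51587 passes, while a bare program name such as
tRNAscan does not.
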
> If we need to format these in a way that is compatible with the
> namespace:ID format, then GO could generate a 'database' of collected
> methods.  An entry in the GO.xrf_abbs file like the one below could
> define a namespace for such a collection.
>
>   abbreviation: GO_CM
>   database: Gene Ontology Database collected methods
>   object: Accession (for collected method)
>   example_id: GO_CM:0000001
>
> Then for the second part, we'd have to start a collection of these
> various methods, probably just a file somewhat like the GO.xrf_abbs
> file. For this, there are a couple issues to deal with:
>
> 1) The authors of methods don't always give them a clear name.
>
> 2) There isn't always a single source reference. For programmatic
> methods, there is often a single source reference. However, for the
> consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't
> be comfortable designating a single reference as the source. In these
> cases, I'd be happier if we could associate a number of relevant refs
> to the 'method'. In other cases, an algorithm is mentioned by name,
> but no reference is cited.
>
> However, with those issues in mind, perhaps collecting this
> information would work.
>
> - accession: accession ID given by GO
>
> - method name: the name given to a program by the authors, when
> 	available, or a descriptive name based on the paper
>
> - developed in reference: the ID, e.g. PMID:xxxxx, for the reference
> describing the development of a method, when applicable, but would not
> be required. Can be filled with 'Not Applicable' for cases like 'box
> C/D snoRNA consensus' where there isn't a specific program that was
> developed. I don't know how we want to deal with cases like
> 'TMpredict' where they cited a reference that appears irrelevant or
> 'Kyte-Doolittle algorithm' where I didn't see a citation for the
> algorithm.
>
> - other references: Useful for cases like 'box C/D snoRNA consensus'
> where there isn't a specific program that was developed, but where you
> can cite 1 or more references which describe what the consensus is.
>
> - method classification: maybe this tag isn't necessary, but I thought
> it might be useful, particularly if we ever get to a situation where
> we have this in a database where you can search on this field.
>
> Below is what I would fill in for each field for the references listed
> at: http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html
>
> The comments in parentheses are just comments to correlate the info  
> below with the Example papers, and would not be included in the  
> proposed file.
>
> accession: GO_CM:0000001
> method name: box C/D snoRNA probabilistic model
> developed in reference: PMID:10024243
> method classification: box C/D snoRNA gene prediction
> 	(would be used for example #1)
>
> accession: GO_CM:0000002
> method name: box C/D snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:8674114; PMID:16484372
> method classification: box C/D snoRNA gene prediction
> 	(would be used for example #s 2 & 3)
>
> accession: GO_CM:0000003
> method name: snoGPS
> developed in reference: PMID:15306656
> method classification: box H/ACA snoRNA gene prediction
> 	(would be used for example #4)
>
> accession: GO_CM:0000004
> method name: box H/ACA snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:12007400
> method classification: box H/ACA snoRNA gene prediction
> 	(would be used for example #5)
>
> accession: GO_CM:0000005
> method name: TMpredict
> developed in reference: ?
> 	(paper #6 cites a reference, but it seems incorrect;
> 	did not find an appropriate citation via PubMed)
> method classification: protein hydrophobicity
> 	(would be used for example #6)
>
> accession: GO_CM:0000006
> method name: Kyte-Doolittle algorithm
> developed in reference: ? (paper #7 does not cite a reference)
> method classification: protein hydrophobicity
> 	(would be used for example #7)
>
> accession: GO_CM:0000007
> method name: tRNAscan
> developed in reference: PMID:1870126
> other references: PMID:
> method classification: tRNA gene prediction
> 	(The Lowe & Eddy tRNAscan-SE ref referred to this program as
> 	"tRNAscan 1.3 by Fichant and Burks (12)" and cited this
> 	paper. However, this paper doesn't appear to name the
> 	algorithm at all.)
>
> accession: GO_CM:0000008
> method name: Pavesi et al. tRNA prediction algorithm
> developed in reference: PMID:8165140
> method classification: tRNA gene prediction
> 	(they don't name their algorithm, so this name is
> 	derived from what they say, in conjunction with how
> 	it was referred to in the Lowe & Eddy paper on
> 	tRNAscan-SE.)
>
> accession: GO_CM:0000009
> method name: tRNAscan-SE
> developed in reference: PMID:9023104
> method classification: tRNA gene prediction
>
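
If the collection were kept as a flat file of "tag: value" stanzas separated
by blank lines, as in the entries above, reading it would be simple. Below is
a minimal Python sketch under that assumption. The function names are
hypothetical, the parenthetical comments are assumed to be absent from the
real file, and multiple references are assumed to be separated by semicolons
as in GO_CM:0000002.

```python
from typing import Dict, List

SAMPLE = """\
accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction

accession: GO_CM:0000004
method name: box H/ACA snoRNA consensus
developed in reference: Not Applicable
other references: PMID:12007400
method classification: box H/ACA snoRNA gene prediction
"""


def parse_methods(text: str) -> List[Dict[str, str]]:
    """Split blank-line-separated stanzas into one dict per method."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current stanza, if any.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values like PMID:123 stay intact.
        tag, sep, value = line.partition(":")
        if sep:
            current[tag.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def split_refs(value: str) -> List[str]:
    """Split a semicolon-separated reference list into individual IDs."""
    return [ref.strip() for ref in value.split(";") if ref.strip()]
```

Calling parse_methods(SAMPLE) gives two records keyed by the tag names, and
optional tags such as "other references" are simply missing from records that
don't use them.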

--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO  
Consortium
Stanford University ** hitz at genome.stanford.edu





