[go] Putting method/program names into the with field for ISS
Valerie Wood
val at sanger.ac.uk
Tue Sep 11 01:37:26 PDT 2007
I responded to the RCA issue before I saw this. I still think the
previous distinction makes sense.
I also didn't realise "with" was mandatory for IEA, and propose that it
would be simpler if it were only mandatory *if* there were no supporting
PubMed ID.
We don't want to make life any more complicated than it already is.
Moving some of these 'combination algorithms' to the scope of RCA if
curator approved (which they fit perfectly) would simplify things,
without any loss of information.
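
To make the proposed IEA rule concrete, here is a minimal sketch in Python. The function name and the idea of passing the references as a list of strings are assumptions made for illustration; nothing like this exists in the current annotation checks as far as this thread shows.

```python
def iea_with_required(references):
    """Return True if 'with' would be mandatory for an IEA annotation.

    Sketch of the proposed rule only: 'with' is required when none of
    the supporting references is a PubMed ID (prefix 'PMID:').
    """
    return not any(
        ref.strip().upper().startswith("PMID:") for ref in references
    )


# 'DB_REF:0001' is a made-up placeholder for a non-PubMed reference.
print(iea_with_required(["DB_REF:0001"]))
print(iea_with_required(["PMID:9023104"]))
```

Under this sketch, an IEA annotation backed by a PubMed ID could leave "with" empty, and anything else would still need it.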
Val
Karen Christie wrote:
> Putting method/program names into the with field for ISS
> --------------------------------------------------------
>
> I've reviewed several papers where ISS is the appropriate code, but
> for which only a method could be placed into the with field. Thus, I
> have some comments on how we might want to do this. I'll start with a
> little background.
>
> At the last GO meeting, we agreed to "Always use a WITH column for IEA
> and ISS, containing a program name if necessary. For example, make a
> ref to tRNAscan." However, we did not work out how to implement doing
> this.
>
> As phrased in the minutes, it sounds like the idea is just to put the
> name of the method in the with column. If that's all that is required
> then it's fairly simple to find an appropriate text string from a
> paper to put in the with column. However, I'm kind of assuming that we
> don't want to allow uncontrolled text strings in the with column mixed
> in with things of the format namespace:ID.
>
> Currently, to put something in the with column, it must have a
> namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names
> or methods, there are a couple of problems with trying to put them into
> this type of format. One is that some of the methods to which
> researchers refer are not given an official name. The second, which
> applies to all the papers I've read so far, is that none of them have a
> namespace.
>
> If we need to format these in a way that is compatible with the
> namespace:ID format, then GO could generate a 'database' of collected
> methods. An entry in the GO.xrf_abbs file like the one below could
> define a namespace for such a collection.
>
> abbreviation: GO_CM
> database: Gene Ontology Database collected methods
> object: Accession (for collected method)
> example_id: GO_CM:0000001
>
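
As a rough illustration of what "no uncontrolled text strings" could mean in practice, the sketch below checks that a with-column entry has the namespace:ID form and that its namespace is a known one. The set of namespaces is an assumption for the example (in reality it would be read from GO.xrf_abbs), and GO_CM is only the proposed abbreviation, not an existing one.

```python
import re

# Assumed for illustration; the real list would come from GO.xrf_abbs.
# GO_CM is the abbreviation proposed in this thread, not an existing one.
KNOWN_NAMESPACES = {"Swiss-Prot", "GO_CM"}

# namespace, a colon, then a non-empty ID with no whitespace
ENTRY_RE = re.compile(r"^([^:\s]+):(\S+)$")


def parse_with_entry(entry):
    """Split a with-column entry into (namespace, id) or raise ValueError."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"not in namespace:ID format: {entry!r}")
    namespace, identifier = match.groups()
    if namespace not in KNOWN_NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace!r}")
    return namespace, identifier


print(parse_with_entry("Swiss-Prot:P51587"))
print(parse_with_entry("GO_CM:0000001"))
```

A bare program name such as "tRNAscan-SE" would be rejected by this check, which is the situation the GO_CM namespace is meant to avoid.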
> Then for the second part, we'd have to start a collection of these
> various methods, probably just a file somewhat like the GO.xrf_abbs
> file. For this, there are a couple issues to deal with:
>
> 1) The authors of methods don't always give them a clear name.
>
> 2) There isn't always a single source reference. For programmatic
> methods, there is often a single source reference. However, for the
> consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't
> be comfortable designating a single reference as the source. In these
> cases, I'd be happier if we could associate a number of relevant refs
> to the 'method'. In other cases, an algorithm is mentioned by name,
> but no reference is cited.
>
> However, with those issues in mind, perhaps collecting this
> information would work.
>
> - accession: accession ID given by GO
>
> - method name: the name given to a program by the authors, when
> available, or a descriptive name based on the paper
>
> - developed in reference: the ID, e.g. PMID:xxxxx, for the reference
> describing the development of a method. This would be filled in when
> applicable, but would not be required. It can be filled with 'Not
> Applicable' for cases like 'box C/D snoRNA consensus' where there isn't
> a specific program that was developed. I don't know how we want to deal
> with cases like
> 'TMpredict' where they cited a reference that appears irrelevant or
> 'Kyte-Doolittle algorithm' where I didn't see a citation for the
> algorithm.
>
> - other references: Useful for cases like 'box C/D snoRNA consensus'
> where there isn't a specific program that was developed, but where you
> can cite 1 or more references which describe what the consensus is.
>
> - method classification: maybe this tag isn't necessary, but I thought
> it might be useful, particularly if we ever get to a situation where
> we have this in a database where you can search on this field.
>
> Below is what I would fill in for each field for the references listed
> at:
> http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html
>
>
> The comments in parentheses are just comments to correlate the info
> below with the Example papers, and would not be included in the
> proposed file.
>
> accession: GO_CM:0000001
> method name: box C/D snoRNA probabilistic model
> developed in reference: PMID:10024243
> method classification: box C/D snoRNA gene prediction
> (would be used for example #1)
>
> accession: GO_CM:0000002
> method name: box C/D snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:8674114; PMID:16484372
> method classification: box C/D snoRNA gene prediction
> (would be used for example #s 2 & 3)
>
> accession: GO_CM:0000003
> method name: snoGPS
> developed in reference: PMID:15306656
> method classification: box H/ACA snoRNA gene prediction
> (would be used for example #4)
>
> accession: GO_CM:0000004
> method name: box H/ACA snoRNA consensus
> developed in reference: Not Applicable
> other references: PMID:12007400
> method classification: box H/ACA snoRNA gene prediction
> (would be used for example #5)
>
> accession: GO_CM:0000005
> method name: TMpredict
> developed in reference: ?
>                         (paper #6 cites a reference, but it seems incorrect;
>                         did not find an appropriate citation via PubMed)
> method classification: protein hydrophobicity
> (would be used for example #6)
>
> accession: GO_CM:0000006
> method name: Kyte-Doolittle algorithm
> developed in reference: ? (paper #7 does not cite a reference)
> method classification: protein hydrophobicity
> (would be used for example #7)
>
> accession: GO_CM:0000007
> method name: tRNAscan
> developed in reference: PMID:1870126
> other references: PMID:
> method classification: tRNA gene prediction
> (The Lowe & Eddy tRNAscan-SE ref referred to this program as
> "tRNAscan 1.3 by Fichant and Burks (12)" and cited this
>         paper. However, this paper doesn't appear to name the
>         algorithm at all.)
>
> accession: GO_CM:0000008
> method name: Pavesi et al. tRNA prediction algorithm
> developed in reference: PMID:8165140
> method classification: tRNA gene prediction
> (they don't name their algorithm, so this name is
>         derived from what they say, in conjunction with how
> it was referred to in the Lowe & Eddy paper on
> tRNAscan-SE.)
>
> accession: GO_CM:0000009
> method name: tRNAscan-SE
> developed in reference: PMID:9023104
> method classification: tRNA gene prediction
>
>
>
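
For anyone wanting to experiment with the proposed file, here is a minimal sketch of reading such records in Python. It assumes records are separated by blank lines, each line is "tag: value", and "other references" is a semicolon-separated list; none of that is decided, it just follows the examples quoted above. The function name is made up for the example.

```python
SAMPLE = """\
accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372
method classification: box C/D snoRNA gene prediction

accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction
"""


def parse_collected_methods(text):
    """Parse blank-line-separated 'tag: value' records into dicts."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record, if any.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so IDs like PMID:123 stay whole.
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that are not 'tag: value'
        tag, value = tag.strip(), value.strip()
        if tag == "other references":
            value = [ref.strip() for ref in value.split(";") if ref.strip()]
        current[tag] = value
    if current:
        records.append(current)
    return records


for record in parse_collected_methods(SAMPLE):
    print(record["accession"], "-", record["method name"])
```

Keeping "other references" as a list makes it easy to attach several citations to a consensus-type method where no single source paper exists.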