[go] Putting method/program names into the with field for ISS

Valerie Wood val at sanger.ac.uk
Tue Sep 11 01:24:25 PDT 2007


I agree with Ben that this would be a useful general distinction. It 
would prevent people from regarding RCA as similar to IEA and 
reinforce the fact that curator approval is a requirement.

It would also resolve the recurring question of what to put in the with 
column for annotations based on TMM/GPI predictions, signal peptides, 
tRNAscan and other predictors, where the algorithm models additional 
constraints on the feature (e.g. TMM/GPI prediction includes 
hydrophobicity and, I think, spatial information; tRNAscan uses 
complementary base-pairing information, etc.). These could be RCA if 
approved by a curator; otherwise they would be IEA.

ISS would then be 'curator approved' and based on alignment only (whether 
pairwise RBH, multiple alignment, HMM or threading); RCA would be 
everything else (i.e. any functional prediction that is not purely 
*alignment* based).

I don't think this conflicts with the new proposal, but it would make 
the distinction between RCA, ISS and IEA clearer.
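
As a rough illustration of the split suggested in this message (a sketch
only, not agreed GO policy), the three-way rule could be written as below.
The method category names and the `evidence_code` function are hypothetical,
introduced just for this example.

```python
from enum import Enum


class Evidence(Enum):
    ISS = "ISS"
    RCA = "RCA"
    IEA = "IEA"


# Hypothetical labels for the purely alignment-based methods listed in the
# message (pairwise RBH, multiple alignment, HMM, threading).
ALIGNMENT_METHODS = {"pairwise_rbh", "multiple_alignment", "hmm", "threading"}


def evidence_code(method_kind: str, curator_approved: bool) -> Evidence:
    """Pick an evidence code under the proposed distinction.

    Assumption: anything not approved by a curator is IEA; curator-approved
    predictions are ISS when purely alignment based, otherwise RCA.
    """
    if not curator_approved:
        return Evidence.IEA
    if method_kind in ALIGNMENT_METHODS:
        return Evidence.ISS
    return Evidence.RCA


if __name__ == "__main__":
    # "trnascan" and "signal_peptide" are made-up labels for non-alignment
    # predictors of the kind discussed in this thread.
    for kind, approved in [("hmm", True), ("trnascan", True), ("signal_peptide", False)]:
        print(kind, approved, evidence_code(kind, approved).value)
```

The point of the sketch is only that curator approval is checked first, so
an unapproved prediction can never end up as ISS or RCA.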

Val




Karen Christie wrote:

> Read the proposed scope of RCA. This code was requested to cover an 
> entirely different type of analysis than sequence similarity comparisons.
>
> In addition, if you read the last example for ISS, provided by 
> Michelle Gwinn on the basis of what TIGR does in their sequence 
> analysis methods, in the proposed new documentation (url below), it 
> states that ISS analyses may include more than one type of evidence.
>
> http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#iss
>
> -Karen
>
> On Mon, 10 Sep 2007, Benjamin Hitz wrote:
>
>>
>> Maybe this could exactly be the distinction between ISS and RCA.
>>
>> If you can specify a WITH value which corresponds to a single 
>> sequence, structure, or "family" (read as HMM or other statistical 
>> model) then it's ISS.  Otherwise it's RCA (if curated, obv.)
>>
>> Ben
>>
>>
>> On Sep 10, 2007, at 4:20 PM, Karen Christie wrote:
>>
>>> Putting method/program names into the with field for ISS
>>> --------------------------------------------------------
>>>
>>> I've reviewed several papers where ISS is the appropriate code, but
>>> for which only a method could be placed into the with field. Thus, I
>>> have some comments on how we might want to do this. I'll start with a
>>> little background.
>>>
>>> At the last GO meeting, we agreed to "Always use a WITH column for IEA
>>> and ISS, containing a program name if necessary. For example, make a
>>> ref to tRNAscan." However, we did not work out how to implement doing
>>> this.
>>>
>>> As phrased in the minutes, it sounds like the idea is just to put the
>>> name of the method in the with column. If that's all that is required
>>> then it's fairly simple to find an appropriate text string from a
>>> paper to put in the with column. However, I'm kind of assuming that we
>>> don't want to allow uncontrolled text strings in the with column mixed
>>> in with things of the format namespace:ID.
>>>
>>> Currently, to put something in the with column, it must have a
>>> namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names
>>> or methods, there are a couple problems with trying to put them into
>>> this type of format. One is that some of the methods to which researchers
>>> refer are not given an official name. The second, which applies to all
>>> the papers I've read so far, is that none of them have a namespace.
>>>
>>> If we need to format these in a way that is compatible with the
>>> namespace:ID format, then GO could generate a 'database' of collected
>>> methods.  An entry in the GO.xrf_abbs file like the one below could
>>> define a namespace for such a collection.
>>>
>>>  abbreviation: GO_CM
>>>  database: Gene Ontology Database collected methods
>>>  object: Accession (for collected method)
>>>  example_id: GO_CM:0000001
>>>
>>> Then for the second part, we'd have to start a collection of these
>>> various methods, probably just a file somewhat like the GO.xrf_abbs
>>> file. For this, there are a couple issues to deal with:
>>>
>>> 1) The authors of methods don't always give them a clear name.
>>>
>>> 2) There isn't always a single source reference. For programmatic
>>> methods, there is often a single source reference. However, for the
>>> consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't
>>> be comfortable designating a single reference as the source. In these
>>> cases, I'd be happier if we could associate a number of relevant refs
>>> to the 'method'. In other cases, an algorithm is mentioned by name,
>>> but no reference is cited.
>>>
>>> However, with those issues in mind, perhaps collecting this
>>> information would work.
>>>
>>> - accession: accession ID given by GO
>>>
>>> - method name: the name given to a program by the authors, when
>>>     available, or a descriptive name based on the paper
>>>
>>> - developed in reference: the ID, e.g. PMID:xxxxx, for the reference
>>> describing the development of a method, when applicable; it would not
>>> be required. Can be filled with 'Not Applicable' for cases like 'box
>>> C/D snoRNA consensus' where there isn't a specific program that was
>>> developed. I don't know how we want to deal with cases like
>>> 'TMpredict' where they cited a reference that appears irrelevant or
>>> 'Kyte-Doolittle algorithm' where I didn't see a citation for the
>>> algorithm.
>>>
>>> - other references: Useful for cases like 'box C/D snoRNA consensus'
>>> where there isn't a specific program that was developed, but where you
>>> can cite 1 or more references which describe what the consensus is.
>>>
>>> - method classification: maybe this tag isn't necessary, but I thought
>>> it might be useful, particularly if we ever get to a situation where
>>> we have this in a database where you can search on this field.
>>>
>>> Below is what I would fill in for each field for the references listed
>>> at: 
>>> http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html 
>>>
>>>
>>> The comments in parentheses are just comments to correlate the info 
>>> below with the Example papers, and would not be included in the 
>>> proposed file.
>>>
>>> accession: GO_CM:0000001
>>> method name: box C/D snoRNA probabilistic model
>>> developed in reference: PMID:10024243
>>> method classification: box C/D snoRNA gene prediction
>>>     (would be used for example #1)
>>>
>>> accession: GO_CM:0000002
>>> method name: box C/D snoRNA consensus
>>> developed in reference: Not Applicable
>>> other references: PMID:8674114; PMID:16484372
>>> method classification: box C/D snoRNA gene prediction
>>>     (would be used for example #s 2 & 3)
>>>
>>> accession: GO_CM:0000003
>>> method name: snoGPS
>>> developed in reference: PMID:15306656
>>> method classification: box H/ACA snoRNA gene prediction
>>>     (would be used for example #4)
>>>
>>> accession: GO_CM:0000004
>>> method name: box H/ACA snoRNA consensus
>>> developed in reference: Not Applicable
>>> other references: PMID:12007400
>>> method classification: box H/ACA snoRNA gene prediction
>>>     (would be used for example #5)
>>>
>>> accession: GO_CM:0000005
>>> method name: TMpredict
>>> developed in reference: ?
>>>     (paper #6 cites a reference, but it seems incorrect;
>>>     did not find an appropriate citation via PubMed)
>>> method classification: protein hydrophobicity
>>>     (would be used for example #6)
>>>
>>> accession: GO_CM:0000006
>>> method name: Kyte-Doolittle algorithm
>>> developed in reference: ? (paper #7 does not cite a reference)
>>> method classification: protein hydrophobicity
>>>     (would be used for example #7)
>>>
>>> accession: GO_CM:0000007
>>> method name: tRNAscan
>>> developed in reference: PMID:1870126
>>> other references: PMID:
>>> method classification: tRNA gene prediction
>>>     (The Lowe & Eddy tRNAscan-SE ref referred to this program as
>>>     "tRNAscan 1.3 by Fichant and Burks (12)" and cited this
>>>     paper. However, this paper doesn't appear to name the
>>>     algorithm at all.)
>>>
>>> accession: GO_CM:0000008
>>> method name: Pavesi et al. tRNA prediction algorithm
>>> developed in reference: PMID:8165140
>>> method classification: tRNA gene prediction
>>>     (they don't name their algorithm, so this name is
>>>     derived from what they say, in conjunction with how
>>>     it was referred to in the Lowe & Eddy paper on
>>>     tRNAscan-SE.)
>>>
>>> accession: GO_CM:0000009
>>> method name: tRNAscan-SE
>>> developed in reference: PMID:9023104
>>> method classification: tRNA gene prediction
>>>
>>
>> -- 
>> Ben Hitz
>> Senior Scientific Programmer ** Saccharomyces Genome Database ** GO 
>> Consortium
>> Stanford University ** hitz at genome.stanford.edu
>>
>>
>
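
For anyone who wants to experiment with the collected-methods file proposed
in the quoted message, here is a minimal sketch of how such "tag: value"
records might be read and how a with-column value might be checked against
the namespace:ID format. The function names and the regular expression are
assumptions made for illustration; no such file or parser exists yet, and
the real format would be whatever GO agrees on. The two sample records are
copied from the examples quoted above.

```python
import re
from typing import Dict, List

# Assumed shape of a with-column value: a namespace, a colon, then an ID,
# e.g. Swiss-Prot:P51587 or GO_CM:0000001.
NAMESPACED_ID = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*:\S+$")


def is_namespaced_id(value: str) -> bool:
    """Return True if value looks like namespace:ID (hypothetical check)."""
    return bool(NAMESPACED_ID.match(value))


def parse_methods(text: str) -> List[Dict[str, str]]:
    """Parse blank-line-separated blocks of 'tag: value' lines."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current record, if one is open.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so IDs such as PMID:9023104 in the
        # value are left intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'tag: value' line
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


SAMPLE = """\
accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372
method classification: box C/D snoRNA gene prediction

accession: GO_CM:0000009
method name: tRNAscan-SE
developed in reference: PMID:9023104
method classification: tRNA gene prediction
"""

records = parse_methods(SAMPLE)

if __name__ == "__main__":
    for rec in records:
        other = rec.get("other references", "")
        refs = [r.strip() for r in other.split(";") if r.strip()]
        print(rec["accession"], is_namespaced_id(rec["accession"]),
              rec["method name"], refs)
```

A bare program name such as tRNAscan fails the `is_namespaced_id` check,
while its proposed accession passes, which is the reason for giving each
method an accession in the first place.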





