[go] Putting method/program names into the with field for ISS
Gwinn-Giglio, Michelle
MLGwinn at jcvi.org
Tue Sep 11 07:55:12 PDT 2007
Hi,
I disagree. I think taking this approach would significantly muddy the waters in terms of distinguishing between ISS and RCA.
Anything that is based only on sequence analysis, be it simple BLAST or vastly more complicated modeling methods, should be ISS, because at heart they all compare sequences of known function to ones of unknown function. Whether they use simple alignments or more complicated models to make that comparison, it is still a sequence-based analysis.
Karen has proposed, and I agree, that RCA (or ICA, as we would like to rename it) should be reserved for cases where multiple types of evidence are combined to reach a conclusion. Sequence-based analysis is one type, two-hybrid screens are another type, mass spec is another type, etc. When these different types of evidence are integrated and a conclusion is drawn from that integration, this should be RCA (or better yet ICA).
I think Karen's ideas of how to store the method in the "with" field are good. It could be simplified a bit if necessary, or if people think it will be confusing, but what she has proposed will thoroughly store the information.
Michelle
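For illustration only, the ISS/RCA distinction argued for in the message above could be sketched in Python as follows. The evidence-type labels and the function name `code_by_evidence_types` are hypothetical, invented for this sketch; they are not part of any GO specification or tool.

```python
# Hypothetical label for sequence-based evidence; not an official GO term.
SEQUENCE_BASED = "sequence"


def code_by_evidence_types(evidence_types):
    """Suggest an evidence code for a curator-reviewed annotation.

    evidence_types: iterable of illustrative labels such as
    "sequence", "two-hybrid" or "mass-spec".
    """
    kinds = set(evidence_types)
    if not kinds:
        raise ValueError("at least one type of evidence is required")
    if kinds == {SEQUENCE_BASED}:
        # Simple BLAST or a complex model: still sequence comparison.
        return "ISS"
    if len(kinds) > 1:
        # Several types of evidence integrated into one conclusion.
        return "RCA"
    # A single non-sequence type is not covered by this rule.
    return None
```

Under these assumptions, `code_by_evidence_types(["sequence"])` gives "ISS", while `code_by_evidence_types(["sequence", "two-hybrid"])` gives "RCA".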
-----Original Message-----
From: owner-go at genome.stanford.edu on behalf of Valerie Wood
Sent: Tue 9/11/2007 4:24 AM
To: Karen Christie
Cc: Benjamin Hitz; GO mailing list
Subject: Re: [go] Putting method/program names into the with field for ISS
I agree with Ben that this would be a useful general distinction. It
would prevent people from considering RCA as being similar to 'IEA' and
reinforce the fact that curator approval is a requirement.
It would also resolve the recurring issue of 'what to put in the with
column' for the TMM/GPI annotations and annotations based on signal
peptides, tRNAscan and other predictors (where the algorithms model
additional constraints on the feature, i.e. TMMs/GPI include
hydrophobicity and (I think) spatial information, tRNAscan uses
complementary bp info, etc.).
These could be RCA if approved by a curator; otherwise they would be IEA.
ISS would then be 'curator approved' based on alignment only (whether
pairwise RBH, multiple alignment, HMM or threading); RCA would be
everything else (i.e. any functional prediction which was not purely
*alignment* based).
I don't think this conflicts with the new proposal, but it would make
the distinction between RCA, ISS and IEA clearer.
Val
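As a rough sketch of the rule suggested in the message above (not an agreed GO policy), the three-way split between IEA, ISS and RCA could be written in Python like this. The function name `code_by_method` and the set `ALIGNMENT_METHODS` are hypothetical names introduced only for this example; the method labels are taken from the wording of the message.

```python
# Alignment-only methods named in the message; the labels are illustrative.
ALIGNMENT_METHODS = {"pairwise RBH", "multiple alignment", "HMM", "threading"}


def code_by_method(method, curator_approved):
    """Suggest an evidence code from the method and curator review status."""
    if not curator_approved:
        # No curator approval: the prediction stays electronic.
        return "IEA"
    if method in ALIGNMENT_METHODS:
        # Curator approved and purely alignment based.
        return "ISS"
    # Curator approved, but the method models more than alignment
    # (e.g. hydrophobicity or base-pairing constraints).
    return "RCA"
```

For example, `code_by_method("tRNAscan", True)` yields "RCA" under this rule, and the same method without curator approval yields "IEA".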
Karen Christie wrote:
> Read the proposed scope of RCA. This code was requested to cover an
> entirely different type of analysis than sequence similarity comparisons.
>
> In addition, if you read the last example for ISS, provided by
> Michelle Gwinn on the basis of what TIGR does in their sequence
> analysis methods, in the proposed new documentation (url below), it
> states that ISS analyses may include more than one type of evidence.
>
> http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#iss
>
> -Karen
>
> On Mon, 10 Sep 2007, Benjamin Hitz wrote:
>
>>
>> Maybe this could exactly be the distinction between ISS and RCA.
>>
>> If you can specify a WITH value which corresponds to a single
>> sequence, structure, or "family" (read as HMM or other statistical
>> model) then it's ISS. Otherwise it's RCA (if curated, obv.)
>>
>> Ben
>>
>>
>> On Sep 10, 2007, at 4:20 PM, Karen Christie wrote:
>>
>>> Putting method/program names into the with field for ISS
>>> --------------------------------------------------------
>>>
>>> I've reviewed several papers where ISS is the appropriate code, but
>>> for which only a method could be placed into the with field. Thus, I
>>> have some comments on how we might want to do this. I'll start with a
>>> little background.
>>>
>>> At the last GO meeting, we agreed to "Always use a WITH column for IEA
>>> and ISS, containing a program name if necessary. For example, make a
>>> ref to tRNAscan." However, we did not work out how to implement doing
>>> this.
>>>
>>> As phrased in the minutes, it sounds like the idea is just to put the
>>> name of the method in the with column. If that's all that is required
>>> then it's fairly simple to find an appropriate text string from a
>>> paper to put in the with column. However, I'm kind of assuming that we
>>> don't want to allow uncontrolled text strings in the with column mixed
>>> in with things of the format namespace:ID.
>>>
>>> Currently, to put something in the with column, it must have a
>>> namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names
>>> or methods, there are a couple problems with trying to put them into
>>> this type of format. One is that some of the methods to which researchers
>>> refer are not given an official name. The second, which applies to all
>>> the papers I've read so far, is that none of them have a namespace.
>>>
>>> If we need to format these in a way that is compatible with the
>>> namespace:ID format, then GO could generate a 'database' of collected
>>> methods. An entry in the GO.xrf_abbs file like the one below could
>>> define a namespace for such a collection.
>>>
>>> abbreviation: GO_CM
>>> database: Gene Ontology Database collected methods
>>> object: Accession (for collected method)
>>> example_id: GO_CM:0000001
>>>
>>> Then for the second part, we'd have to start a collection of these
>>> various methods, probably just a file somewhat like the GO.xrf_abbs
>>> file. For this, there are a couple issues to deal with:
>>>
>>> 1) The authors of methods don't always give them a clear name.
>>>
>>> 2) There isn't always a single source reference. For programmatic
>>> methods, there is often a single source reference. However, for the
>>> consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't
>>> be comfortable designating a single reference as the source. In these
>>> cases, I'd be happier if we could associate a number of relevant refs
>>> to the 'method'. In other cases, an algorithm is mentioned by name,
>>> but no reference is cited.
>>>
>>> However, with those issues in mind, perhaps collecting this
>>> information would work.
>>>
>>> - accession: accession ID given by GO
>>>
>>> - method name: the name given to a program by the authors, when
>>> available, or a descriptive name based on the paper
>>>
>>> - developed in reference: the ID, e.g. PMID:xxxxx, for the reference
>>> describing the development of a method, when applicable, but would not
>>> be required. Can be filled with 'Not Applicable' for cases like 'box
>>> C/D snoRNA consensus' where there isn't a specific program that was
>>> developed. I don't know how we want to deal with cases like
>>> 'TMpredict' where they cited a reference that appears irrelevant or
>>> 'Kyte-Doolittle algorithm' where I didn't see a citation for the
>>> algorithm.
>>>
>>> - other references: Useful for cases like 'box C/D snoRNA consensus'
>>> where there isn't a specific program that was developed, but where you
>>> can cite 1 or more references which describe what the consensus is.
>>>
>>> - method classification: maybe this tag isn't necessary, but I thought
>>> it might be useful, particularly if we ever get to a situation where
>>> we have this in a database where you can search on this field.
>>>
>>> Below is what I would fill in for each field for the references listed
>>> at:
>>> http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html
>>>
>>>
>>> The comments in parentheses are just comments to correlate the info
>>> below with the Example papers, and would not be included in the
>>> proposed file.
>>>
>>> accession: GO_CM:0000001
>>> method name: box C/D snoRNA probabilistic model
>>> developed in reference: PMID:10024243
>>> method classification: box C/D snoRNA gene prediction
>>> (would be used for example #1)
>>>
>>> accession: GO_CM:0000002
>>> method name: box C/D snoRNA consensus
>>> developed in reference: Not Applicable
>>> other references: PMID:8674114; PMID:16484372
>>> method classification: box C/D snoRNA gene prediction
>>> (would be used for example #s 2 & 3)
>>>
>>> accession: GO_CM:0000003
>>> method name: snoGPS
>>> developed in reference: PMID:15306656
>>> method classification: box H/ACA snoRNA gene prediction
>>> (would be used for example #4)
>>>
>>> accession: GO_CM:0000004
>>> method name: box H/ACA snoRNA consensus
>>> developed in reference: Not Applicable
>>> other references: PMID:12007400
>>> method classification: box H/ACA snoRNA gene prediction
>>> (would be used for example #5)
>>>
>>> accession: GO_CM:0000005
>>> method name: TMpredict
>>> developed in reference: ?
>>> (paper #6 cites a reference, but seems incorrect
>>> did not find an appropriate citation via PubMed)
>>> method classification: protein hydrophobicity
>>> (would be used for example #6)
>>>
>>> accession: GO_CM:0000006
>>> method name: Kyte-Doolittle algorithm
>>> developed in reference: ? (paper #7 does not cite a reference)
>>> method classification: protein hydrophobicity
>>> (would be used for example #7)
>>>
>>> accession: GO_CM:0000007
>>> method name: tRNAscan
>>> developed in reference: PMID:1870126
>>> other references: PMID:
>>> method classification: tRNA gene prediction
>>> (The Lowe & Eddy tRNAscan-SE ref referred to this program as
>>> "tRNAscan 1.3 by Fichant and Burks (12)" and cited this
>>> paper. However, this paper doesn't appear to name the
>>> algorithm at all.)
>>>
>>> accession: GO_CM:0000008
>>> method name: Pavesi et al. tRNA prediction algorithm
>>> developed in reference: PMID:8165140
>>> method classification: tRNA gene prediction
>>> (they don't name their algorithm, so this name is
>>> derived from what they say, in conjunction with how
>>> it was referred to in the Lowe & Eddy paper on
>>> tRNAscan-SE.)
>>>
>>> accession: GO_CM:0000009
>>> method name: tRNAscan-SE
>>> developed in reference: PMID:9023104
>>> method classification: tRNA gene prediction
>>>
>>
>> --
>> Ben Hitz
>> Senior Scientific Programmer ** Saccharomyces Genome Database ** GO
>> Consortium
>> Stanford University ** hitz at genome.stanford.edu
>>
>>
>
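The collected-methods file proposed in the quoted message could be read with a few lines of Python. This is only a sketch under stated assumptions: the file layout (one `field: value` pair per line, records separated by blank lines, parenthetical comments left out) follows the examples in the message, while the function names `parse_methods` and `is_xref` and the regular expression for the namespace:ID format are hypothetical and not part of any GO tool.

```python
import re

# Assumed shape of a namespace:ID value, e.g. "GO_CM:0000001" or
# "Swiss-Prot:P51587". This pattern is a guess, not an official GO rule.
XREF = re.compile(r"[A-Za-z][\w-]*:\S+")


def is_xref(value):
    """Return True if value looks like namespace:ID."""
    return XREF.fullmatch(value) is not None


def parse_methods(text):
    """Parse blank-line-separated 'field: value' records into dicts."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current record, if any.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so IDs keep their own colon.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'field: value', got {line!r}")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


# Two of the records proposed in the message, without the comments.
sample = """\
accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372

accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction
"""

methods = parse_methods(sample)
other_refs = [r.strip() for r in methods[0]["other references"].split(";")]
```

Keeping the accession in namespace:ID form means a value such as GO_CM:0000003 could be checked with the same test as Swiss-Prot:P51587, whereas a free-text entry such as "Not Applicable" would be rejected by `is_xref`.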