[annotation] [Fwd:What evidence code to use?]
Harold Drabkin
hjd at informatics.jax.org
Wed Nov 21 11:08:33 PST 2007
We are in the same boat with our RCA set (we just have one or two, from
Riken).
Basically they used a bunch of things which were then examined by
"experts".
So our set has things in the WITH field that were used to make an
assignment; sometimes domain, sometimes sequence. The matching and
alignments were done and then bucketed by various means (the reference
for the paper has some details). There was no attempt to determine if,
when a sequence was used, the organism that owned it had an experiment
done with it to support the GO term. Definitely not an ISS as we use it
(backed by experiment in the comparison organism). It is based on a
computational method (motifs, domains, alignments) that points to something
from which one of the translation tables spewed out a GO term. Then a
curator looked at it to see if it was reasonable. This is unlike our current
IEAs, where everything is done without any monitoring (which is why some
of our IEAs come up with such informative terms as "catalytic activity").
We USED to call the Rikens ISS long ago; then changed because there was
no insistence on a link to anything experimental.
We are uncomfortable changing them to TAS. Presently, since they are
static (we kill several a month because what's in the WITH is no longer
valid, e.g. the domain is no longer in a translation table), we are even
tempted to "archive" them.
hjd
> Forwarding this from the evidence code discussion group. Apologies to
> those who are on both lists. I've sorted the emails from top to
> bottom in chronological order for easier reading:
>
> ----------
> My original email:
>
> > Ah, the eternal question: Is it ISS, is it RCA?
> >
> > I've got a paper that describes the identification of a nice big set
> > of transcription factors in Arabidopsis.
> >
> > http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=11118137&dopt=AbstractPlus
> >
> > The authors use a combination of motif searches + BLAST + sequence
> > alignment and review those by eye and came up with 1500 or so genes
> > that they call 'transcription factors.'
> >
> > Right now, we've got these annotated to 'transcription factor
> > activity' with the evidence code ISS but nothing in the evidence_with
> > column. If I leave these as ISS, I'd like to put something in the
> > with column, but what? Does this type of a combination of sequence
> > analysis methods that's reviewed manually make it RCA? Not according
> > to the current RCA documentation:
> >
> > "Examples where the RCA evidence code should not be used:
> >
> > * Annotations based on more than one type of gene product sequence
> > based evidence, including such things as BLAST, profile HMMs, TMHMM,
> > SignalP, PROSITE, InterPro, mapping files such as interpro2go etc.
> > should use the ISS code. "
> >
> > Should I wait till ISS comes to a resolution?
> >
> > Help!
>
> ---------
> Ben's reply:
>
> If you can't put something USEFUL in the WITH column, I think this has
> to be RCA.
> I guess under the new, non-documented system, this would be ISS with no
> "With"; ISA/ISO/ISM would require withs (either seq ids or model,
> aka interpro, ids).
>
>
> Ben
>
> ----------
>
> Val's reply:
>
> This is *exactly* the type of data why I was originally suggesting that
> RCA should not be restricted to analyses which include some
> experimental component. Unfortunately I couldn't come up with any
> good examples at the time.
>
> These would surely be better as RCA, even though they are sequence based.
>
> Val
>
> ----------
>
> Susan's reply:
>
> I've just hit another example...
>
> Enhanced function annotations for Drosophila serine proteases: A case
> study for systematic annotation of multi-member gene families.
>
> Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE,
> Rodrigues V, White KP, Bork P, Sowdhamini R.
>
> PMID: 17996400
>
> This is a functional classification of serine proteases based on a
> 'function residue clustering' algorithm. The algorithm incorporates info
> from sequence alignments, hydrophobicity plots and info about key
> residues from 3D structures - all sequence based but no one thing to put
> in the 'with'.
>
> Susan
>
> -----------
>
> Pascale's reply:
>
> Tanya,
>
> I thought we agreed that BLAST and InterPro were ISS, as you point
> out. I don't think ISS + ISS = RCA?? That is, I would say using
> InterPro or the BLAST result should be enough to make the annotation;
> we don't need to capture both? In this case, the easiest might be using
> ISS with an InterPro domain ID in the 'with'.
>
> Similarly in the paper Susan cites, they mention several domains and
> also they have compared to several proteins whose 3D structure has
> been determined hence can be used in the 'with' - I would pick one of
> those example proteins and ISS to that.
>
> Pascale
>
> ---------
>
> Any other thoughts?
>
>
> Thanks,
>
> Tanya
>
> -------- Original Message --------
> Subject: Re: [evidence] What evidence code to use?
> Date: Wed, 21 Nov 2007 08:43:16 -0500
> From: Pascale Gaudet <pgaudet at northwestern.edu>
> Reply-To: pgaudet at northwestern.edu
> Organization: Northwestern University
> To: tberardi at acoma.stanford.edu
> CC: evidence at genome.stanford.edu
> References: <47437C88.5070204 at acoma.stanford.edu>
>
> Tanya,
>
> I thought we agreed that BLAST and InterPro were ISS, as you point out.
> I don't think ISS + ISS = RCA?? That is, I would say using InterPro or
> the BLAST result should be enough to make the annotation; we don't need
> to capture both? In this case, the easiest might be using ISS with an
> InterPro domain ID in the 'with'.
>
> Similarly in the paper Susan cites, they mention several domains and
> also they have compared to several proteins whose 3D structure has been
> determined hence can be used in the 'with' - I would pick one of those
> example proteins and ISS to that.
>
> Pascale
>
>
>> ------------------------------------------------------------------------------------------
>>
>> Tanya Berardini, Ph.D. tberardi at acoma.stanford.edu
>> The Arabidopsis Information Resource FAX: (650) 325-6857
>> Carnegie Institution of Washington Tel: (650) 325-1521 ext. 325
>> Department of Plant Biology URL: http://arabidopsis.org/
>> 260 Panama St.
>> Stanford, CA 94305
>> ------------------------------------------------------------------------------------------
>>
>>
>>
>
>