[go] Putting method/program names into the with field for ISS

Suzanna Lewis suzi at berkeleybop.org
Thu Sep 20 04:42:53 PDT 2007


On Sep 19, 2007, at 10:08 PM, Karen Christie wrote:

> I don't think anyone is suggesting that such identifiers, including  
> domain and HMM identifiers as well as individual sequence  
> identifiers, shouldn't be put into the 'with/from' column when  
> available.
>
> However, there are cases when there just isn't anything of that  
> sort to put in this column. Both snoRNAs and tRNAs are a good  
> example. Both of these types of RNAs are generally predicted by  
> methods that analyze both the primary sequence and the predicted  
> nucleic acid secondary structures of the gene product, not by  
> orthology methods. The two protein examples were both based on  
> algorithms that analyze sequence to determine hydrophobicity and  
> predict transmembrane domains. In all of these examples, the method  
> is clearly based purely upon the sequence of the gene product. Thus  
> these all fit into ISS, but there is no identifier for a sequence,  
> domain, or HMM that can be put into the with column.
>
> I really think that the evidence code should be based on the method  
> used, not on how the 'with/from' column can be filled; this is  
> supporting evidence after all. In the interest of having a logical  
> system that makes sense, especially when teaching it to new people,  
> I think it is important that we don't implement arcane rules where  
> the type of supporting evidence takes precedence over the method used.
>
> So, regardless of what we decide about filling the with column for  
> these types of situations, I think that these situations should  
> stay in ISS because they are clearly all methods based purely on  
> the sequence of the gene product. Personally, I can live with any  
> of three options that have come up in this thread:
>
> 1. the system I proposed, where we start maintaining a new file to
> track methods. It is not necessarily elegant, and even the 10 or so
> examples I used highlight the difficulties in tracking down
> references for some methods, but it meets our other requirement that
> things have both a namespace and an ID.

This is the way to go IMO

> 2. Allow the with column to be filled with 'not applicable', or  
> some other descriptive phrase, for cases when there is no ID for a  
> sequence, domain, HMM, etc, but just a method or sequence consensus  
> without an ID

Nope

>
> 3. Relax the rule that the with column is mandatory for ISS

Nope
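
A minimal sketch of how option 1 might be checked in practice. Everything specific here is an assumption made for illustration: the `GO_METHOD` namespace, the method IDs, the use of `|` to separate multiple entries, and the function name `check_with_field`. Only the requirement that each entry has both a namespace and an ID comes from the thread.

```python
# Hypothetical registry of methods. The "GO_METHOD" namespace and the IDs
# are invented for this sketch; the program names come from the thread.
METHOD_REGISTRY = {
    "GO_METHOD:0000001": "tRNAscan",
    "GO_METHOD:0000002": "TMHMM",
}


def check_with_field(evidence_code, with_field):
    """Return a list of problems with an annotation's with/from value."""
    if evidence_code != "ISS":
        # This sketch only covers the ISS rule discussed in the thread.
        return []
    if not with_field.strip():
        return ["ISS requires a with/from value"]
    problems = []
    # Assumption: multiple entries are separated by "|".
    for entry in with_field.split("|"):
        namespace, sep, local_id = entry.partition(":")
        if not sep or not namespace or not local_id:
            problems.append(f"{entry!r} is not of the form namespace:ID")
        elif namespace == "GO_METHOD" and entry not in METHOD_REGISTRY:
            problems.append(f"{entry!r} is not a registered method")
    return problems
```

Under this check a free-text value such as 'not applicable' (option 2) and an empty column (option 3) are both rejected, while a registered method ID passes in the same way a sequence accession would.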

>
> -Karen
>
> P.S. Could we start calling this column the 'supporting evidence'
> column or something else descriptive? Right now, its full name is
> 'with/from', but we've also allowed the column to be filled for IMP
> where neither of those prepositions is really appropriate.
>
>
>
>
> On Tue, 18 Sep 2007, Suzanna Lewis wrote:
>
>> Actually there are (hoped for) operational reasons for requiring a  
>> sequence accession in the 'with' column (and if there is >1 then a  
>> representative one is just fine, because from there we could get  
>> to the other orthologs).
>>
>> The hope is that doing this should, in theory, make it possible to  
>> build in triggers such that if the annotation of the sequence in  
>> the 'with' column changes, then this could ripple back to all the  
>> annotations that were dependent/derived from this original.
>>
>> I would very much hate to see us give up on this. The GO is one of
>> the few groups that are even trying to indicate provenance and
>> traceability. It is difficult, but very important.
>>
>> -S
>>
>> On Sep 12, 2007, at 8:44 AM, Susan Tweedie wrote:
>>
>>> At the risk of returning us to square one on this... I'd like to take
>>> a step back and revisit why we decided it was vital to have something
>>> in the with column for ISS. I thought this stemmed from an attempt at
>>> enforcing quality annotations - we wanted to identify the similar
>>> 'thing' for which there is experimental evidence and to use ISS only
>>> where this was available. We then shifted ground a bit to acknowledge
>>> that there are cases where there is a strong case for ISS annotation
>>> but no single sequence can be identified for this column. So what do
>>> we actually achieve by filling in the slot for these cases? It seems
>>> to me this is more to do with us saying 'yup I'm being stringent about
>>> my use of ISS so I've stuck something in this column to prove it' than
>>> actually helping users. The 'how they did it' is in the paper, just
>>> like it is for other evidence codes. I'm not sure we 'gain' enough
>>> here to justify mixing methods and objects in the 'with' column and I
>>> am struggling to see the justification for making ISS a special case
>>> in this respect. If we show a method for ISS, do we set a precedent
>>> and run the risk of users wanting to know whether it was RNAi or
>>> knock-out for IMP etc?
>>> I guess I'd just like to know we haven't just made this column
>>> mandatory as a means of policing curators. I strongly agree that we
>>> should fill in a sequence where possible and do our best (within
>>> reason) to be sure there is an experiment there somewhere but, if we
>>> are going to accept that there are cases where we can't identify a
>>> suitable sequence, can't we just trust curator judgement i.e. leave
>>> the column blank and let people read the paper to see details of how
>>> it was done?
>>> If we stick with the plan to keep 'with' mandatory for ISS then
>>> Karen's system is very nice. But what do we do for cases like
>>> Michelle's example where a whole variety of similarity based methods
>>> are used? I find this crops up time after time and I wouldn't want to
>>> have to list all methods in this column and it doesn't seem very
>>> satisfactory to pick representative examples.
>>> Susan
>>> On Tue, 2007-09-11 at 19:03 +0000, Valerie Wood wrote:
>>>> That's OK,
>>>> I just think it's rather a trawl to have to create something to
>>>> go in the 'with' field when the PMID of the published algorithm
>>>> is sufficient.
>>>> My other reasoning was that these aren't purely based on
>>>> 'sequence similarity', they always include some 'other
>>>> additional step' (although I agree they are 'sequence based')
>>>> and thirdly, this could become hazy if we got functional
>>>> prediction methods which combined sequence data with some
>>>> experimental data (like cellular localization), which, for
>>>> example, would be RCA (I presume). It therefore seemed that if the
>>>> distinction was that ISS needed to have some 'object' which
>>>> represented a sequence in the 'with' column (rather than
>>>> allowing the with column to contain other types of things,
>>>> referring to algorithms), it would be quite a nice distinction.
>>>> If you can't locate this object then the method probably
>>>> includes something else in addition to 'sequence similarity'.
>>>> However, these were just for consideration, I really have no
>>>> strong preference either way..... although I prefer easy :)
>>>> Val
>>>> "Gwinn-Giglio, Michelle" <MLGwinn at jcvi.org> wrote:
>>>>> Ben,
>>>>> Yes, sorry to not be clear - I was disagreeing with Val's
>>>>> suggestion to use RCA for things like TMHMM and tRNAscan.  At
>>>>> least I think that was Val's suggestion and that is what I
>>>>> disagree with.
>>>>> Sorry to disagree with you Val.  :)
>>>>> Michelle
>>>>> -----Original Message-----
>>>>> From: Benjamin Hitz [mailto:hitz at genome.stanford.edu]
>>>>> Sent: Tue 9/11/2007 1:05 PM
>>>>> To: Gwinn-Giglio, Michelle
>>>>> Cc: GO mailing list
>>>>> Subject: Re: [go] Putting method/program names into the with  
>>>>> field for ISS
>>>>> On Sep 11, 2007, at 7:55 AM, Gwinn-Giglio, Michelle wrote:
>>>>>> Hi,
>>>>>> I disagree.  I think taking this approach would significantly
>>>>>> muddy the waters in terms of distinguishing between ISS and RCA.
>>>>>> Anything that is based only on sequence analysis, be it simple
>>>>>> Blast or vastly more complicated modeling methods, should be ISS
>>>>>> because at their heart they are all comparing sequences of known
>>>>>> function to ones with unknown function.  Whether they do simple
>>>>>> alignments to make that comparison or more complicated models, it
>>>>>> is still a sequence based analysis.
>>>>> I did not suggest otherwise.
>>>>> Ben
>>>>> --
>>>>> Ben Hitz
>>>>> Senior Scientific Programmer ** Saccharomyces Genome Database
>>>>> ** GO Consortium
>>>>> Stanford University ** hitz at genome.stanford.edu
>>
>
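
The trigger idea in the quoted 18 Sep message (a change to the sequence in the 'with' column rippling back to the annotations derived from it) could be sketched as a reverse index. This is only an illustration under stated assumptions: the record layout, the `|` separator, the function names, and the `DB:` identifiers are all hypothetical.

```python
from collections import defaultdict


def build_dependents(annotations):
    """Map each 'with' accession to the ISS annotations derived from it."""
    dependents = defaultdict(list)
    for ann in annotations:
        if ann["evidence"] != "ISS":
            continue
        # Assumption: multiple 'with' entries are separated by "|".
        for accession in ann["with"].split("|"):
            if accession:
                dependents[accession].append(ann["object_id"])
    return dependents


def to_review(dependents, changed_accession):
    """List annotated objects to re-check when a source annotation changes."""
    return sorted(set(dependents.get(changed_accession, [])))


# Hypothetical records; none of these identifiers are real.
annotations = [
    {"object_id": "DB:geneA", "evidence": "ISS", "with": "DB:src1"},
    {"object_id": "DB:geneB", "evidence": "ISS", "with": "DB:src1|DB:src2"},
    {"object_id": "DB:geneC", "evidence": "IMP", "with": ""},
]
dependents = build_dependents(annotations)
flagged = to_review(dependents, "DB:src1")
```

Here `flagged` holds the two ISS-annotated objects that depend on `DB:src1`. An ISS annotation whose 'with' value is free text or blank never enters the index, so it could not be flagged this way.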



