[go] Putting method/program names into the with field for ISS

Harold Drabkin hjd at informatics.jax.org
Mon Oct 8 10:18:17 PDT 2007


Actually, this "sometimes with, sometimes without" situation has a precedent: 
the use of IMP.
When IMP is used for an RNAi experiment, there is usually no "id" that can be 
given.  Additionally, in some cases even at MGI, an allele has not been 
assigned an id because it was never used to generate a "whole" mouse, 
only ES cells; we would still use IMP, but we can't put an MGI allele 
id in the with field.

hjd

Michelle Gwinn Giglio wrote:
>
>
> Hi Ben,
>
> The email you are replying to was written before the GOC meeting and I 
> think that most of these issues were discussed at the GOC meeting and 
> (I think) consensus was reached.
>
> As I understand the outcome of the discussion:
> ISS stays as a parent to the new sequence-based codes which are:
> ISA - inferred from sequence alignment
> ISO - inferred from sequence orthology
> ISM - inferred from sequence model (to include all things like HMMs, 
> TMHMM, tRNASCAN, etc.)
>
> I think the agreement was that ISM should have values in the "with" 
> field for things like HMMs, but would not require "with" for things like 
> tRNASCAN.  I agree that it is problematic to have a code that sometimes 
> has "with" but does not always require it - it makes it 
> nearly impossible to do proper quality checks.  However, there did not 
> seem to be an uproar over this idea and it seemed to fly ok.
>
> Please don't interpret the fact that I did not talk about "with" in my 
> earlier email before the meeting as an indication that I don't value 
> it - in fact I think it is very important to store "with" 
> information.  And indeed I would like to include in the "with" field 
> the method (like tRNASCAN) when a particular accession is not 
> available.   I would be happy for the evidence code committee to 
> discuss whether we should do this and ways of doing so (like Karen's 
> suggestion).
>
> You raised the point of what to do with computational analysis that is 
> neither sequence based nor combinatorial.  This is not my area.  
> All the arguments I have made regarding RCA and its scope are on 
> behalf of Karen, who very much wants a code that captures the concept 
> of integrating various forms of data into one result.  It was left 
> open at the GOC meeting to create more granular instances of RCA; 
> perhaps this would be one and combinatorial another, but I leave that 
> debate to those who deal with these kinds of annotations on a regular 
> basis.
>
> Finally, regarding your point that sequence and structure are the same: I 
> think it is possible for two proteins to form similar structures but 
> have quite different primary sequences.  Therefore, it would be 
> possible to have the crystal structures of two proteins, observe that 
> they are similar, and then conclude that they might share a similar 
> function.  In fact, there are 4 NIH-funded centers in the Protein 
> Structure Initiative that are devoted to producing large-scale data 
> sets of structures, many from proteins of unknown function.  So, 
> although I have not personally done any annotation of this sort, I 
> expect that comparisons of this kind will become more frequent as 
> more structures are produced.
>
> Michelle
>
> Ben Hitz wrote:
>>>
>>
>> First point: I don't think there is any need to separate 
>> "sequence" from "structure", since sequence is merely a proxy for the 
>> chemical structure, so I will use "sequence" to mean both.  If anyone 
>> has a contrary opinion, I would like to hear it.
>>
>>> I think it is important to be able to distinguish methods that are  
>>> based on just sequence analysis from everything else and that ISS  
>>> should be the code to describe this.
>>
>>
>> I don't necessarily disagree - but I don't take this as a given 
>> either.  WHY is this important?  And why SOLELY sequence analysis 
>> and not partially sequence analysis?
>>
>> Here are some things I think might be important:
>> o That the association is based on some computational theory, not an 
>> experiment (so it would not fall under the proposed EXP hierarchy).
>> o That in cases where an association is transferred from a specific 
>> gene product or family of gene products, the source of the transfer 
>> is mentioned.
>> o Whether or not the association has been reviewed by a curator
>> o Whether or not the method has been reviewed by a curator (a sub-case, 
>> if the above is not true)
>> o Whether or not this is a (computational) prediction based on 
>> combining several sources of data (aka "Bayesian Blah Blah Blah")
>>
>>> I think we need at least 2 categories:
>>> -one for all things sequence based (ISS or whatever new name might  
>>> be created)
>>> -one for combinatorial analyses that bring together different types  
>>> of information to reach a conclusion (ICA/RCA)
>>
>>
>> You are not accounting for other kinds of non-sequence, 
>> non-combinatorial analysis.  For example, there are many algorithms 
>> that infer biological process from patterns of physical interactions. 
>> While this seems to me to be your 2nd class (non-sequence), it is 
>> based on only 1 source of data.
>>
>>>
>>> If people feel there should be a code for alignments and only  
>>> alignments then we will need to split the sequence-based category  
>>> into 2 which would then give us 3 total:
>>> -orthology based evidence
>>> -all other sequence based evidence
>>> -combinatorial analyses that bring together different types of  
>>> information to reach a conclusion (ICA/RCA)
>>>
>>> I favor the first option (2 categories, not 3) as I think it is  
>>> cleaner and easier for people to understand.  If we feel the need  
>>> to change the name of ISS to reflect this more encompassing  
>>> definition, then OK, but that brings another whole can of worms  
>>> with it (what about legacy data, will the community have a cow, etc.)
>>
>>
>> There is a practical issue you are overlooking.  It is very 
>> important that we capture WITH information for certain types of 
>> homology- or similarity-based methods of inference; so important 
>> that your association will be tossed back by Mike if you don't 
>> provide this information.
>> I would say this is necessary for:
>> 1) all pairwise sequence alignment methods
>> 2) all "curated ortholog" methods (a subset of 1, above)
>> 3) all protein-family assignment based methods (Pfam, SMART, ProDom)
>>
>> So, for the above, WITH information is mandatory.  For other methods 
>> it isn't.  It is much, much easier from a practical standpoint to 
>> mandate WITH for evidence code X than to mandate WITH for some 
>> complicated subset of evidence code X.
>>
>> Should we take this to the evidence-code mailing list?
>> Ben
>> -- 
>> Ben Hitz
>> Senior Scientific Programmer ** Saccharomyces Genome Database ** GO Consortium
>> Stanford University ** hitz at genome.stanford.edu