[go] Putting method/program names into the with field for ISS

Benjamin Hitz hitz at genome.Stanford.EDU
Mon Oct 8 11:26:51 PDT 2007


It cannot be enforced by the filtering script unless all situations  
are accounted for.
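
A minimal sketch of the problem, assuming a hypothetical per-code rule table (the names `WithRule`, `WITH_RULES` and `check_with` are illustrative only, and the rules are loosely taken from this thread, not from any official GO policy or from the actual filtering script):

```python
from enum import Enum


class WithRule(Enum):
    REQUIRED = "required"   # "with" must always be filled in
    OPTIONAL = "optional"   # "with" sometimes present, sometimes not


# Assumed rule table for illustration; not the real filtering-script config.
WITH_RULES = {
    "ISA": WithRule.REQUIRED,   # sequence alignment: a target sequence exists
    "ISO": WithRule.REQUIRED,   # orthology: an ortholog exists
    "ISM": WithRule.OPTIONAL,   # HMM hit has an accession, tRNASCAN does not
    "IMP": WithRule.OPTIONAL,   # e.g. RNAi, where no id can be given
}


def check_with(evidence_code, with_value):
    """Return an error message, or None if the annotation passes."""
    rule = WITH_RULES.get(evidence_code)
    if rule is None:
        return None  # evidence code not covered by this sketch
    if rule is WithRule.REQUIRED and not with_value.strip():
        return f"{evidence_code} requires a value in the with field"
    # For OPTIONAL codes the check cannot tell a legitimately empty
    # "with" from a missing one, so nothing is enforced here.
    return None
```

Under these assumed rules, an ISM annotation with an empty with field passes whether it came from tRNASCAN (acceptable) or from an HMM hit (missing data), so the check only helps for codes where the requirement is all-or-nothing.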

Ben

On Oct 8, 2007, at 10:18 AM, Harold Drabkin wrote:

> Actually, the sometimes with-yes, sometimes with-no, has a  
> precedent: the use of IMP.
> When IMP is used for an RNAi, there is usually no "id" that can be  
> given.  Additionally, in some cases even at MGI, if an allele has  
> not been assigned, because it was never taken to generate a "whole"  
> mouse, but only ES cells, we would still give it an IMP but we  
> can't put an MGI id for an allele in the with field.
>
> hjd
>
> Michelle Gwinn Giglio wrote:
>>
>>
>> Hi Ben,
>>
>> The email you are replying to was written before the GOC meeting  
>> and I think that most of these issues were discussed at the GOC  
>> meeting and (I think) consensus was reached.
>>
>> As I understand the outcome of the discussion:
>> ISS stays as a parent to the new sequence-based codes which are:
>> ISA - inferred from sequence alignment
>> ISO - inferred from sequence orthology
>> ISM - inferred from sequence model (to include all things like  
>> HMMs, TMHMM, tRNASCAN, etc.)
>>
>> I think the agreement was that ISM should have values in the  
>> "with" field for things like HMMs, but did not require "with" for  
>> things like tRNASCAN.  I agree that it is problematic to have a  
>> code that can have "with" sometimes but does not always require  
>> "with" - it makes it nearly impossible to do proper quality  
>> checks.  However, there did not seem to be an uproar at this idea  
>> and it seemed to fly ok.
>>
>> Please don't interpret the fact that I did not talk about "with"  
>> in my earlier email before the meeting as an indication that I  
>> don't value it - in fact I think it is very important to store  
>> "with" information.  And indeed I would like to include in the  
>> "with" field the method (like tRNASCAN) when a particular  
>> accession is not available.   I would be happy for the evidence  
>> code committee to discuss whether we should do this and ways of  
>> doing so (like Karen's suggestion).
>>
>> You raised the point of what to do with computational analysis that  
>> is not sequence based and is not combinatorial.  This is not my  
>> area.  All the arguments I have made regarding RCA and its scope  
>> are on behalf of Karen who very much wants a code that captures  
>> the concept of integration of various forms of data into one  
>> result.  It was left open at the GOC meeting to create more  
>> granular instances of RCA and perhaps this would be one and  
>> combinatorial another - but I leave that debate to those who deal  
>> with these kinds of annotations on a regular basis.
>>
>> Finally, your point about sequence and structure being the same.   
>> I think it is possible for two proteins to form similar structures  
>> but to have quite different primary sequences.  Therefore, it  
>> would be possible to have the crystal structure of two proteins,  
>> observe that they are similar, and then conclude that they might  
>> share a similar function.  In fact there are 4 NIH-funded centers  
>> which are part of the Protein Structure Initiative which are  
>> devoted to producing large scale data sets of structures, many  
>> from proteins of unknown function.  So, although I have not  
>> personally done any annotation of this sort, I expect that  
>> comparisons of this kind will become more frequent as more  
>> structures are produced.
>>
>> Michelle
>>
>>
>>
>>
>> Ben Hitz wrote:
>>>>
>>>
>>> I guess, first point. - I don't think there is any need to  
>>> separate  "sequence" from "structure" since sequence is merely a  
>>> proxy for the  chemical structure, so I will use sequence to mean  
>>> both.  If anyone  has a contrary opinion, I would like to hear  
>>> it.    I just want to  make this disclaimer so we can use  
>>> "sequence" to mean both.
>>>
>>>> I think it is important to be able to distinguish methods that  
>>>> are  based on just sequence analysis from everything else and  
>>>> that ISS  should be the code to describe this.
>>>
>>>
>>> I don't necessarily disagree - but I don't take this as a given   
>>> either.  WHY is this important?   And why SOLELY sequence analysis  
>>> and  not partially sequence analysis?
>>>
>>> Here are some things I think might be important:
>>> o That the association is based on some computational theory, not  
>>> an  experiment (so it would not fall under the proposed EXP  
>>> hierarchy).
>>> o That in cases where an association is transferred from a  
>>> specific  gene product or family of gene products, that the  
>>> "transferree"  is mentioned.
>>> o whether or not the association has been reviewed by a curator
>>> o whether or not the method has been reviewed by a curator (sub  
>>> case  if the above is not true)
>>> o whether or not this is a (computational) prediction based on   
>>> combining several sources of data (aka "Bayesian Blah Blah Blah")
>>>
>>>> I think we need at least 2 categories:
>>>> -one for all things sequence based (ISS or whatever new name  
>>>> might  be created)
>>>> -one for combinatorial analyses that bring together different  
>>>> types  of information to reach a conclusion (ICA/RCA)
>>>
>>>
>>> You are not accounting for some other non-sequence,  
>>> non-combinatorial  analysis.  For example - there are many algorithms  
>>> that infer  biological process from patterns of physical  
>>> interactions - while this  seems to me to be your 2nd class  
>>> (non-sequence), it's only based on 1  source of data.
>>>
>>>>
>>>> If people feel there should be a code for alignments and only   
>>>> alignments then we will need to split the sequence-based  
>>>> category  into 2 which would then give us 3 total:
>>>> -orthology based evidence
>>>> -all other sequence based evidence
>>>> -combinatorial analyses that bring together different types of   
>>>> information to reach a conclusion (ICA/RCA)
>>>>
>>>> I favor the first option (2 categories, not 3) as I think it is   
>>>> cleaner and easier for people to understand.  If we feel the  
>>>> need  to change the name of ISS to reflect this more  
>>>> encompassing  definition, then OK, but that brings another whole  
>>>> can of worms  with it (what about legacy data, will the  
>>>> community have a cow, etc.)
>>>
>>>
>>> There is a practical issue you are overlooking.  It is very  
>>> important  that we capture WITH Information for certain types of  
>>> homology- or  similarity-based methods of inference.   So  
>>> important that your  association will be tossed back by Mike if  
>>> you don't provide this  information.
>>> I would say this is necessary for:
>>> 1) all pairwise sequence alignment methods
>>> 2) all "curated ortholog" methods (subset of 1, above)
>>> 3) all protein-family assignment based methods (Pfam, SMART, ProDom)
>>>
>>> So, for the above, WITH information is mandatory.  For other  
>>> methods  it isn't.  It is much, much easier from a practical  
>>> standpoint to  mandate WITH evidence code X, rather than mandate  
>>> WITH for some  complicated subset of evidence code X.
>>>
>>> Should we take this to the evidence-code mailing list?
>>> Ben
>>> -- 
>>> Ben Hitz
>>> Senior Scientific Programmer ** Saccharomyces Genome Database **  
>>> GO  Consortium
>>> Stanford University ** hitz at genome.stanford.edu
>>>
>>>
>>>

--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO  
Consortium
Stanford University ** hitz at genome.stanford.edu





