[go] Putting method/program names into the with field for ISS

Michelle Gwinn Giglio mlgwinn at tigr.ORG
Mon Oct 1 10:22:56 PDT 2007



Hi Ben,

The email you are replying to was written before the GOC meeting and I think that most of these 
issues were discussed at the GOC meeting and (I think) consensus was reached.

As I understand the outcome of the discussion:
ISS stays as a parent to the new sequence-based codes which are:
ISA - inferred from sequence alignment
ISO - inferred from sequence orthology
ISM - inferred from sequence model (to include all things like HMMs, TMHMM, tRNASCAN, etc.)

I think the agreement was that ISM should have values in the "with" field for things like HMMs, but 
would not require "with" for things like tRNASCAN.  I agree that it is problematic to have a code that 
can have "with" sometimes but does not always require "with" - it makes it nearly impossible to do 
proper quality checks.  However, there did not seem to be an uproar at this idea and it seemed to 
fly ok.
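
To illustrate why a sometimes-required "with" is hard to check, here is a minimal sketch of such 
a check.  Everything in it is hypothetical: the function name, the rule sets, and the idea that 
the method is available to the checker are assumptions for illustration, not an existing GO tool 
or an agreed policy.

```python
# Hypothetical sketch only: rule sets and names are assumptions, not GO policy.

# Assumed: codes where "with" is always mandatory (per Ben's points 1 and 2).
ALWAYS_REQUIRES_WITH = {"ISA", "ISO"}

# For ISM the rule depends on the method, which the checker may not know.
ISM_METHODS_REQUIRING_WITH = {"HMM"}           # assumed from "things like HMMs"
ISM_METHODS_NOT_REQUIRING_WITH = {"tRNASCAN"}  # assumed from "things like tRNASCAN"


def check_with(evidence_code, with_value, method=None):
    """Return 'ok', 'error', or 'cannot check' for a single annotation."""
    has_with = bool(with_value and with_value.strip())

    # Simple case: the code alone decides whether "with" is required.
    if evidence_code in ALWAYS_REQUIRES_WITH:
        return "ok" if has_with else "error"

    # Hard case: the requirement depends on a subset of the code.
    if evidence_code == "ISM":
        if method in ISM_METHODS_REQUIRING_WITH:
            return "ok" if has_with else "error"
        if method in ISM_METHODS_NOT_REQUIRING_WITH:
            return "ok"
        # Unknown method: an empty "with" may or may not be a mistake.
        return "ok" if has_with else "cannot check"

    # Other evidence codes are outside the scope of this sketch.
    return "ok"
```

The "cannot check" branch is the problem case: without knowing the method, an ISM annotation with 
an empty "with" cannot be told apart from one where "with" was simply left out.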

Please don't interpret the fact that I did not talk about "with" in my earlier email before the 
meeting as an indication that I don't value it - in fact I think it is very important to store 
"with" information.  And indeed I would like to include in the "with" field the method (like 
tRNASCAN) when a particular accession is not available.   I would be happy for the evidence code 
committee to discuss whether we should do this and ways of doing so (like Karen's suggestion).

You raised the point of what to do with computational analysis that is not sequence based and is not 
combinatorial.  This is not my area.  All the arguments I have made regarding RCA and its scope are 
on behalf of Karen who very much wants a code that captures the concept of integration of various 
forms of data into one result.  It was left open at the GOC meeting to create more granular 
instances of RCA and perhaps this would be one and combinatorial another - but I leave that debate 
to those who deal with these kinds of annotations on a regular basis.

Finally, your point about sequence and structure being the same.  I think it is possible for two 
proteins to form similar structures but to have quite different primary sequences.  Therefore, it 
would be possible to have the crystal structures of two proteins, observe that they are similar, and 
then conclude that they might share a similar function.  In fact there are 4 NIH-funded centers 
which are part of the Protein Structure Initiative which are devoted to producing large scale data 
sets of structures, many from proteins of unknown function.  So, although I have not personally done 
any annotation of this sort, I expect that comparisons of this kind will become more frequent as 
more structures are produced.

Michelle




Ben Hitz wrote:
>>
> 
> I guess, first point. - I don't think there is any need to separate  
> "sequence" from "structure" since sequence is merely a proxy for the  
> chemical structure, so I will use sequence to mean both.  If anyone  has 
> a contrary opinion, I would like to hear it.    I just want to  make 
> this disclaimer so we can use "sequence" to mean both.
> 
>> I think it is important to be able to distinguish methods that are  
>> based on just sequence analysis from everything else and that ISS  
>> should be the code to describe this.
> 
> 
> I don't necessarily disagree - but I don't take this as a given  
> either.  WHY is this important?   And why SOLELY sequence analysis and  
> not partially sequence analysis?
> 
> Here are some things I think might be important:
> o That the association is based on some computational theory, not an  
> experiment (so it would not fall under the proposed EXP hierarchy).
> o That in cases where an association is transferred from a specific  
> gene product or family of gene products, that the "transferree"  is 
> mentioned.
> o whether or not the association has been reviewed by a curator
> o whether or not the method has been reviewed by a curator (sub case  if 
> the above is not true)
> o whether or not this is a (computational) prediction based on  
> combining several sources of data (aka "Bayesian Blah Blah Blah")
> 
>> I think we need at least 2 categories:
>> -one for all things sequence based (ISS or whatever new name might  be 
>> created)
>> -one for combinatorial analyses that bring together different types  
>> of information to reach a conclusion (ICA/RCA)
> 
> 
> You are not accounting for some other non-sequence, non-combinatorial  
> analysis.  For example - there are many algorithms that infer  
> biological process from pattern of physical interactions - while this  
> seems to me to be your 2nd class (Non-sequence), it's only based on 1  
> source of data.
> 
>>
>> If people feel there should be a code for alignments and only  
>> alignments then we will need to split the sequence-based category  
>> into 2 which would then give us 3 total:
>> -orthology based evidence
>> -all other sequence based evidence
>> -combinatorial analyses that bring together different types of  
>> information to reach a conclusion (ICA/RCA)
>>
>> I favor the first option (2 categories, not 3) as I think it is  
>> cleaner and easier for people to understand.  If we feel the need  to 
>> change the name of ISS to reflect this more encompassing  definition, 
>> then OK, but that brings another whole can of worms  with it (what 
>> about legacy data, will the community have a cow, etc.)
> 
> 
> There is a practical issue you are overlooking.  It is very important  
> that we capture WITH Information for certain types of homology- or  
> similarity- based methods of inference.   So important that your  
> association will be tossed back by Mike if you don't provide this  
> information.
> I would say this is necessary for:
> 1) all pairwise sequence alignment methods
> 2) all "curated ortholog" methods (sub set of 1, above)
> 3) all protein-family assignment based methods (Pfam, SMART, ProDom)
> 
> So, for the above, WITH information is mandatory.  For other methods  it 
> isn't.  It is much, much easier from a practical standpoint to  mandate 
> WITH evidence code X, rather than mandate WITH for some  complicated 
> subset of evidence code X.
> 
> Should we take this to the evidence-code mailing list?
> Ben
> -- 
> Ben Hitz
> Senior Scientific Programmer ** Saccharomyces Genome Database ** GO  
> Consortium
> Stanford University ** hitz at genome.stanford.edu
> 
> 
> 


