[go] Putting method/program names into the with field for ISS

Ben Hitz hitz at genome.Stanford.EDU
Fri Sep 28 14:32:19 PDT 2007



First point: I don't think there is any need to separate
"sequence" from "structure", since sequence is merely a proxy for the
chemical structure, so I will use "sequence" to mean both.  If anyone
has a contrary opinion, I would like to hear it.

> I think it is important to be able to distinguish methods that are  
> based on just sequence analysis from everything else and that ISS  
> should be the code to describe this.

I don't necessarily disagree - but I don't take this as a given
either.  WHY is this important?   And why SOLELY sequence analysis and
not partially sequence analysis?

Here are some things I think might be important:
o That the association is based on some computational theory, not an
experiment (so it would not fall under the proposed EXP hierarchy).
o That in cases where an association is transferred from a specific
gene product or family of gene products, the "transferree"
is mentioned.
o Whether or not the association has been reviewed by a curator
o Whether or not the method has been reviewed by a curator (a sub-case,
if the above is not true)
o Whether or not this is a (computational) prediction based on
combining several sources of data (aka "Bayesian Blah Blah Blah")

> I think we need at least 2 categories:
> -one for all things sequence based (ISS or whatever new name might  
> be created)
> -one for combinatorial analyses that bring together different types  
> of information to reach a conclusion (ICA/RCA)

You are not accounting for other non-sequence, non-combinatorial
analyses.  For example - there are many algorithms that infer
biological process from patterns of physical interactions - while this
seems to me to be your 2nd class (non-sequence), it's only based on 1
source of data.
>
> If people feel there should be a code for alignments and only  
> alignments then we will need to split the sequence-based category  
> into 2 which would then give us 3 total:
> -orthology based evidence
> -all other sequence based evidence
> -combinatorial analyses that bring together different types of  
> information to reach a conclusion (ICA/RCA)
>
> I favor the first option (2 categories, not 3) as I think it is  
> cleaner and easier for people to understand.  If we feel the need  
> to change the name of ISS to reflect this more encompassing  
> definition, then OK, but that brings another whole can of worms  
> with it (what about legacy data, will the community have a cow, etc.)

There is a practical issue you are overlooking.  It is very important
that we capture WITH information for certain types of homology- or
similarity-based methods of inference.   So important that your
association will be tossed back by Mike if you don't provide this
information.
I would say this is necessary for:
1) all pairwise sequence alignment methods
2) all "curated ortholog" methods (subset of 1, above)
3) all protein-family assignment based methods (Pfam, SMART, ProDom)

So, for the above, WITH information is mandatory.  For other methods
it isn't.  It is much, much easier from a practical standpoint to
mandate WITH for evidence code X, rather than mandate WITH for some
complicated subset of evidence code X.
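
To make the practical point concrete, here is a rough sketch of what a
per-code check could look like.  This is only an illustration, not the
actual filtering code anyone runs: the names Association,
CODES_REQUIRING_WITH, missing_with and rejected are made up for the
example, the identifiers are placeholders, and treating ISS as the one
code that requires WITH is an assumption taken from this thread.

```python
from dataclasses import dataclass

# Assumption for illustration: every association using one of these
# evidence codes must carry a WITH value.
CODES_REQUIRING_WITH = {"ISS"}


@dataclass
class Association:
    gene_product: str      # placeholder identifier, not a real database ID
    go_id: str             # placeholder GO term identifier
    evidence_code: str     # e.g. "ISS" or "RCA"
    with_field: str = ""   # what the annotation was transferred from


def missing_with(assoc: Association) -> bool:
    """True if the evidence code requires WITH and WITH is empty."""
    needs_with = assoc.evidence_code in CODES_REQUIRING_WITH
    return needs_with and not assoc.with_field.strip()


def rejected(associations):
    """Return the associations that would be tossed back."""
    return [a for a in associations if missing_with(a)]


if __name__ == "__main__":
    batch = [
        Association("EXAMPLE:gene1", "GO:placeholder", "ISS"),
        Association("EXAMPLE:gene2", "GO:placeholder", "ISS", "EXAMPLE:gene9"),
        Association("EXAMPLE:gene3", "GO:placeholder", "RCA"),
    ]
    for a in rejected(batch):
        print(a.gene_product, "rejected: WITH is required for", a.evidence_code)
```

The whole rule is a single set lookup on the evidence code.  If WITH
were mandatory only for some methods within a code, the check would
also need to know which method produced each association, and that is
not something this sketch (or a bare evidence code) records.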

Should we take this to the evidence-code mailing list?
Ben
--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO  
Consortium
Stanford University ** hitz at genome.stanford.edu





