[go] Putting method/program names into the with field for ISS
Benjamin Hitz
hitz at genome.Stanford.EDU
Mon Oct 8 11:26:51 PDT 2007
It cannot be enforced by the filtering script unless all situations
are accounted for.
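
To make that concrete, here is a minimal sketch in Python of the kind of
check I mean. It is not the actual filtering script: the table names, the
"method" argument, and the choice of which codes always require "with" are
all assumptions for illustration. A rule keyed on the evidence code alone
is trivial to check; a rule that depends on the method within a code only
works for methods someone has explicitly listed.

```python
# Hypothetical policy tables; none of these names come from the real
# filtering script, and the contents are assumptions for illustration.
WITH_REQUIRED_CODES = {"ISA", "ISO"}   # assumed: "with" is always required
WITH_BY_METHOD = {                     # assumed: ISM is decided per method
    "ISM": {"HMM": True, "tRNASCAN": False},
}


def check_with(code, with_value, method=None):
    """Return 'ok', 'reject', or 'undecidable' for one annotation."""
    has_with = bool(with_value and with_value.strip())

    # Case 1: the evidence code alone decides whether "with" is mandatory.
    if code in WITH_REQUIRED_CODES:
        return "ok" if has_with else "reject"

    # Case 2: the requirement depends on the method used within the code.
    if code in WITH_BY_METHOD:
        required = WITH_BY_METHOD[code].get(method)
        if required is None:
            # The method is missing or not listed: the situation has not
            # been accounted for, so the check cannot decide.
            return "undecidable"
        return "ok" if (has_with or not required) else "reject"

    # Codes with no rule in this sketch pass through unchecked.
    return "ok"
```

With these tables, an ISM annotation made with TMHMM comes back
"undecidable" simply because nobody has listed TMHMM yet.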
Ben
On Oct 8, 2007, at 10:18 AM, Harold Drabkin wrote:
> Actually, the sometimes with-yes, sometimes with-no, has a
> precedent: the use of IMP.
> When IMP is used for an RNAi, there is usually no "id" that can be
> given. Additionally, in some cases even at MGI, if an allele has
> not been assigned, because it was never taken to generate a "whole"
> mouse, but only ES cells, we would still give it an IMP but we
> can't put an MGI id for an allele in the with field.
>
> hjd
>
> Michelle Gwinn Giglio wrote:
>>
>>
>> Hi Ben,
>>
>> The email you are replying to was written before the GOC meeting
>> and I think that most of these issues were discussed at the GOC
>> meeting and (I think) consensus was reached.
>>
>> As I understand the outcome of the discussion:
>> ISS stays as a parent to the new sequence-based codes which are:
>> ISA - inferred from sequence alignment
>> ISO - inferred from sequence orthology
>> ISM - inferred from sequence model (to include all things like
>> HMMs, TMHMM, tRNASCAN, etc.)
>>
>> I think the agreement was that ISM should have values in the
>> "with" field for things like HMMs, but did not require "with" for
>> things like tRNASCAN. I agree that it is problematic to have a
>> code that can have "with" sometimes but does not always require
>> "with" - it makes it nearly impossible to do proper quality
>> checks. However, there did not seem to be an uproar at this idea
>> and it seemed to fly ok.
>>
>> Please don't interpret the fact that I did not talk about "with"
>> in my earlier email before the meeting as an indication that I
>> don't value it - in fact I think it is very important to store
>> "with" information. And indeed I would like to include in the
>> "with" field the method (like tRNASCAN) when a particular
>> accession is not available. I would be happy for the evidence
>> code committee to discuss whether we should do this and ways of
>> doing so (like Karen's suggestion).
>>
>> You raised the point of what to do with computational analysis that
>> is not sequence based and is not combinatorial. This is not my
>> area. All the arguments I have made regarding RCA and its scope
>> are on behalf of Karen who very much wants a code that captures
>> the concept of integration of various forms of data into one
>> result. It was left open at the GOC meeting to create more
>> granular instances of RCA and perhaps this would be one and
>> combinatorial another - but I leave that debate to those who deal
>> with these kinds of annotations on a regular basis.
>>
>> Finally, your point about sequence and structure being the same.
>> I think it is possible for two proteins to form similar structures
>> but to have quite different primary sequences. Therefore, it
>> would be possible to have the crystal structure of two proteins,
>> observe that they are similar, and then conclude that they might
>> share a similar function. In fact there are 4 NIH-funded centers
>> which are part of the Protein Structure Initiative which are
>> devoted to producing large scale data sets of structures, many
>> from proteins of unknown function. So, although I have not
>> personally done any annotation of this sort, I expect that
>> comparisons of this kind will become more frequent as more
>> structures are produced.
>>
>> Michelle
>>
>>
>>
>>
>> Ben Hitz wrote:
>>>>
>>>
>>> First point: I don't think there is any need to
>>> separate "sequence" from "structure" since sequence is merely a
>>> proxy for the chemical structure, so I will use sequence to mean
>>> both. If anyone has a contrary opinion, I would like to hear
>>> it. I just want to make this disclaimer so we can use
>>> "sequence" to mean both.
>>>
>>>> I think it is important to be able to distinguish methods that
>>>> are based on just sequence analysis from everything else and
>>>> that ISS should be the code to describe this.
>>>
>>>
>>> I don't necessarily disagree - but I don't take this as a given
>>> either. WHY is this important? And why SOLELY sequence analysis
>>> and not partially sequence analysis?
>>>
>>> Here are some things I think might be important:
>>> o That the association is based on some computational theory, not
>>> an experiment (so it would not fall under the proposed EXP
>>> hierarchy).
>>> o That in cases where an association is transferred from a
>>> specific gene product or family of gene products, the
>>> "transferree" is mentioned.
>>> o whether or not the association has been reviewed by a curator
>>> o whether or not the method has been reviewed by a curator (sub
>>> case if the above is not true)
>>> o whether or not this is a (computational) prediction based on
>>> combining several sources of data (aka "Bayesian Blah Blah Blah")
>>>
>>>> I think we need at least 2 categories:
>>>> -one for all things sequence based (ISS or whatever new name
>>>> might be created)
>>>> -one for combinatorial analyses that bring together different
>>>> types of information to reach a conclusion (ICA/RCA)
>>>
>>>
>>> You are not accounting for other non-sequence,
>>> non-combinatorial analyses. For example, there are many algorithms
>>> that infer biological process from patterns of physical
>>> interactions. While this seems to me to be your 2nd class
>>> (non-sequence), it is based on only 1 source of data.
>>>
>>>>
>>>> If people feel there should be a code for alignments and only
>>>> alignments then we will need to split the sequence-based
>>>> category into 2 which would then give us 3 total:
>>>> -orthology based evidence
>>>> -all other sequence based evidence
>>>> -combinatorial analyses that bring together different types of
>>>> information to reach a conclusion (ICA/RCA)
>>>>
>>>> I favor the first option (2 categories, not 3) as I think it is
>>>> cleaner and easier for people to understand. If we feel the
>>>> need to change the name of ISS to reflect this more
>>>> encompassing definition, then OK, but that brings another whole
>>>> can of worms with it (what about legacy data, will the
>>>> community have a cow, etc.)
>>>
>>>
>>> There is a practical issue you are overlooking. It is very
>>> important that we capture WITH information for certain types of
>>> homology- or similarity-based methods of inference. So
>>> important that your association will be tossed back by Mike if
>>> you don't provide this information.
>>> I would say this is necessary for:
>>> 1) all pairwise sequence alignment methods
>>> 2) all "curated ortholog" methods (subset of 1, above)
>>> 3) all protein-family assignment based methods (Pfam, SMART, ProDom)
>>>
>>> So, for the above, WITH information is mandatory. For other
>>> methods it isn't. It is much, much easier from a practical
>>> standpoint to mandate WITH evidence code X, rather than mandate
>>> WITH for some complicated subset of evidence code X.
>>>
>>> Should we take this to the evidence-code mailing list?
>>> Ben
>>> --
>>> Ben Hitz
>>> Senior Scientific Programmer ** Saccharomyces Genome Database **
>>> GO Consortium
>>> Stanford University ** hitz at genome.stanford.edu
>>>
>>>
>>>
--
Ben Hitz
Senior Scientific Programmer ** Saccharomyces Genome Database ** GO
Consortium
Stanford University ** hitz at genome.stanford.edu