[Gofriends] matrices from terms
Gabriel Berriz
gberriz at hms.harvard.edu
Tue Nov 18 06:23:00 PST 2008
On 2008.11.14 Fri, at 14:06, Chris Mungall wrote:
>
> The generally recommended way to do this is via the graph_path table
> in the GO database. You can query either a local installation, or a
> public mirror, via
> http://berkeleyop.org/goose. The documentation for this table is here:
> http://www.geneontology.org/GO.database.schema.shtml#go-optimisations.table.graph-path
>
> However, at this time this table contains the transitive closure
> computed by considering all edges, ignoring edge labels (relations).
> Thus it does not satisfy your requirement that only the is_a relation
> is considered
Chris, couldn't one get the information Nicholas wants directly from
the table term2term in the GO database? Unlike the graph_path table,
this table does provide edge labels. One could use a recursive
descendants function together with a children function that returns
only is_a children (from term2term). In Perl-ish it would look like
this:
    sub descendants {
        my $node = shift;
        my @descendants = ( $node );
        for my $child ( children( $node ) ) {
            push @descendants, descendants( $child );
        }
        return @descendants;
    }
In English, the descendants of a node are the union of the descendants
of each of its children, plus the node itself. (Here we follow the
convention, also followed by the graph_path table, of including the
node among its own descendants.)
This is just a sketch. As written it returns a list of descendants
that will generally contain duplicates. It would be better if these
duplicates were removed. Also, its performance can be optimized
significantly by avoiding unnecessary recursion through some form of
memoization.
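One possible way to address both points is sketched below. It is only
an illustration, not a tested implementation against the GO database:
the %IS_A_CHILDREN hash and its node names (A, B, C, D) are made up and
stand in for the term2term lookup, and the $seen argument is my own
addition. Strictly speaking this uses a "seen" hash rather than true
memoization: each node is expanded only the first time it is reached,
which removes the duplicates and avoids walking a shared subgraph more
than once.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database: parent => [ is_a children ].
# The node names are invented for illustration only.
my %IS_A_CHILDREN = (
    A => [ 'B', 'C' ],
    B => [ 'D' ],
    C => [ 'D' ],
    D => [],
);

# Stand-in for the real children function, which would query term2term.
sub children {
    my $node = shift;
    return @{ $IS_A_CHILDREN{$node} || [] };
}

# Returns each descendant of $node exactly once, $node included.
# %$seen records the nodes already visited, so a subgraph that is
# reachable along several paths is expanded only the first time.
sub descendants {
    my ( $node, $seen ) = @_;
    $seen ||= {};
    return () if $seen->{$node}++;    # already visited: contribute nothing
    my @descendants = ( $node );
    for my $child ( children( $node ) ) {
        push @descendants, descendants( $child, $seen );
    }
    return @descendants;
}

print join( ' ', descendants( 'A' ) ), "\n";
```

In this toy graph D is a child of both B and C, so the original sketch
would list it twice, while this version lists it once. Callers pass
only the node; the $seen hash is created on the first call and shared
by the recursive calls.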
The children function used by the descendants function would be
implemented using something like the following SQL query:
    SELECT tt.term2_id
      FROM term2term tt, term t
     WHERE tt.relationship_type_id = t.id
       AND t.acc = 'is_a'
       AND tt.term1_id = ?;
Gabriel
=============================================================
Gabriel F. Berriz, PhD
Bioinformatics Developer
Roth Lab
Biological Chemistry and Molecular Pharmacology -- Harvard Medical
School
Seeley G. Mudd Building 322B
Boston, MA 02115-5701
Telephone: 617.432.3555
Fax: 617.432.3557