[gofriends] GO ontology in OWL format
Chris Mungall
cjm at fruitfly.org
Wed Sep 12 10:20:21 PDT 2007
On Sep 12, 2007, at 4:16 PM, markov at mpiz-koeln.mpg.de wrote:
> Hi list,
>
> I'm new here and hope this is the appropriate list for this
> question. If you know of a better list for it, please tell me.
Hi Maria
We have a mailing list, go-help at genome.stanford.edu, for questions
about the GO and the resources it produces. However, I will keep this
discussion on gofriends since others may be interested.
> I am trying to load the GO ontology, including annotations (synonyms,
> etc.) as well as is_a and part_of relationships, into a Sesame 2.0
> repository. Seems that it should be a common enough task to have an
> easy solution -- perhaps somebody has done this successfully?
We have successfully loaded all of OBO into Sesame 1.2.6, but haven't
tried with 2.0 yet.
> Anyway, here's what I've tried.
>
> To get the data into the Sesame repository, I am using the
> (RDFFormat.)RDF-XML format choice, since the other options are N3,
> Turtle, TRIG, TRIX, N Triples (which I'm not too familiar with myself
> but assume GO comes in none of these).
>
> If I use the GO .rdf-xml files from their download page, ironically,
> I get errors that disappear when using the .owl files. Namely, I get
> "<rdf:RDF> is not allowed as a property element", and if I replace
> this tag in the data file with the OWL equivalent, <owl:OWL> or
> whatever, I get further problems.
Note that owl:OWL isn't part of the OWL language.
There are a few issues with the GO RDF-XML format due to its age (it
predates OWL considerably). It's more of a pseudo-RDF format: you
have to strip the surrounding XML tags before RDF parsers will accept
it; see
http://wiki.geneontology.org/index.php/GO_FAQ#Why_won.27t_the_RDF-XML_file_parse_using_RDF_parsers.3F
I would recommend you use OWL here, with caveats (see below).
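
If you do want to experiment with the older RDF-XML file anyway, the
tag-stripping step could be sketched roughly as follows. This is only
an illustration under assumptions: it supposes the file is a single
outer wrapper element (written here as go:go, with a made-up example
namespace) around one rdf:RDF element, and unwrap_rdf is a hypothetical
helper name, not part of any GO tool.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Toy input. The wrapper element and the "go" namespace URI are
# assumptions made for this sketch, not copied from a real GO file.
WRAPPED = f"""<go:go xmlns:go="http://example.org/go#" xmlns:rdf="{RDF_NS}">
  <rdf:RDF>
    <go:term rdf:about="http://example.org/go#GO:0000001"/>
  </rdf:RDF>
</go:go>"""


def unwrap_rdf(xml_text):
    """Return only the rdf:RDF element of the document, serialized as text."""
    root = ET.fromstring(xml_text)
    rdf_tag = f"{{{RDF_NS}}}RDF"
    # Accept documents that are already bare rdf:RDF, else look inside.
    inner = root if root.tag == rdf_tag else root.find(f".//{rdf_tag}")
    if inner is None:
        raise ValueError("no rdf:RDF element found")
    inner.tail = None  # drop whitespace that followed the element
    ET.register_namespace("rdf", RDF_NS)  # keep the familiar rdf: prefix
    return ET.tostring(inner, encoding="unicode")


if __name__ == "__main__":
    print(unwrap_rdf(WRAPPED))
```

ElementTree re-declares the namespaces it needs on the new root, so the
output should stand alone as an XML document. Whether a given RDF parser
then accepts the result is a separate question that this sketch does not
answer.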
> The .owl termdb file on the ftp download
> (ftp://ftp.geneontology.org/pub/go/godatabase/archive/) works (or
> rather worked; the archive put up on Sept 11 2007 gives unzip errors,
> something like "can't find end of file").
The Sept 12 file validates in Pellet. Perhaps this was a download error?
> However, this file does not contain any
> is_a or part_of, just the annotations. There is no other relevant .owl
> file.
is_a is mapped to rdfs:subClassOf.
part_of links are mapped to existential restrictions
(owl:someValuesFrom on the part_of property).
For example:
<rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0000018"/>
<rdfs:subClassOf>
  <owl:Restriction>
    <owl:onProperty>
      <owl:ObjectProperty rdf:about="http://purl.org/obo/owl/obo#part_of"/>
    </owl:onProperty>
    <owl:someValuesFrom rdf:resource="http://purl.org/obo/owl/GO#GO_0006312"/>
  </owl:Restriction>
</rdfs:subClassOf>
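
To make the mapping concrete, here is a rough sketch of how the is_a
and part_of links could be read back out of OWL of this shape with
nothing but the Python standard library. The enclosing owl:Class and
its IRI are placeholders invented for the sketch (the fragment above
does not show which class it belongs to), and extract_links is a
hypothetical helper, not part of go-perl or any GO tool.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# The owl:Class wrapper and its IRI are placeholders for this sketch.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="http://example.org/HYPOTHETICAL_CLASS">
    <rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0000018"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty>
          <owl:ObjectProperty rdf:about="http://purl.org/obo/owl/obo#part_of"/>
        </owl:onProperty>
        <owl:someValuesFrom rdf:resource="http://purl.org/obo/owl/GO#GO_0006312"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def extract_links(xml_text):
    """Return (subject, relation, object) tuples for each named class."""
    links = []
    root = ET.fromstring(xml_text)
    for cls in root.findall(f"{{{OWL}}}Class"):
        subject = cls.get(f"{{{RDF}}}about")
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            target = sub.get(f"{{{RDF}}}resource")
            if target is not None:
                # Plain named superclass: this is the is_a link.
                links.append((subject, "is_a", target))
                continue
            restriction = sub.find(f"{{{OWL}}}Restriction")
            if restriction is None:
                continue
            prop = restriction.find(f"{{{OWL}}}onProperty")
            filler = restriction.find(f"{{{OWL}}}someValuesFrom")
            if prop is None or filler is None:
                continue
            # The property may be referenced directly or nested as an element.
            prop_iri = prop.get(f"{{{RDF}}}resource")
            if prop_iri is None and len(prop):
                prop_iri = prop[0].get(f"{{{RDF}}}about")
            obj = filler.get(f"{{{RDF}}}resource")
            if prop_iri and obj:
                links.append((subject, prop_iri.rsplit("#", 1)[-1], obj))
    return links


if __name__ == "__main__":
    for link in extract_links(SNIPPET):
        print(link)
```

This only handles the two patterns shown above; anything else in the
file (intersections, anonymous fillers, and so on) is silently skipped.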
> The .obo termdb file, as well as the .rdf-xml termdb file, include the
> is_a and part_of. Therefore I tried to use the obo2owl perl thingy to
> convert the .obo into an owl file. However I get "format problem
> detected" about 62353 lines into the file, on some very long GO term
> name.
Send me the full .obo file and error report off-list and I'll
investigate.
There is a new release of go-perl on CPAN with the latest version of
the obo2owl XSLT included.
> Finally I downloaded OBO-Edit and tried to use the OWL plugin
> (mentioned here:
> http://www.bioontology.org/wiki/index.php/OboInOwl:Main_Page#OboEdit_OWL_plugin)
> to convert obo to owl. However, putting the plugin files into the
> extensions directory (as per the instructions) resulted in a series of
> NullPointerErrors that prevented OBO-Edit from starting up. Without
> the plugin, OBO-Edit works fine.
I'll investigate this off-list; email me the version of OBO-Edit
you're using.
Let me also give you some information on the results of our
experiments using OWL and Sesame, as part of a different project
outside GO. Triplestores such as Sesame seem to work best when you
are storing instances rather than classes and class level relations.
Sesame has no knowledge of OWL entailment rules (there is an add-on
called OWLIM that provides this). This means that a query over the GO
such as "what is the nuclear chromosome part of?" will not return the
answers that have to be inferred from the class-level axioms.
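
To illustrate what inference is needed for that kind of question, here
is a small sketch of the usual reading of the two relations: is_a and
part_of are each transitive, and a part_of link carries across is_a
links on either side. The edges below are toy data made up for the
sketch, not the actual GO graph, and part_of_ancestors is a
hypothetical helper.

```python
from collections import deque

# Toy edges for illustration only; these are not real GO links.
EDGES = [
    ("nuclear chromosome", "is_a", "chromosome"),
    ("nuclear chromosome", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("nucleus", "part_of", "cell"),
]


def part_of_ancestors(term, edges):
    """Return every term that `term` is inferred to be part_of."""
    outgoing = {}
    for subject, relation, obj in edges:
        outgoing.setdefault(subject, []).append((relation, obj))

    found = set()
    start = (term, False)  # (node, has the path crossed a part_of link?)
    seen = {start}
    queue = deque([start])
    while queue:
        node, via_part = queue.popleft()
        for relation, target in outgoing.get(node, []):
            state = (target, via_part or relation == "part_of")
            if state in seen:
                continue
            seen.add(state)
            queue.append(state)
            if state[1]:
                # At least one part_of link on the path: inferred part_of.
                found.add(target)
    return found


if __name__ == "__main__":
    print(sorted(part_of_ancestors("nuclear chromosome", EDGES)))
```

A store without these rules sees only the asserted links, so in the
toy graph it would report "nucleus" at best and miss "cell" and
"organelle".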
You may be tempted to treat GO classes as instances and query these
using normal RDFS semantics (for example, by loading the semi-
deprecated GO RDF-XML file). This will give you better results in the
short term, as you will be able to ask queries such as the part_of
one above. However, this approach may turn out to be a dead end in
the long run.
In my opinion SQL (with entailments pre-computed) still turns out to
be a far superior choice to existing semantic web technology. I'd be
interested to see the SPARQL equivalents of the following queries:
http://wiki.geneontology.org/index.php/Example_Queries
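
As a rough idea of what "entailments pre-computed" can look like, the
sketch below materializes a closure table once and then answers "which
genes are annotated to this term or anything beneath it?" with a plain
join. The schema, table names and data are invented for this
illustration and are much simpler than the real GO database; sqlite3 is
used only because it ships with Python.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy graph and its closure."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE link (child_id INTEGER, rel TEXT, parent_id INTEGER);
        CREATE TABLE annotation (gene TEXT, term_id INTEGER);
        CREATE TABLE closure (ancestor_id INTEGER, descendant_id INTEGER);

        -- Toy data, not real GO content.
        INSERT INTO term VALUES
            (1, 'cell'), (2, 'nucleus'), (3, 'nuclear chromosome');
        INSERT INTO link VALUES (2, 'part_of', 1), (3, 'part_of', 2);
        INSERT INTO annotation VALUES
            ('geneA', 3), ('geneB', 2), ('geneC', 1);

        -- Pre-compute the reflexive, transitive closure once, at load time.
        WITH RECURSIVE walk(ancestor_id, descendant_id) AS (
            SELECT id, id FROM term
            UNION
            SELECT link.parent_id, walk.descendant_id
            FROM walk JOIN link ON link.child_id = walk.ancestor_id
        )
        INSERT INTO closure SELECT ancestor_id, descendant_id FROM walk;
    """)
    return db


def genes_under(db, term_name):
    """Genes annotated to the named term or to any of its descendants."""
    rows = db.execute(
        """
        SELECT DISTINCT a.gene
        FROM term t
        JOIN closure c ON c.ancestor_id = t.id
        JOIN annotation a ON a.term_id = c.descendant_id
        WHERE t.name = ?
        ORDER BY a.gene
        """,
        (term_name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(genes_under(db, "nucleus"))
```

The point of the design is that the recursive step runs once when the
data is loaded; every later query is an ordinary join with no
reasoning at query time.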
I'd like to continue this discussion with you, but right now we are
outside the realm of anything that is specifically to do with GO and
into the realm of semantic web technology.
I would encourage you to subscribe to the following list and post
details of what you are doing there:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/
There are many people on that list engaged in this kind of discussion.
Cheers
Chris
> Cheers,
>
> Maria Markov
>
>
>
> --
> This message is from the GOFriends moderated mailing list. A list of
> public announcements and discussion of the Gene Ontology (GO) project.
> Problems with the list? E-mail: owner-gofriends at geneontology.org
> Subscribing send "subscribe" to gofriends-request at geneontology.org
> Unsubscribing send "unsubscribe" to gofriends-request at geneontology.org
> Web: http://www.geneontology.org/
>