SemAnt

Thursday, March 22, 2007

bioguid.info


As announced on nodalpoint, I've put together a web site called bioguid.info which, rather grandly, is an attempt to bootstrap the biodiversity Semantic Web by providing resolvable URIs for biological objects, such as publications, taxonomic names, nucleotide sequences, and specimens. These URIs (or "GUIDs") can be resolved by a web browser to display HTML, but under the hood are resolved to RDF (which you can see by viewing the source of the web page you get for a URI).

Tuesday, March 13, 2007

More 3store3

Had a little fun with 3store3 on Fedora Core 4, which is what I run on my servers. 3store3 builds easily on Fedora (in contrast to the fun had building 3store3 on a Mac). However, any SPARQL query beyond a simple listing of all triples just bailed out with scary errors. For example:

ts-explain "PREFIX dc:
SELECT DISTINCT ?s ?o WHERE { ?s
dc:title ?o . } LIMIT 10"
SQL error 2013: Lost connection to MySQL server during query at util.c:55
SQL error 2002: Can't connect to local MySQL server through socket
'/var/lib/mysql/mysql.sock' (111) at query.c:316
Warning cannot calculate complexity
Complexity: 0
SELECT DISTINCT v0.lexical AS `s`, v1.lexical AS `o`, v1.datatype AS
o_dt, v1.language AS o_lang
FROM (SELECT DISTINCT t0.object AS `o`, t0.subject AS `s`
FROM triples t0
WHERE t0.predicate=-8024650864867163606
LIMIT 10) AS `tmp0_202a`, symbols v0, symbols v1
WHERE tmp0_202a.s=v0.hash && tmp0_202a.o=v1.hash
LIMIT 10;

After much cursing, rebuilding, and trying different versions of 3store3 and the Redland libraries, I gave up and contemplated a crude hack. The SQL query itself works fine in MySQL, but not when 3store3 tries to call MySQL. But before writing a crude hack I mailed the 3store3 mailing list this morning, and by 9:30 the same morning Steve Harris had given me the solution. I was running MySQL version 4.1.11, and according to Steve there's a bug in early versions of MySQL 4. He recommended upgrading, so off to do yum install mysql. Sure enough, I get MySQL 4.1.20, and everything works. Yay!

Tuesday, March 06, 2007

Bio2RDF


Stumbled across Bio2RDF by François Belleau. Here's the blurb:
It is now possible to query Bio2RDF triple store using Sesame's SeRQL query engine. The actual triple store contains all annotations about human and mouse from UniProt, Affymetrix and GeneID. It also contains all GO term definitions and OMIM disease description. It contains 50 millions triples and the native RDF store in sesame weigths 3 Go. Watch for the speed of it ! Thanks for the Sesame team for their great work.

There's a blog, and a SourceForge project. The Bio2RDF website itself displays RDF as tables. It does have a query interface, but it's a bit hidden. Try here, based on this blog post.
It uses Sesame, JSP, and SeRQL -- not my favourite technologies -- but is an example of people thinking about RDF and triple stores, and actually making stuff. Must polish up my creaking ant demo and play some more with this.
On the subject of ants, Terry Catapano has been playing with Simile, and come up with this demo.


Friday, February 23, 2007

303 and concept URIs - towards GUIDs for all

I've bookmarked some stuff relevant to the issue of what HTTP URIs identify. This concerns whether a URI identifies something in the real world (or a concept), or a document. The "303" refers to the HTTP status code (303 See Other) returned when the URI identifies a concept rather than a document; the response redirects the client to a document describing that concept. The whole topic is a mess, but I need to get to grips with it as I'm working on a service to assign URIs to just about any biologically interesting object. Sigh.
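
A minimal sketch of how a client might apply the 303 convention, assuming the simple reading that a 2xx answer means "this URI is a document" and a 303 means "this URI names a thing, and the Location header points at a document about it". The function names classify_status and head_status are made up for illustration, and real servers may well behave differently.

```python
import http.client
from urllib.parse import urlsplit


def classify_status(status):
    """Guess what a URI identifies from the HTTP status code it returns.

    This follows the simple reading of the 303 convention and is an
    assumption, not a complete treatment of the debate.
    """
    if status == 303:
        return "thing"      # a concept or real-world object; see Location
    if 200 <= status < 300:
        return "document"   # the URI identifies the document itself
    return "unknown"


def head_status(uri):
    """Send a HEAD request without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling head_status on a URI and passing the status to classify_status gives a rough answer; http.client is used here because it does not follow redirects on its own, so the 303 stays visible.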

Wednesday, February 14, 2007

Harvesting handles


Finally discovered how to get metadata from DSpace-hosted items, such as the AMNH's Scientific Publications. They have an OAI interface, so that the URL
http://digitallibrary.amnh.org/dspace-oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:digitallibrary.amnh.org:2246/1999
will retrieve XML metadata for the record hdl:2246/1999 -- in this case, number 2578 of the American Museum novitates.
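
The same request can be assembled in code. This is a rough sketch, assuming the OAI identifier is always "oai:" plus the repository host plus the handle, as in the example above; the helper names get_record_url and dublin_core_fields are hypothetical, and the parser simply collects any Dublin Core elements it finds in the returned XML.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_BASE = "http://digitallibrary.amnh.org/dspace-oai/request"
DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def get_record_url(handle, base=OAI_BASE, repository="digitallibrary.amnh.org"):
    """Build an OAI-PMH GetRecord URL for a handle such as 'hdl:2246/1999'."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    params = [
        ("verb", "GetRecord"),
        ("metadataPrefix", "oai_dc"),
        ("identifier", "oai:%s:%s" % (repository, handle)),
    ]
    # Keep ':' and '/' unescaped so the URL reads like the one in the post.
    return base + "?" + urlencode(params, safe=":/")


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements from an OAI response into a dict of lists."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(DC_NS) and element.text and element.text.strip():
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields
```

get_record_url("hdl:2246/1999") reproduces the URL above; fetching it (for example with urllib.request.urlopen) and passing the decoded body to dublin_core_fields gives the titles, creators and so on as plain Python lists.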

Oh, and the Simile project already has an OAI to RDF XSLT stylesheet (found courtesy of Leigh Dodds's OAI bookmarks on del.icio.us).

Almost forgot, I stumbled across this information at Celestial.

Tuesday, January 30, 2007

ANeT - an invisible ant resource


ANeT is a network to promote ant research in Asia. It has some interesting things, although it's a shocking example of poor design. Almost all the text on the home page is not plain text but text written in a GIF image, which means search engines like Google will have a tough time indexing the page, which in turn means it will be hard to find.

And if you can't be found by Google, you don't exist...

Wednesday, January 24, 2007

Searching Hymenoptera Name Server literature

These are some notes on efforts to make the Hymenoptera Name Server literature database searchable. This work builds on LSID stuff I did earlier, and is also a response to the TAXACOM thread started by Roger Hyam. Donat Agosti has also been requesting something along these lines.

The first step is to suck all the records off HNS, and convert them to RIS format. I then want to import that into an instance of MyPHPBib (an old project of mine languishing on SourceForge), which gives me a MySQL database of the literature to play with. What I'd like is an OpenURL style search interface that can be used to return records matching a user query.
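
To make the RIS step concrete, here is a small sketch of writing one journal article as an RIS record. The dictionary keys (authors, year, title, and so on) are assumptions made for illustration, not the actual structure of HNS records, and every record is treated as a journal article (TY  - JOUR).

```python
# Map hypothetical record keys to RIS tags, in output order.
RIS_FIELDS = (
    ("year", "PY"),
    ("title", "TI"),
    ("journal", "JO"),
    ("volume", "VL"),
    ("start_page", "SP"),
    ("end_page", "EP"),
)


def to_ris(record):
    """Render one bibliographic record (a dict) as an RIS entry."""
    lines = ["TY  - JOUR"]            # assumption: everything is a journal article
    for author in record.get("authors", []):
        lines.append("AU  - %s" % author)   # one AU line per author
    for key, tag in RIS_FIELDS:
        value = record.get(key)
        if value:                      # skip missing or empty fields
            lines.append("%s  - %s" % (tag, value))
    lines.append("ER  - ")             # end-of-record marker
    return "\n".join(lines) + "\n"
```

Concatenating the output of to_ris for each harvested record gives a file that RIS-aware tools should be able to import, though individual importers can be picky about tag order and line endings.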

Notes to self: character encoding is a major, major pain. I'm running the script on a Fedora Core 4 box, as Mac OS X drove me nuts. I'm ensuring that the XML style sheet outputs ISO-8859-1 encoding (to match that returned by HNS), and I set the Terminal character encoding to Western (ISO-8859-1) as well.
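
One way to take the terminal out of the equation is to decode the bytes explicitly. A minimal sketch, assuming the data really is ISO-8859-1 as HNS reports, and assuming (this is not part of the setup above) that something downstream would rather have UTF-8; the function name latin1_to_utf8 is made up.

```python
def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes.

    Decoding ISO-8859-1 never fails, since every byte maps to a character,
    so bytes in the wrong encoding pass through silently as wrong characters.
    """
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")
```

For example, the single byte 0xE9 (e with an acute accent in ISO-8859-1) becomes the two bytes 0xC3 0xA9 in UTF-8.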