Wednesday, May 31, 2006

Truncating strings using XSL

I want to display an RSS feed from Connotea for papers I've tagged with "Formicidae". The titles of these papers can be long, e.g.:
Dracula ant phylogeny as inferred by nuclear 28S rDNA sequences and implications for ant systematics (Hymenoptera: Formicidae: Amblyoponinae)

This takes up too much space when displayed in a panel on the web page, so I want to truncate the titles in a sensible way.

Solution has a nice solution that truncates to a fixed length, optionally at a word boundary. Our verbose title becomes:
Dracula ant phylogeny as inferred by nuclear 28S...

Much nicer.
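For anyone who wants to see the logic outside XSL, here is a minimal Python sketch of the same idea: cut to a fixed length, optionally back up to the previous word boundary, then append an ellipsis. This is not the XSL template itself; the function name `truncate`, its parameters, and the limit of 50 characters are my own assumptions for illustration.

```python
def truncate(text, max_len, at_word=True, suffix="..."):
    """Shorten text to at most max_len characters (before the suffix).

    Hypothetical helper for illustration only, not the XSL template.
    """
    if len(text) <= max_len:
        return text  # short enough already, leave untouched

    cut = text[:max_len]
    # If the cut lands in the middle of a word, back up to the last space.
    if at_word and not text[max_len].isspace():
        space = cut.rfind(" ")
        if space > 0:
            cut = cut[:space]
    return cut.rstrip() + suffix


title = ("Dracula ant phylogeny as inferred by nuclear 28S rDNA sequences "
         "and implications for ant systematics "
         "(Hymenoptera: Formicidae: Amblyoponinae)")

print(truncate(title, 50))
print(truncate(title, 10, at_word=False))
```

With the assumed limit of 50, the long title above is cut after "28S" rather than in the middle of "rDNA"; with `at_word=False` the cut is made at exactly the requested length.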

(Via Google: xsl truncate string.)

Currently playing in iTunes: Dani California by Red Hot Chili Peppers

Sunday, May 28, 2006

Australian ants online

The Australian National Insect Collection has an online resource of Australian ants that provides species maps and Google Earth files of specimen distributions.

A potentially useful source of data, but the KML files don't include specimen codes, and there is an unfortunate disconnect between specimen codes and the IDs used to link to them. For example, in the specimen list for Myrmecia midas, the first specimen has a "MaterialID" of 128315, yet the ANIC accession code for this specimen is 32-012805. Pity this identifier is not made use of in the link to the data.

The site also displays specimen metadata as an HTML table -- this information is also available as a CSV file. This is an example where a touch more effort would make it truly useful as a source of data. Something as simple as RDF with the specimen URL as the URI would be a big help, but this could be achieved by some judicious screen scraping...
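As a rough sketch of what "RDF with the specimen URL as the URI" could look like, the Python below turns CSV rows into N-Triples. Everything about the shape of the data is assumed: the column names (`MaterialID`, `AccessionCode`, `Species`), the `http://example.org/specimen/` base URL, and the choice of Dublin Core and Darwin Core predicates are placeholders, not what the ANIC site actually serves.

```python
import csv
import io

# Hypothetical base URL; the real specimen URLs would go here.
BASE = "http://example.org/specimen/"
# Illustrative predicate choices, not anything the site itself uses.
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"
DWC_SCIENTIFIC_NAME = "http://rs.tdwg.org/dwc/terms/scientificName"


def literal(value):
    """Quote a string as an N-Triples literal (escapes backslash and quote)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def csv_to_ntriples(csv_text):
    """Emit two triples per CSV row, keyed on the specimen URL."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['MaterialID']}>"
        triples.append(
            f"{subject} <{DCTERMS_IDENTIFIER}> {literal(row['AccessionCode'])} .")
        triples.append(
            f"{subject} <{DWC_SCIENTIFIC_NAME}> {literal(row['Species'])} .")
    return triples


# Column names are assumed; the values are the Myrmecia midas example above.
sample = "MaterialID,AccessionCode,Species\n128315,32-012805,Myrmecia midas\n"

for triple in csv_to_ntriples(sample):
    print(triple)
```

The point of the sketch is that the accession code becomes a property of the same resource that the MaterialID link identifies, so the two identifiers are no longer disconnected.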

Just realised, the data is served up by GBIF, which makes life much easier.

Tuesday, May 09, 2006


Semantic Ants (geddit?).

Initially I blogged this project on iPhylo, with posts such as an introduction to populating the triple store and thoughts on how to automatically add to its contents.

As the project develops I hope to describe further data sources, possible queries, user interface design, and benchmarks.