I've been playing with DiGIR providers, retrieving records to be massaged into RDF as part of a project to aggregate specimens, sequences, publications, and phylogenies. DiGIR was a major advance on what went before (i.e., basically nothing), but in addition to variations in the schema, and the fact that a good portion of the providers are offline at any one time (see Perils of Federation), I'm coming up against the wide range of ways people have of writing dates.
Ideally, I'd like dates in the variation of the ISO 8601 YYYY-MM-DD format described by the W3C, and recommended by Dublin Core. What we can get in DiGIR records is all manner of formats, such as (with specimen codes following)
Idly Googling, I stumbled across The Knowledge Bank at OSU, a DSpace server at Ohio State University. A number of publications on ants are listed there, complete with PDFs. For example, "The Mating Activities of the Ant Myrmica Americana Weber" by Kannowski and Kannowski, published in The Ohio Journal of Science in 1957. This paper has a GUID in the form of a Handle (hdl:1811/4489). What I find interesting is that digitisation efforts by libraries are putting biodiversity literature online as part of a broader effort (i.e., we get these papers, and a GUID, "for free"). This also raises issues about duplication of effort: clearly, if a library has put a PDF online, we don't want to digitise the same paper again. Hence, we need a simple way of finding whether a paper has already been digitised. Google Scholar may be useful for this, although in this case Google finds the paper but Google Scholar doesn't.
One reason why I've put off adding specimen maps to iSpecies, despite repeated requests, is that Google Maps (my preferred mapping tool) is slow if you have lots of specimen records. I've played with some other tools, notably Map Bureau's Flash-based pointMapper, but what I'd really like is a quick and simple way to display a bunch of specimen records. Because the same issue comes up with SemAnt, I thought it was time to do something about it. I stress that I really like Google Maps, but for some purposes it's overkill, and loading hundreds of points takes too long.
So, the idea is to take georeferenced specimen records and put them on a map of the world. Slowly it dawned on me that this was trivially easy. Firstly, take a map of the world drawn using the equirectangular (or plate carrée) projection (Wikipedia provided the example below).
This projection gives a simple connection between geographic location and pixel position. For example, if the map is scaled to 180 pixels high and 360 pixels wide, then you have a 1 pixel/degree grid. Hence, plotting localities is no harder than plotting an X-Y scatter plot.
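As a rough sketch of that mapping (in Python purely for illustration; the function name and default sizes are my own assumptions), converting a latitude and longitude in decimal degrees to a pixel position on an equirectangular image is a pair of linear rescalings:

```python
def latlon_to_pixel(lat, lon, width=360, height=180):
    """Map decimal-degree coordinates to (x, y) pixel coordinates.

    Assumes an equirectangular image covering the whole globe, with the
    top-left pixel at 90 N, 180 W and y increasing downwards.
    """
    x = (lon + 180.0) * (width / 360.0)   # 180 W -> 0, 180 E -> width
    y = (90.0 - lat) * (height / 180.0)   # 90 N -> 0, 90 S -> height
    return x, y


# The equator/Greenwich intersection lands in the middle of the image.
centre = latlon_to_pixel(0.0, 0.0)
```

With the default 360 × 180 image the scale factors are both 1, which is the 1 pixel/degree grid described above; other image sizes just change the factors.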
Now, all I need to do is take a SPARQL result with latitudes and longitudes and draw the localities on this map. One way to do this is to draw the points using SVG, so I can use an XSL transformation to generate the map. If I wanted to support zooming then ideally I'd have the map itself in SVG, but I just want a small world map, so I "cheat" and use a bitmap as the base map. This can be included like this:
The trick is to convert latitude and longitude to coordinates on the bitmap. For example, specimen casent0008682-d03 of Melissotarsus emeryi was collected from 31°58'0'' S, 18°51'0'' E, which in decimal values is latitude -31.966667, longitude 18.85. Now, how do I convert these values into a location on a 360 × 180 image? In SVG the coordinates grow from the upper left, whereas on the map shown above 0,0 is in the centre, such that southern latitudes are negative, as are western longitudes. We can use a transform to move the origin of the x- and y-axes 180 pixels to the right and 90 pixels down, so that the origin of the graph is the intersection of the equator and the Greenwich Meridian. We also have to invert the y-axis, because in SVG y increases from top to bottom whereas latitude increases from south to north. This diagram shows the difference between SVG and geographical coordinates:
One thing which drove me nuts for a while was that the SVG rendered fine in Safari using Adobe's plugin, but not in Camino, which uses the same rendering engine as Mozilla. Turns out Camino needs http://www.w3.org/2000/svg to be the default namespace, so xmlns="http://www.w3.org/2000/svg" is fine, but it barfs over xmlns:svg="http://www.w3.org/2000/svg". Sigh.
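To make the coordinate shuffling concrete, here is a minimal sketch in Python rather than XSLT (so it is not the style sheet itself). The function names, the `world.png` file name, and the red 2-pixel dots are all assumptions for illustration. It converts degrees/minutes/seconds to decimal degrees, then writes an SVG document that declares SVG as the default namespace and uses a `translate` plus `scale` transform so that points can be given directly as longitude and latitude:

```python
from xml.sax.saxutils import quoteattr

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


def svg_map(points, base_map="world.png"):
    """Return SVG text plotting (lat, lon) pairs over a 360 x 180 bitmap.

    base_map is a hypothetical file name for the equirectangular image.
    """
    circles = "\n".join(
        f'    <circle cx="{lon:.4f}" cy="{lat:.4f}" r="2" fill="red"/>'
        for lat, lon in points
    )
    return (
        # SVG is the default namespace (no svg: prefix).
        f'<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}" width="360" height="180">\n'
        f'  <image x="0" y="0" width="360" height="180" xlink:href={quoteattr(base_map)}/>\n'
        # Shift the origin to the map centre, then flip the y-axis.
        f'  <g transform="translate(180,90) scale(1,-1)">\n'
        f'{circles}\n'
        f'  </g>\n'
        f'</svg>\n'
    )


lat = dms_to_decimal(31, 58, 0, "S")
lon = dms_to_decimal(18, 51, 0, "E")
svg = svg_map([(lat, lon)])
```

The transform means a point at longitude `lon`, latitude `lat` is drawn at pixel (180 + lon, 90 - lat), so the Melissotarsus emeryi specimen ends up to the right of and below the centre of the image, as expected for southern Africa.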
Here is an example SVG file rendered using the XSLT style sheet, but using a different background map, showing the distribution of the ant Azteca constructor. The source SVG is here.
Dave Vieglais gave what looks like an interesting presentation at TDWG 2006. BigDig monitors the status of DiGIR providers that serve museum specimen records. I've not managed to get the map background to appear, but here's a snapshot of the geographical distribution of providers, and their status:
The happy faces are DiGIR providers that are live; the sad faces are not responding. What is interesting is that a fair chunk are offline: of the 180 registered providers, 25 have never responded, and there are something like 17 variations of the DiGIR schema out there.
It's a little scary that so many providers are offline, and that they differ in the format of the messages they accept and return. For federated searches that are "live," this spells disaster. His presentation is here (rather unhelpfully in Open Office format, so I've made a PDF).