Thursday, August 31, 2006

LSIDs and DOIs for ant (and other hymenopteran) literature

I've now got LSIDs for literature served by the Hymenoptera Name Server and Antbase. As an added bonus, I serve DOIs where they exist. The RDF metadata for the LSIDs is generated on the fly by querying the Hymenoptera Name Server using a simple XML service Norm Johnson provided. Given a suitably formed URL, the service returns an XML document for the corresponding reference, which I simply transform into RDF.
In addition, if the publication is an article I use CrossRef's OpenURL resolver to check whether a DOI exists for it -- if so, the DOI is added to the RDF as a <dc:identifier> tag.
Like most other LSIDs that I serve, resolution is slower than it could be because everything is done on the fly, and in this case two calls may be needed to generate the metadata (see diagram below). The namespace is pub. Here is an example LSID, which you can resolve either by clicking on it in your browser or by using Launchpad.
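The second of those calls, asking CrossRef's OpenURL resolver whether an article has a DOI, could be sketched in Python roughly as follows. This is a hedged illustration, not the actual service code: the endpoint URL, the parameter names, and the shape of the XML reply are my assumptions about CrossRef's OpenURL interface.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint for CrossRef's OpenURL resolver.
CROSSREF_OPENURL = "http://www.crossref.org/openurl"

def build_openurl(journal_title, volume, first_page, year,
                  pid="me@example.org"):
    """Build an OpenURL query for an article; redirect=false asks
    CrossRef to return XML rather than redirecting to the article."""
    params = {
        "pid": pid,              # CrossRef identifies callers by email
        "title": journal_title,
        "volume": volume,
        "spage": first_page,
        "date": year,
        "redirect": "false",
    }
    return CROSSREF_OPENURL + "?" + urllib.parse.urlencode(params)

def extract_doi(xml_text):
    """Pull the DOI out of the XML reply, or return None if there is
    no match (the <doi> element name is also an assumption)."""
    node = ET.fromstring(xml_text).find(".//doi")
    return node.text if node is not None else None
```

If `extract_doi` returns a DOI, it would then be written into the RDF as the `<dc:identifier>` described above.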

One encouraging observation is that DOIs are not restricted to recent literature. For example, one paper dates from 1952, yet it has a DOI (doi:10.2307/2422200).

Monday, August 28, 2006


Triple stores containing specimens, sequences, images, literature, etc. are all very well, but there is a lot of information that is not captured by such a system. For human users (as opposed to dumb computers), often a simple summary is more informative than a set of images and a map, especially if that summary mentions something interesting, such as why the "Google ant" was so named, or that the trap-jaw ant Odontomachus bauri has incredibly fast jaws (doi:10.1073/pnas.0604290103). There is also a lot of information that may one day be semantically encoded, but for now will only be captured as text (such as extensive commentaries on web sites).

Lastly, there is the issue of getting people involved. One of the striking things about biodiversity web sites is the lack of community involvement. My question is "why is this the case?" Here are some thoughts:
  1. If feedback consists of sending an email to the person/organisation running the site, then there's little incentive to get involved. Why bother typing a detailed summary of why a particular piece of information is wrong, only to have that essay disappear into a black hole?

  2. Annotations can be valuable, but there is an issue of trust. Why invest effort in a site that may be a short-lived toy? There are so many sites competing for attention.

One way to foster involvement might be through a Wiki, whereby anybody can contribute to writing a page about each organism. However, biodiversity Wikis such as Wikispecies have, in my opinion, been a spectacular failure, as evidenced by the number of missing pages or stubs. Perhaps part of the reason is the lack of content, which could be addressed by pre-populating each page with basic information from a database (such as name, any specimens, images, literature, etc.). In other words, each page would start with the level of detail of an iSpecies report (for background to iSpecies visit my iSpecies blog). As Kevin Kelly commented at the recent Google Sci Foo camp, people are much more likely to edit existing content than create content de novo.

Wikis are all very well, but my major worry is the potential to lose information. Here's the problem. Suppose I generate a Wiki page for an ant, and include information on its distribution. What happens if the underlying distribution data changes? Once the page exists as static Wiki text, it will fall out of date. Furthermore, I'm not sure I want users editing distribution records -- these should really be edited at the level of the source database, so that the changes propagate to other users of those data.

One possibility is to use custom tags in the Wiki. When the HTML page is generated, Wiki tags are rendered in the usual way, but the custom tags are replaced by the results of a database call (for example, a SPARQL query). Hence, something like %DISTRIBUTION would be replaced by a Google map of the specimens for that taxon. This would mean that the distribution map would always reflect the current database, and the user (assuming they don't delete the %DISTRIBUTION tag itself) won't be able to alter that information. Of course, this means we need some mechanism for users to inform the curators of the source data of any potential errors. This is particularly important if we pre-populate the Wiki page with information that may be incorrect (such as images harvested from a search engine).
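The tag-substitution idea above could be sketched like this. The `%TAG` syntax is from the post itself, but the handler mechanism and the map HTML are hypothetical, and in practice a handler would run a SPARQL query against the triple store.

```python
import re

# Custom wiki tags look like %DISTRIBUTION, %IMAGES, etc.
TAG_PATTERN = re.compile(r"%([A-Z]+)")

def expand_tags(wiki_text, handlers):
    """Replace each %TAG with the result of handlers[TAG]();
    tags with no registered handler are left untouched."""
    def replace(match):
        handler = handlers.get(match.group(1))
        return handler() if handler else match.group(0)
    return TAG_PATTERN.sub(replace, wiki_text)

# Hypothetical handler: a real one would query the database and
# return the HTML for a Google Map of the specimen records.
handlers = {"DISTRIBUTION": lambda: "<div id='map'>...specimen map...</div>"}
html = expand_tags("Known range: %DISTRIBUTION", handlers)
```

Because the substitution happens at render time, the map always reflects the current database, exactly as the paragraph above requires.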

We could also help things by encouraging standard ways of linking to other resources, or storing data. For example, say a user edits a page and adds a citation to a paper that isn't in the underlying triple store. Ideally we would get that new paper into the triple store (rather than have it languish in the Wiki text). There are some ways to do this, such as extracting metadata from DOIs, and using local links [need to think about this].
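One small piece of this is recognising when an edit mentions a paper at all. A hedged sketch, assuming DOIs are the hook for getting new citations into the triple store (the regex is a common heuristic for matching DOIs, not an official grammar):

```python
import re

# Heuristic DOI pattern: prefix "10.", a registrant code, a slash,
# then a suffix that runs until whitespace or markup.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def find_dois(text):
    """Return the distinct DOIs mentioned in a block of wiki text,
    in order of first appearance."""
    seen, found = set(), []
    for doi in DOI_PATTERN.findall(text):
        doi = doi.rstrip(".,;)")   # trim trailing punctuation
        if doi not in seen:
            seen.add(doi)
            found.append(doi)
    return found
```

DOIs harvested this way could then be pushed through the same CrossRef metadata lookup used for the LSIDs, rather than languishing in the Wiki text.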

Likewise, with images, if we have a convention that images get posted to, say, Flickr, then we have a means for storing metadata about those images directly in our triple store.

These ideas have partly come out of conversations with Rebecca Shapley at Google, and Dave Thau at the California Academy of Sciences.

Thursday, August 10, 2006

So that's how they do it - adding a message to mailto

The mailto: scheme in an href attribute is useful, and I know how to add a subject, but today I learnt how to add a message body as well. The example comes from the DOI web site:

This bit of code:

<p>If you believe you have requested a DOI name that should be found, you may report this error to
<a href="mailto:...?subject=DOI Not Found&body=The%20following%20DOI%20was%20not%20found:%0D%0A...">...</a>.
<b>Please include information regarding where you found the DOI in your message:</b></p>

Gives this result:

If you believe you have requested a DOI name that should be found, you may report this error to
Please include information regarding where you found the DOI in your message:

The &body= parameter gives the text of the email message. Neat.
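For anyone generating such links programmatically, here is a small Python sketch that does the percent-encoding (%20 for spaces, %0D%0A for line breaks) rather than writing it by hand. The address is a placeholder, not the one on the DOI site.

```python
import urllib.parse

def mailto_link(address, subject, body):
    """Build a mailto: URL with a percent-encoded subject and body.
    quote (rather than the default quote_plus) encodes spaces as %20,
    which is what mail clients expect in mailto: URLs."""
    query = urllib.parse.urlencode({"subject": subject, "body": body},
                                   quote_via=urllib.parse.quote)
    return "mailto:%s?%s" % (address, query)

# Placeholder address; the body ends with an encoded CR LF (%0D%0A).
link = mailto_link("helpdesk@example.org", "DOI Not Found",
                   "The following DOI was not found:\r\n")
```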