<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-27820238</id><updated>2011-09-21T14:19:40.342+01:00</updated><category term='Simile'/><category term='RDF'/><category term='Bio2RDF'/><category term='sesame'/><category term='demo'/><category term='semantic web'/><title type='text'>SemAnt</title><subtitle type='html'>A record of my efforts to create a triple store for data about ants, and related musings</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://semant.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>43</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-27820238.post-4230612700955314802</id><published>2007-03-22T13:57:00.000Z</published><updated>2007-03-22T14:31:07.333Z</updated><title type='text'>bioguid.info</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://bioguid.info/images/bioGUID48.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://bioguid.info/images/bioGUID48.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;As announced on &lt;a href="http://www.nodalpoint.org/2007/03/21/bioguid_info"&gt;nodalpoint&lt;/a&gt;, I've put together a web site called &lt;a href="http://bioguid.info"&gt;bioguid.info&lt;/a&gt; which, rather grandly, is an attempt to bootstrap the biodiversity Semantic Web by providing resolvable URIs for biological objects, such as publications, taxonomic names, nucleotide sequences, and specimens. These URIs (or "GUIDs") can be resolved by a web browser to display HTML, but under the hood are resolved to RDF (which you can see by viewing the source of the web page you get for a URI).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-4230612700955314802?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/4230612700955314802/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=4230612700955314802' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/4230612700955314802'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/4230612700955314802'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2007/03/bioguidinfo.html' title='bioguid.info'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-841299756040195584</id><published>2007-03-13T10:07:00.000Z</published><updated>2007-03-13T10:27:51.161Z</updated><title type='text'>More 3store3</title><content type='html'>Had a little fun with 3store3 on Fedora Core 4, which is what I run on my servers. 3store3 builds easily on Fedora (in contrast to the &lt;a href="http://semant.blogspot.com/2006/06/3store3.html"&gt;fun had building 3store3 on a Mac&lt;/a&gt;). However, any SPARQL query beyond a simple listing of all triples just bailed out with scary errors. For example:&lt;br /&gt;&lt;pre style="border: 1px solid #c7cfd5;background: #f1f5f9;margin: 20px 0;padding: 15px;text-align: left;font-size: 10px;"&gt;&lt;br /&gt;ts-explain "PREFIX dc:  &lt;br /&gt;&amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; SELECT DISTINCT ?s  ?o WHERE { ?s  &lt;br /&gt;dc:title ?o . } LIMIT 10"&lt;br /&gt;SQL error 2013: Lost connection to MySQL server during query at util.c:55&lt;br /&gt;SQL error 2002: Can't connect to local MySQL server through socket  &lt;br /&gt;'/var/lib/mysql/mysql.sock' (111) at query.c:316&lt;br /&gt;Warning cannot calculate complexity&lt;br /&gt;Complexity: 0&lt;br /&gt;SELECT DISTINCT v0.lexical AS `s`, v1.lexical AS `o`, v1.datatype AS  &lt;br /&gt;o_dt, v1.language AS o_lang&lt;br /&gt;FROM (SELECT DISTINCT t0.object AS `o`, t0.subject AS `s`&lt;br /&gt;FROM triples t0&lt;br /&gt;WHERE t0.predicate=-8024650864867163606&lt;br /&gt;LIMIT 10) AS `tmp0_202a`, symbols v0, symbols v1&lt;br /&gt;WHERE tmp0_202a.s=v0.hash &amp;&amp; tmp0_202a.o=v1.hash&lt;br /&gt;LIMIT 10;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;After much cursing, rebuilding, and trying different versions of 3store3 and the Redland libraries, I gave up and contemplated a crude hack. The SQL query itself works fine in MySQL, but not when 3store3 tries to call MySQL. 
But before writing a crude hack, I mailed the &lt;a href="https://lists.sourceforge.net/lists/listinfo/threestore-devel"&gt;3store3 mailing list&lt;/a&gt; this morning, and by 9:30 the same morning Steve Harris had given me the solution. I was running MySQL version 4.1.11, and according to Steve there's a bug in early versions of MySQL 4. He recommended upgrading, so off to do &lt;font face="courier"&gt;yum install mysql&lt;/font&gt;. Sure enough, I get MySQL 4.1.20, and everything works. Yay!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-841299756040195584?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/841299756040195584/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=841299756040195584' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/841299756040195584'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/841299756040195584'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2007/03/more-3store3.html' title='More 3store3'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-7230813706581071046</id><published>2007-03-06T07:30:00.000Z</published><updated>2007-03-06T07:50:59.359Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='demo'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Simile'/><category scheme='http://www.blogger.com/atom/ns#' term='sesame'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><category scheme='http://www.blogger.com/atom/ns#' term='Bio2RDF'/><title type='text'>Bio2RDF</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bio2rdf.org/Bio2RDF.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px;" src="http://bio2rdf.org/Bio2RDF.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Stumbled across &lt;a href="http://bio2rdf.org/"&gt;Bio2RDF&lt;/a&gt; by François Belleau. Here's the blurb:&lt;br /&gt;&lt;blockquote&gt;It is now possible to query Bio2RDF triple store using Sesame's SeRQL query engine. The actual triple store contains all annotations about human and mouse from UniProt, Affymetrix and GeneID. It also contains all GO term definitions and OMIM disease description. It contains 50 millions triples and the native RDF store in sesame weigths 3 Go. Watch for the speed of it ! Thanks for the Sesame team for their great work.&lt;/blockquote&gt;&lt;br /&gt;There's a &lt;a href="http://bio2rdf.blogspot.com/"&gt;blog&lt;/a&gt;, and a &lt;a href="http://sourceforge.net/projects/bio2rdf/"&gt;SourceForge project&lt;/a&gt;. The Bio2RDF website itself displays RDF as tables. It does have a query interface, but it's a bit hidden. Try &lt;a href="http://bio2rdf.org/sesame-125v2/actionFrameset.jsp?repository=bio2rdf"&gt;here&lt;/a&gt;, based on this &lt;a href="http://bio2rdf.blogspot.com/2006/09/try-querying-bio2rdf-50-millions-triple.html"&gt;blog post&lt;/a&gt;.&lt;br /&gt;It uses Sesame, JSP, and SeRQL -- not my favourite technologies -- but is an example of people thinking about RDF and triple stores, and actually making stuff. 
Must polish up my creaking &lt;a href="http://linnaeus.zoologyh.gla.ac.uk/~rpage/ants/"&gt;ant demo&lt;/a&gt; and play some more with this. &lt;br /&gt;On the subject of ants, Terry Catapano has been playing with &lt;a href="http://simile.mit.edu/"&gt;Simile&lt;/a&gt;, and come up with &lt;a href="http://www.columbia.edu/~thc4/simile/madagascar_ants_exhibit.html"&gt;this demo&lt;/a&gt;.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_Gct8lVAxKqQ/Re0dFMusgYI/AAAAAAAAAAM/2FHZjWmAJ6o/s1600-h/exhibit.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://bp3.blogger.com/_Gct8lVAxKqQ/Re0dFMusgYI/AAAAAAAAAAM/2FHZjWmAJ6o/s320/exhibit.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5038715533251084674" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-7230813706581071046?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/7230813706581071046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=7230813706581071046' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/7230813706581071046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/7230813706581071046'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2007/03/bio2rdf.html' title='Bio2RDF'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_Gct8lVAxKqQ/Re0dFMusgYI/AAAAAAAAAAM/2FHZjWmAJ6o/s72-c/exhibit.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-117220707301436757</id><published>2007-02-23T05:00:00.000Z</published><updated>2007-02-23T05:04:33.026Z</updated><title type='text'>303 and concept URIs - towards GUIDs for all</title><content type='html'>I've &lt;a href="http://del.icio.us/rdmpage/303" target="_new"&gt;bookmarked&lt;/a&gt; some stuff relevant to the issue of what HTTP URIs identify. This concerns whether a URI identifies something in the real world (or a concept), or a document. The "303" refers to the HTTP status code returned if the URI identifies a concept, not a document. The whole topic is a mess, but I need to get to grips with it as I'm working on a service to assign URIs to just about any biologically interesting object. Sigh.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-117220707301436757?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/117220707301436757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=117220707301436757' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/117220707301436757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/117220707301436757'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2007/02/303-and-concept-uris-towards-guids-for.html' title='303 and concept URIs - towards GUIDs for all'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-117147789779545966</id><published>2007-02-14T18:19:00.000Z</published><updated>2007-02-14T18:49:30.146Z</updated><title type='text'>Harvesting handles</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://digitallibrary.amnh.org/dspace/image/blueBox.gif"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://digitallibrary.amnh.org/dspace/image/blueBox.gif" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Finally discovered how to get metadata from DSpace-hosted items, such as the AMNH's &lt;a href="http://digitallibrary.amnh.org/dspace/"&gt;Scientific Publications&lt;/a&gt;. They have an OAI interface, so that the URL &lt;a href="http://digitallibrary.amnh.org/dspace-oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:digitallibrary.amnh.org:2246/1999" target="_new"&gt;http://digitallibrary.amnh.org/dspace-oai/request?&lt;br/&gt;verb=GetRecord&lt;br/&gt;&amp;amp;metadataPrefix=oai_dc&lt;br/&gt;&amp;amp;identifier=oai:digitallibrary.amnh.org:2246/1999&lt;/a&gt; will retrieve XML metadata for the record &lt;a href="http://hdl.handle.net/2246/1999" target="_new"&gt;hdl:2246/1999&lt;/a&gt; -- in this case, number 2578 of the &lt;i&gt;American Museum novitates&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Oh, and the Simile project already has a &lt;a href="http://simile.mit.edu/repository/RDFizers/oai2rdf/transformers/oai_dc/transformer.xslt"&gt;OAI to RDF XSLT stylesheet&lt;/a&gt; (found courtesy of Leigh Dodds &lt;a href="http://del.icio.us/ldodds/OAI"&gt;OAI bookmarks&lt;/a&gt; on del.icio.us).&lt;br /&gt;&lt;br /&gt;Almost forgot, I stumbled across this information at &lt;a 
href="http://celestial.eprints.org/repository?repository=915"&gt;Celestial&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-117147789779545966?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/117147789779545966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=117147789779545966' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/117147789779545966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/117147789779545966'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2007/02/harvesting-handles.html' title='Harvesting handles'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-117016651951904562</id><published>2007-01-30T14:08:00.000Z</published><updated>2007-01-30T14:15:19.533Z</updated><title type='text'>ANeT - an invisible ant resource</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://homepage.mac.com/dorylus/Resources/anetlog.gif"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://homepage.mac.com/dorylus/Resources/anetlog.gif" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://homepage.mac.com/dorylus/index.html"&gt;ANeT&lt;/a&gt; is a network to promote ant research in Asia. 
It has some interesting things, although it's a shocking example of poor design. Almost all the text on the home page is not plain text but text written in a GIF image, which means search engines like Google will have a tough time indexing the page, which in turn means it will be hard to find. &lt;br /&gt;&lt;br /&gt;And if you can't be found by Google, you don't exist...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-117016651951904562?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/117016651951904562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=117016651951904562' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/117016651951904562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/117016651951904562'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2007/01/anet-invisible-ant-resource.html' title='ANeT - an invisible ant resource'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116966491346455003</id><published>2007-01-24T18:40:00.000Z</published><updated>2007-01-24T18:55:13.506Z</updated><title type='text'>Searching Hymenoptera Name Server literature</title><content type='html'>These are some notes on efforts to make the Hymenoptera Name Server literature data base searchable. 
This work builds on &lt;a href="http://semant.blogspot.com/2006/08/lsids-and-dois-for-ant-and-other.html"&gt;LSID stuff&lt;/a&gt; I did earlier, and is also a response to the &lt;a href="http://mailman.nhm.ku.edu/pipermail/taxacom/2007-January/025147.html"&gt;TAXACOM thread&lt;/a&gt; started by Roger Hyam. Donat Agosti has also been requesting something along these lines. &lt;br /&gt;&lt;br /&gt;The first step is to suck all the records off HNS, and convert them to &lt;a href="http://www.refman.com/support/risformat_intro.asp"&gt;RIS&lt;/a&gt; format. I then want to import that into an instance of &lt;a href="http://myphpbib.sourceforge.net/"&gt;MyPHPBib&lt;/a&gt; (an old project of mine languishing on SourceForge), which gives me a MySQL database of the literature to play with. What I'd like is an OpenURL-style search interface that can be used to return records matching a user query. &lt;br /&gt;&lt;br /&gt;Notes to self. Character encoding is a major, major pain. I'm running the script on a Fedora Core 4 box, as Mac OS X drove me nuts; I'm ensuring that the XML style sheet outputs ISO-8859-1 encoding (to match that returned by HNS); and I've set the Terminal character encoding to Western (ISO-8859-1) as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116966491346455003?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116966491346455003/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116966491346455003' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116966491346455003'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116966491346455003'/><link rel='alternate' 
type='text/html' href='http://semant.blogspot.com/2007/01/searching-hymenoptera-name-server.html' title='Searching Hymenoptera Name Server literature'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116500507956606986</id><published>2006-12-01T20:31:00.000Z</published><updated>2006-12-01T20:31:19.726Z</updated><title type='text'>Tales of a Semantic Web Consultancy » Blog Archive » 2 papers worth reviewing from ISWC2006</title><content type='html'>From &lt;a href="http://clarkparsia.com/weblog/"&gt;Tales of a Semantic Web Consultancy&lt;/a&gt; is  &lt;br /&gt;&lt;a href="http://clarkparsia.com/weblog/2006/11/29/2-papers-worth-reviewing-from-iswc2006/"&gt;this post&lt;/a&gt; on 2 papers worth reviewing from ISWC2006. &lt;a href="http://swui.semanticweb.org/swui06/papers/Karger/Pathetic_Fallacy.html"&gt;The Pathetic Fallacy of RDF&lt;/a&gt; is a nice summary of why graphs, appealing as they are, aren't the way to think about displaying RDF. 
Other approaches seem more promising...&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.nopain2.org/200608031437.jpg"/&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116500507956606986?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116500507956606986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116500507956606986' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116500507956606986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116500507956606986'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/12/tales-of-semantic-web-consultancy-blog.html' title='Tales of a Semantic Web Consultancy » Blog Archive » 2 papers worth reviewing from ISWC2006'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116488961050120433</id><published>2006-11-30T12:07:00.000Z</published><updated>2006-11-30T12:26:50.510Z</updated><title type='text'>Damn DiGIR</title><content type='html'>I've been playing with DiGIR providers, retrieving records to be massaged into RDF as part of a project to aggregate specimens, sequences, publications, and phylogenies. 
DiGIR was a major advance on what went before (i.e., basically nothing), but in addition to variations in the schema, and the fact that a good portion of the providers are offline at any one time (see &lt;a href="http://semant.blogspot.com/2006/11/perils-of-federation.html"&gt;Perils of Federation&lt;/a&gt;), I'm coming up against the wide range of ways people have of writing dates. &lt;br /&gt;&lt;br /&gt;Ideally, I'd like dates in the variant of the ISO 8601 YYYY-MM-DD format described by the &lt;a href="http://www.w3.org/TR/NOTE-datetime"&gt;W3C&lt;/a&gt; and recommended by &lt;a href="http://dublincore.org/documents/dcmi-terms/"&gt;Dublin Core&lt;/a&gt;. What we can get in DiGIR records is all manner of formats, such as (with specimen codes following):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;darwin:DateLastModified&amp;gt;11/28/2000 12:28:30 PM&amp;lt;/darwin:DateLastModified&amp;gt; [KU 195138]&lt;br /&gt;&amp;lt;darwin:DateLastModified&amp;gt;2004-02-20 00:00:00&amp;lt;/darwin:DateLastModified&amp;gt; [MVZ 149006]&lt;br /&gt;&amp;lt;darwin:DateLastModified&amp;gt;2006-08-21&amp;lt;/darwin:DateLastModified&amp;gt; [FMNH 145699]&lt;br /&gt;&amp;lt;darwin:VerbatimCollectingDate&amp;gt;12/Jun/1983&amp;lt;/darwin:VerbatimCollectingDate&amp;gt; [KU 195138]&lt;br /&gt;&amp;lt;darwin:VerbatimCollectingDate&amp;gt;29 Jun 1974&amp;lt;/darwin:VerbatimCollectingDate&amp;gt; [MVZ 149006]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now, variation in the &amp;lt;darwin:VerbatimCollectingDate&amp;gt; tag (the last two dates) is expected, but for a computer-generated field such as &amp;lt;darwin:DateLastModified&amp;gt; this is a bit much.&lt;br /&gt;&lt;br /&gt;Since I do most of my harvesting in Perl, I came across &lt;a href="http://www.cise.ufl.edu/~sbeck/DateManip.html"&gt;Date::Manip&lt;/a&gt;, which manages to convert these into a sensible form (for example &lt;font face="Courier"&gt;11/28/2000 12:28:30 PM&lt;/font&gt; becomes &lt;font face="Courier"&gt;2000-11-28T12:28:30&lt;/font&gt;). 
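&lt;br /&gt;&lt;br /&gt;For anyone not harvesting in Perl, the same normalisation can be roughed out in Python. This is only a sketch and not how Date::Manip works: it tries a fixed list of formats taken from the five sample values above, the name &lt;font face="Courier"&gt;normalise_date&lt;/font&gt; is made up for illustration, and it assumes slash-separated numeric dates are US-style month/day/year and that month names are in English.&lt;br /&gt;
&lt;pre&gt;
```python
from datetime import datetime

# Candidate formats, taken only from the sample DiGIR values quoted above.
# The boolean records whether the format carries a time of day.
# Assumption: numeric slash dates are month/day/year (US style).
FORMATS = [
    ("%m/%d/%Y %I:%M:%S %p", True),   # 11/28/2000 12:28:30 PM
    ("%Y-%m-%d %H:%M:%S", True),      # 2004-02-20 00:00:00
    ("%Y-%m-%d", False),              # 2006-08-21
    ("%d/%b/%Y", False),              # 12/Jun/1983
    ("%d %b %Y", False),              # 29 Jun 1974
]


def normalise_date(text):
    """Return an ISO 8601 string for text, or None if no format matches."""
    for fmt, has_time in FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            # This format did not fit; try the next candidate.
            continue
        # Keep the time only when the source value actually had one.
        return parsed.isoformat() if has_time else parsed.date().isoformat()
    return None
```
&lt;/pre&gt;
With this list, &lt;font face="Courier"&gt;normalise_date("29 Jun 1974")&lt;/font&gt; gives &lt;font face="Courier"&gt;1974-06-29&lt;/font&gt;, and anything unrecognised comes back as None so it can be logged rather than silently guessed at.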
&lt;br /&gt;&lt;br /&gt;Integration is not easy...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116488961050120433?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116488961050120433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116488961050120433' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116488961050120433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116488961050120433'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/11/damn-digir.html' title='Damn DiGIR'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116420462249878312</id><published>2006-11-22T13:57:00.000Z</published><updated>2006-11-22T14:10:22.550Z</updated><title type='text'>More GUIDs</title><content type='html'>&lt;a href="http://www.osu.edu/images/osu_logo.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://www.osu.edu/images/osu_logo.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;Idly Googling, I stumbled across &lt;a href="https://kb.osu.edu/dspace/index.jsp"&gt;The Knowledge Bank at OSU&lt;/a&gt;, a &lt;a href="http://www.dspace.org/"&gt;DSpace&lt;/a&gt; server at &lt;a href="http://www.osu.edu"&gt;Ohio State University&lt;/a&gt;. A number of publications on ants are listed there, complete with PDFs. 
For example, "The Mating Activities of the Ant &lt;i&gt;Myrmica Americana&lt;/i&gt; Weber" by Kannowski and Kannowski, published in &lt;i&gt;The Ohio Journal of Science&lt;/i&gt; in 1957. This paper has a GUID in the form of a Handle (&lt;a href="http://hdl.handle.net/1811/4489"&gt;hdl:1811/4489&lt;/a&gt;). What I find interesting is that digitisation efforts by libraries are putting biodiversity literature online as part of a broader effort (i.e., we get these papers, and a GUID, "for free"). This also raises issues about duplication of effort &amp;#8212; clearly, if a library has put a PDF online, we don't want to duplicate this. Hence, we need a simple way of finding whether a paper has already been digitised. Google Scholar may be useful for this, although in this case Google finds the paper but Google Scholar doesn't.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116420462249878312?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116420462249878312/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116420462249878312' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116420462249878312'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116420462249878312'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/11/more-guids.html' title='More GUIDs'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116404720818909001</id><published>2006-11-20T18:23:00.000Z</published><updated>2006-11-20T18:26:48.196Z</updated><title type='text'>Copyright on images</title><content type='html'>&lt;a href="http://people.bu.edu/karitr/Acanthoponera%20peruviana%20L2x.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px;" src="http://people.bu.edu/karitr/Acanthoponera%20peruviana%20L2x.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;This post is a copy of a comment I wrote on The Ant Room post &lt;a href="http://theantroom.blogspot.com/2006/05/synchronizing-and-copyrighting-images.html"&gt;Synchronizing and Copyrighting Images&lt;/a&gt;, which I've repeated here so I don't lose track of it. &lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Two thoughts on copyrighting images. The first is why choose copyright &amp;copy;  as opposed to a &lt;a href="http://creativecommons.org/"&gt;Creative Commons (cc) license&lt;/a&gt;? With a cc license you get to specify what I can and can't do with the image, without me having to ask you. By sticking "&amp;copy;  K. T. Ryder Wilkie 2005" on an image (e.g., your gorgeous picture of &lt;a href="http://people.bu.edu/karitr/Acanthoponera%20peruviana%20L2x.jpg"&gt;&lt;i&gt;Acanthoponera peruviana&lt;/i&gt;&lt;/a&gt;), I then have to contact you to ask your permission. For one or two images, that's OK I guess, but what if I want to use lots of images? What if you are on holiday? &lt;br /&gt;&lt;br /&gt;The second comment is that I can read "&amp;copy;  K. T. Ryder Wilkie 2005" but computers can't (at least, not easily). There are other ways to tag images so that computers can read this information. 
Examples include EXIF tags (as used by Antweb, as mentioned on my &lt;a href="http://ispecies.blogspot.com/2006/01/exif-tags.html"&gt;iSpecies blog&lt;/a&gt;), which get embedded in the image file itself, &lt;a href="http://www.adobe.com/products/xmp/"&gt;XMP&lt;/a&gt; information added by Photoshop, and Flickr tags (for example, this image of  &lt;a href="http://www.flickr.com/photos/ants_in_my_pants/51876311/"&gt;&lt;i&gt;Strumigenys precava&lt;/i&gt;&lt;/a&gt;). My point is that if people are going to make use of your work on a large scale, using Creative Commons licenses and embedding that information electronically in the image in the form of metadata will make your hard work even more useful. &lt;br /&gt;&lt;br /&gt;If sharing information on biodiversity is going to take off, then we need to start thinking about how to share, and how to make our information accessible to computers, not just people.&lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116404720818909001?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116404720818909001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116404720818909001' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116404720818909001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116404720818909001'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/11/copyright-on-images.html' title='Copyright on images'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116327260843571866</id><published>2006-11-11T18:58:00.000Z</published><updated>2006-11-12T12:52:21.520Z</updated><title type='text'>SVG specimen maps from SPARQL results</title><content type='html'>One reason why I've put off adding specimen maps to iSpecies, despite &lt;a href="http://ispecies.blogspot.com/2006/11/identification-service.html"&gt;repeated requests&lt;/a&gt;, is that Google Maps (my preferred mapping tool) is slow if you have lots of specimen records. I've played with some other tools, notably &lt;a href="http://www.mapbureau.com/"&gt;Map Bureau's&lt;/a&gt; Flash-based pointMapper, but what I'd really like is a quick and simple way to display a bunch of specimen records. Because the same issue comes up with SemAnt, I thought it's time to do something about it. I stress that I really like Google Maps, but for some purposes it's overkill. Furthermore, loading hundreds of points will take too long.&lt;br /&gt;&lt;br /&gt;So, the idea is to take georeferenced specimen records and put them on a map of the world. Slowly it dawned on me that this was trivially easy. Firstly, take a map of the world drawn using the  &lt;a href="http://en.wikipedia.org/wiki/Equirectangular_projection"&gt;equirectangular&lt;/a&gt; (or plate carrée) projection (Wikipedia provided the &lt;a href="http://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Equirectangular-projection.jpg/360px-Equirectangular-projection.jpg"&gt;example below&lt;/a&gt;). 
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Equirectangular-projection.jpg/360px-Equirectangular-projection.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px;" src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Equirectangular-projection.jpg/360px-Equirectangular-projection.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This projection gives a simple connection between geographic location and pixel position. For example, if the map is scaled to 180 pixels high and 360 pixels wide, then you have a 1 pixel/degree grid. Hence, plotting localities is no harder than plotting an X-Y scatter plot.&lt;br /&gt;&lt;br /&gt;Now, all I need to do is take a SPARQL result with latitudes and longitudes and draw the localities on this map. One way to do this is to draw the points using SVG, so I can use an XSL transformation to generate the map. If I wanted to support zooming then ideally I'd have the map itself in SVG, but I just want a small world map, so I "cheat" and use a bitmap as the base map. This can be included like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;image x="0" y="0" width="360" height="180" &lt;br /&gt;xlink:href="http://...360px-Equirectangular-projection.jpg" /&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The trick is to convert latitude and longitude to coordinates on the bitmap. For example, specimen &lt;a href="http://www.antweb.org/specimen.do?name=casent0008682-d03"&gt;casent0008682-d03&lt;/a&gt; of &lt;i&gt;Melissotarsus emeryi&lt;/i&gt; was collected from 31&amp;deg;58'0'' S, 18&amp;deg;51'0'' E, which in decimal values is latitude -31.966667, longitude 18.85. Now, how do I convert these values into a location on a 360 &amp;times; 180 image? 
In SVG the coordinates grow from the upper left, whereas on the map shown above 0,0 is in the centre, such that southern latitudes are negative, as are western longitudes. We can use a transform to move the origin of the x- and y-axes 180 pixels to the right and 90 pixels down, so that the origin of the graph is the intersection of the equator and the Greenwich Meridian. We also have to invert the y-axis because in SVG it goes from top to bottom. This diagram shows the difference between SVG and geographical coordinates:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/map.0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/map.0.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This transformation is achieved by this statement:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;g transform="translate(180,90) scale(1,-1)" &amp;gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&amp;lt;/g&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This idea came from hack #55 in Michael Fitzgerald's book &lt;a href="http://www.oreilly.com/catalog/xmlhks/"&gt;XML Hacks&lt;/a&gt;. 
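&lt;br /&gt;&lt;br /&gt;As a quick sanity check on the arithmetic, here is a minimal Python sketch of the same mapping worked out by hand. It is not part of the XSLT: the function names dms_to_decimal and to_pixel are made up for illustration, and it assumes the 360 &amp;times; 180 pixel map described above.&lt;br /&gt;
&lt;pre&gt;
```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


def to_pixel(lat, lon, width=360, height=180):
    """Map decimal latitude/longitude to SVG pixel coordinates.

    Assumes an equirectangular base map that fills the whole image,
    with the SVG origin at the upper left and y growing downwards.
    """
    x = (lon + 180.0) * width / 360.0   # shift the origin 180 degrees to the right
    y = (90.0 - lat) * height / 180.0   # shift 90 degrees down and flip the y-axis
    return x, y


# The Melissotarsus emeryi specimen from the text: 31 deg 58' 0'' S, 18 deg 51' 0'' E
lat = dms_to_decimal(31, 58, 0, "S")
lon = dms_to_decimal(18, 51, 0, "E")
x, y = to_pixel(lat, lon)
print(round(lat, 6), round(lon, 6))
print(round(x, 2), round(y, 2))
```
&lt;/pre&gt;
For this specimen the point lands at roughly x = 198.85, y = 121.97, that is, east of Greenwich and in the lower (southern) half of the image, which is what translate(180,90) scale(1,-1) does to the point (18.85, -31.966667).&lt;br /&gt;&lt;br /&gt;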
Here is the XSLT I use.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;?xml version='1.0'?&amp;gt;&lt;br /&gt;&amp;lt;xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&lt;br /&gt;    xmlns:res="http://www.w3.org/2005/sparql-results#" xmlns="http://www.w3.org/2000/svg"&lt;br /&gt;    xmlns:xlink="http://www.w3.org/1999/xlink" exclude-result-prefixes="res xsl"&amp;gt;&lt;br /&gt;    &amp;lt;xsl:output method="xml" version="1.0" indent="yes"/&amp;gt;&lt;br /&gt;    &amp;lt;xsl:template match="/"&amp;gt;&lt;br /&gt;        &amp;lt;svg&amp;gt;&lt;br /&gt;            &amp;lt;xsl:attribute name="width"&amp;gt;360px&amp;lt;/xsl:attribute&amp;gt;&lt;br /&gt;            &amp;lt;xsl:attribute name="height"&amp;gt;180px&amp;lt;/xsl:attribute&amp;gt;&lt;br /&gt;            &amp;lt;rect id="dot" x="0" y="0" width="4" height="4" style="stroke:none; stroke-width:1; fill:solid"/&amp;gt;&lt;br /&gt;            &amp;lt;image x="0" y="0" width="360" height="180" xlink:href="http://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Equirectangular-projection.jpg/360px-Equirectangular-projection.jpg"/&amp;gt;&lt;br /&gt;            &amp;lt;g transform="translate(180,90) scale(1,-1)"&amp;gt;&lt;br /&gt;                &amp;lt;xsl:apply-templates select="//res:result"/&amp;gt;&lt;br /&gt;            &amp;lt;/g&amp;gt;&lt;br /&gt;        &amp;lt;/svg&amp;gt;&lt;br /&gt;    &amp;lt;/xsl:template&amp;gt;&lt;br /&gt;    &amp;lt;xsl:template match="//res:result"&amp;gt;&lt;br /&gt;        &amp;lt;use xlink:href="#dot"&amp;gt;&lt;br /&gt;            &amp;lt;xsl:attribute name="transform"&amp;gt;&lt;br /&gt;                &amp;lt;xsl:text&amp;gt;translate(&amp;lt;/xsl:text&amp;gt;&lt;br /&gt;                &amp;lt;xsl:value-of select="res:binding[@name='long']/res:literal"/&amp;gt;&lt;br /&gt;                &amp;lt;xsl:text&amp;gt;,&amp;lt;/xsl:text&amp;gt;&lt;br /&gt;                &amp;lt;xsl:value-of select="res:binding[@name='lat']/res:literal"/&amp;gt;&lt;br /&gt;  
              &amp;lt;xsl:text&amp;gt;)&amp;lt;/xsl:text&amp;gt;&lt;br /&gt;            &amp;lt;/xsl:attribute&amp;gt;&lt;br /&gt;        &amp;lt;/use&amp;gt;&lt;br /&gt;    &amp;lt;/xsl:template&amp;gt;&lt;br /&gt;&amp;lt;/xsl:stylesheet&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This transforms a SPARQL result that looks something like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;result&amp;gt;&lt;br /&gt;   &amp;lt;binding name="lat"&amp;gt;&amp;lt;literal&amp;gt;10.266666&amp;lt;/literal&amp;gt;&amp;lt;/binding&amp;gt;&lt;br /&gt;   &amp;lt;binding name="long"&amp;gt;&amp;lt;literal&amp;gt;-84.083336&amp;lt;/literal&amp;gt;&amp;lt;/binding&amp;gt;&lt;br /&gt;&amp;lt;/result&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;One thing which drove me nuts for a while was that the SVG rendered fine in Safari using Adobe's plugin, but not in Camino, which uses the same rendering engine as Mozilla. Turns out Camino needs http://www.w3.org/2000/svg to be the default namespace, so xmlns="http://www.w3.org/2000/svg" is fine, but it barfs over xmlns&lt;b&gt;:svg&lt;/b&gt;="http://www.w3.org/2000/svg". Sigh.&lt;br /&gt;&lt;br /&gt;Here is an example SVG file rendered using the XSLT style sheet, but using a different background map, showing the distribution of the ant &lt;i&gt;Azteca constructor&lt;/i&gt;. 
The source SVG is &lt;a href="http://linnaeus.zoology.gla.ac.uk/~rpage/ants/mapsvg/map2.svg"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt; &lt;object type="image/svg+xml" data="http://linnaeus.zoology.gla.ac.uk/~rpage/ants/mapsvg/map2.svg" width="360" height="180" &gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;A nice, simple map, with minimal effort.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116327260843571866?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116327260843571866/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116327260843571866' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116327260843571866'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116327260843571866'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/11/svg-specimen-maps-from-sparql-results.html' title='SVG specimen maps from SPARQL results'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116298749515139816</id><published>2006-11-08T11:50:00.000Z</published><updated>2006-11-08T12:04:55.250Z</updated><title type='text'>Perils of federation</title><content type='html'>Dave Vieglais gave what looks like an interesting presentation at &lt;a href="http://tdwg2006.tdwg.org/programme/presentations/"&gt;TDWG 2006&lt;/a&gt;. 
&lt;a href="http://bigdig.ecoforge.net/wiki"&gt;BigDig&lt;/a&gt; monitors the status of &lt;a href="http://digir.sf.net/"&gt;DiGIR&lt;/a&gt; providers that serve museum specimen records. I've not managed to get the map background to appear, but here's a snapshot of the geographical distribution of providers, and their status:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/snapshot1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/snapshot1.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The happy faces are DiGIR providers that are live, the sad faces are not responding. What is interesting that a fair chunk are offline, of the 180 registered providers, 25 have &lt;strong&gt;never&lt;/strong&gt; responded, and there are something like 17 variations of the DiGIR schema out there.&lt;br /&gt;&lt;br /&gt;It's a little scary that so many providers are offline, and that they differ in the format of the messages they accept and return. For federated searches that are "live," this spells disaster. 
His presentation is &lt;a href="http://tdwg2006.tdwg.org/fileadmin/2006meeting/slides/vieglais_TheBigDig.odp"&gt;here&lt;/a&gt; (rather unhelpfully in Open Office format, so I've made a &lt;a href="http://linnaeus.zoology.gla.ac.uk/~rpage/talks/vieglais_TheBigDig.pdf"&gt;PDF&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116298749515139816?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116298749515139816/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116298749515139816' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116298749515139816'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116298749515139816'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/11/perils-of-federation.html' title='Perils of federation'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116162534018236491</id><published>2006-10-23T18:30:00.000+01:00</published><updated>2006-10-24T09:02:24.186+01:00</updated><title type='text'>Automatically growing an ant bibliography</title><content type='html'>Earlier on iPhylo I'd mentioned the issue of &lt;a href="http://iphylo.blogspot.com/2006/05/updating-ants.html"&gt;updating&lt;/a&gt; a triple store of ants, or indeed, any data base. 
As an experiment, I've put together a Perl script that can be used to update a data base in &lt;a href="http://www.connotea.org"&gt;Connotea&lt;/a&gt; with recent papers on ants. The script makes use of a number of web services and &lt;a href="http://www.ubio.org/index.php?pagename=ubioRSS"&gt;uBio's RSS feeds&lt;/a&gt;. It does the following:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Takes an RSS feed&lt;/strong&gt; for &lt;a href="http://names.ubio.org/rss/rss_feed.php?username=rdmpage&amp;rss1=1" target="_blank"&gt;Formicidae&lt;/a&gt; from uBio. This feed lists recent papers on ants, as identified using uBio's taxonomic name recognition algorithms.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Extracts DOIs or PubMed identifiers&lt;/strong&gt; from the RSS feed. If a DOI isn't found, I see if we can extract one from the &amp;lt;link&amp;gt; tag (typically a URL to the article). uBio does a pretty good job of getting DOIs, but misses some (e.g., for Blackwell and BioOne journals).&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Extracts taxonomic names&lt;/strong&gt; from the content of the &amp;lt;title&amp;gt; and &amp;lt;description&amp;gt; tags using a SOAP call to uBio's &lt;a href="http://names.mbl.edu/tools/recognize.php"&gt;FindIT&lt;/a&gt; web service. 
Ideally, uBio would do this for us, since it has already parsed the journal feed, but for now I do it.&lt;/li&gt; &lt;br /&gt;&lt;li&gt;&lt;strong&gt;Uses Yahoo's &lt;a href="http://developer.yahoo.com/search/content/V1/termExtraction.html"&gt;term extraction&lt;/a&gt; web service&lt;/strong&gt; to extract keywords.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Submits the article GUID&lt;/strong&gt; (DOI or PubMed id) and the tags to Connotea using the &lt;a href="http://www.connotea.org/wiki/WebAPI"&gt;web API&lt;/a&gt;.&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;Here's a sketch of the process.&lt;br /&gt;&lt;a href="http://photos1.blogger.com/blogger/4123/605/1600/RSS.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/RSS.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;The papers are stored in my &lt;a href="http://www.connotea.org/user/semant" target="_blank"&gt;semant&lt;/a&gt; library. Because it is entirely automated, it could be run regularly (as a cron job, say) to update the library, hence the list of ant papers would grow without any human intervention. At the same time, however, users with access to the semant library could manually edit the tags if they feel Yahoo and uBio have missed some relevant terms.&lt;br /&gt;&lt;br /&gt;Note also that names recognised by uBio are tagged with LSIDs for the names as well, which means we could resolve those to RDF. In the same way, the Connotea data base itself can serve RDF (here are the &lt;a href="http://www.connotea.org/rss/user/semant" target="_blank"&gt;ant papers in RDF&lt;/a&gt;). Hence, we could easily populate a triple store with metadata about papers and names.&lt;br /&gt;&lt;br /&gt;What I like about this script is that it brings together a number of themes.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;GUIDs&lt;/strong&gt; play a key role here. 
Connotea knows which papers uBio has extracted by using the DOI (or PubMed identifier). Not only does this enable Connotea to know which paper I want, but it uses that identifier to extract metadata about the paper, for example via CrossRef. It also knows whether any other user has already added that paper.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Web services&lt;/strong&gt; mean that I don't have to reinvent the wheel. If I want to pick out taxonomic names, I use uBio. To extract keywords for tagging, I use Yahoo. To store data, I use Connotea's API.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Tagging&lt;/strong&gt; makes it easy to add information to a reference.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Social networking&lt;/strong&gt; through using an open database like Connotea. People can discover other people's libraries through shared papers or shared tags.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;RSS&lt;/strong&gt; pops up at the start and at the end. The whole process starts with an RSS feed (itself an aggregation of numerous journal RSS feeds), and the resulting Connotea data base serves RSS, so others can readily make use of the results.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116162534018236491?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116162534018236491/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116162534018236491' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116162534018236491'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116162534018236491'/><link rel='alternate' type='text/html' 
href='http://semant.blogspot.com/2006/10/automatically-growing-ant-bibliography.html' title='Automatically growing an ant bibliography'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-116111057387822650</id><published>2006-10-17T19:42:00.000+01:00</published><updated>2006-10-17T19:42:54.236+01:00</updated><title type='text'>
   Geonames ontology in OWL - GSWB</title><content type='html'>&lt;p&gt;Via Danny Ayers' blog, I found this discussion of the &lt;a href="http://www.geospatialsemanticweb.com/2006/10/14/geonames-ontology-in-owl"&gt;&lt;br /&gt;   Geonames ontology in OWL - GSWB&lt;/a&gt;, which is described in full at &lt;a href=""&gt;Geonames&lt;/a&gt;. What is nice about this is not only that geographic localities get URIs that resolve to RDF (e.g., &lt;a href="http://ws.geonames.org/rdf?geonameId=3020251"&gt;http://ws.geonames.org/rdf?geonameId=3020251&lt;/a&gt;), but also that there is an ontology specifying the relationships amongst geographical features. Perhaps another good opportunity for biodiversity informatics to reuse, rather than reinvent.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;(Via &lt;a href="http://dannyayers.com/2006/10/17/recycled-links"&gt;Danny Ayers&lt;/a&gt;.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-116111057387822650?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/116111057387822650/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=116111057387822650' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116111057387822650'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/116111057387822650'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/10/geonames-ontology-in-owl-gswb.html' title='&#xA;   Geonames ontology in OWL - GSWB'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115990191955902468</id><published>2006-10-03T19:54:00.000+01:00</published><updated>2006-10-03T19:58:39.570+01:00</updated><title type='text'>Formicidae on Flickr</title><content type='html'>&lt;a href="http://photos1.blogger.com/blogger/4123/605/1600/Formicidae.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/Formicidae.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.simon.rycroft.name/"&gt;Simon Rycroft&lt;/a&gt; wrote a nice script to pull all the ant pictures off Flickr. Makes for an interesting image.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115990191955902468?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115990191955902468/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115990191955902468' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115990191955902468'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115990191955902468'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/10/formicidae-on-flickr.html' title='Formicidae on Flickr'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115979976722276646</id><published>2006-10-02T15:35:00.000+01:00</published><updated>2006-10-02T15:36:07.230+01:00</updated><title type='text'>Ants on Flickr</title><content type='html'>&lt;img src="http://static.flickr.com/24/51876254_f9507479fe_m.jpg" align="right"/&gt;&lt;br /&gt;More an insight into how I waste my life than anything else, but saw &lt;a href="http://theantroom.blogspot.com/2006/09/diy-ant-sudoku.html"&gt;this post&lt;/a&gt; and followed the link to &lt;a href="http://www.flickr.com/photos/ants_in_my_pants/sets/1125490/"&gt;JochenB's set of ant photos&lt;/a&gt; on Flickr, which includes some stunning pictures of ants, such as this one of &lt;i&gt;Eciton burchelii&lt;/i&gt; (see &lt;a href="http://www.flickr.com/photos/ants_in_my_pants/51876254/"&gt;original here&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Kewl. Now, we want to get this into a triple store, so I need to play with the Flickr API to convert it to RDF, perhaps using &lt;a href="http://www.kanzaki.com/works/2005/imgdsc/flickr2rdf"&gt;Flickr photo into RDF&lt;/a&gt; as a starting point. 
Note that we get a wealth of tags, plus information on licensing from the Flickr record.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115979976722276646?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115979976722276646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115979976722276646' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115979976722276646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115979976722276646'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/10/ants-on-flickr.html' title='Ants on Flickr'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115955038177051853</id><published>2006-09-29T18:19:00.000+01:00</published><updated>2006-09-29T18:19:42.716+01:00</updated><title type='text'>Why can't we spell!?</title><content type='html'>&lt;img src="http://www.evergreen.edu/ants/genera/apterostigma/species/goniodes/INBIOCRI001284082_l.jpg" align="right"/&gt;&lt;br /&gt;I know I'm not a great speller, but it gets frustrating when you discover how many potentially useful links to information are broken due to typos. 
For example, I stumbled across &lt;a href="http://www.phorid.net/type%20database/details.asp?key=LACM%20ENT%20164470"&gt;this page&lt;/a&gt;, which states that &lt;strong&gt;LACM ENT 164470&lt;/strong&gt; is the type specimen of &lt;i&gt;Apterostigma gonides&lt;/i&gt;. Hmmm ... problem is, there is no such species. What they meant was &lt;i&gt;Apterostigma goni&lt;u&gt;o&lt;/u&gt;des&lt;/i&gt; (note the missing "o"). The fact that the only &lt;a href="http://www.google.com/search?q=Apterostigma%20gonides&amp;ie=utf-8&amp;oe=utf-8"&gt;Google hit&lt;/a&gt; for "Apterostigma gonides" is the LACM page itself is a clue that something's up. &lt;br /&gt;&lt;br /&gt;So, a potentially useful resource listing types housed at the &lt;a href="http://www.lam.mus.ca.us/"&gt;Natural History Museum of Los Angeles County&lt;/a&gt; loses value through a typo. Argh!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115955038177051853?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115955038177051853/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115955038177051853' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115955038177051853'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115955038177051853'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/09/why-cant-we-spell.html' title='Why can&apos;t we spell!?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115946088927927735</id><published>2006-09-28T16:50:00.000+01:00</published><updated>2006-09-28T17:28:09.710+01:00</updated><title type='text'>Organizing the Ant Internet - from The Ant Room</title><content type='html'>From &lt;a href="http://theantroom.blogspot.com/index.html"&gt;The Ant Room&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;a href="http://theantroom.blogspot.com/2006/08/organizing-ant-internet.html"&gt;Organizing the Ant Internet&lt;/a&gt;&lt;br /&gt;This is what I am thinking about today --  I have this organization thing.  I have a very strong need to have things in my life organized.  A big pile of papers and junk drives me crazy.  I just want to go through them all and put them into categories and file them away or throw away the trash and make everything look nice and neat.  I get this feeling sometimes when I am browsing through the internet and looking at ant sites.   I want to take them all off the web, look at them, clean them up a bit, throw away the junk, and put them all together in one well-organized drawer.   There are so many ant sites nowadays and each and everyone seems to want to have everything you could ever want from an ant site, but none of them do.  And I just think, if they could all get together, you really would have the best ant site ever.  I'm not really sure why they don't.  Even just a little bit of information sharing would be helpful.  For instance, you've got &lt;a href="http://www.antweb.org/index.jsp"&gt;AntWeb&lt;/a&gt;, which is a fabulous website if you are hoping to look up ants from Madagascar, but not if you are hoping to look up ants from Costa Rica.  Why is that?  The ants of Costa Rica have a &lt;a href="http://www.antweb.org/press.jsp"&gt;fabulous webpage&lt;/a&gt;.  
It doesn't seem like it would be that difficult to import all of those costa rican ants onto AntWeb.   &lt;a href="http://pick5.pick.uga.edu/mp/20q?guide=Ants_Central_America"&gt;DiscoverLife &lt;/a&gt;has done it.  They don't have any checklists from &lt;a href="http://www.ento.csiro.au/science/ants/default.htm"&gt;Australia &lt;/a&gt;or &lt;a href="http://ant.edb.miyakyo-u.ac.jp/E/index.html"&gt;Japan&lt;/a&gt;, though, which are also two groups of ant fauna with great webpages.   Shouldn't we be trying to incorporate all of this information together?  Even just a link to the other websites would be nice.  It took me forever to figure out where the good websites were.  I don't even trust the lists that are on DiscoverLife now -- I have a list of Tiputini ants on Discoverlife -- it is terribly out of date and I can't figure out how to update the list so I've just let it go.  &lt;a href="http://tolweb.org/Formicidae"&gt;Tree of Life&lt;/a&gt; is another webpage that is basically useless to me.  There are these beautiful photos  but when you get down to the species level, you get a statement like "127 described species" but no actual list of species.  Pseudomyrmex, for instance, has no species list on the Tree of Life website.  Why not?  A list certainly exists.  And there are even &lt;a href="http://entomology.ucdavis.edu/faculty/ward/pseudo.html"&gt;labs&lt;/a&gt; that have been looking at this genus for years.    Cephalotes, Procryptocerus, the Attini, Megalomyrmex, Pheidole, and Dolichoderus -- all genera that have no list of species on their tree of life webpage.  Why hasn't someone added more ant information to this website? Or other websites?  It is very frustrating to me.  I wish someone would put me in charge of making one fantastic ant webpage that would incorporate everything.  I know it would drive me insane but it would be very satisfying work. 
[&lt;a href="http://theantroom.blogspot.com/index.html"&gt;The Ant Room&lt;/a&gt;]&lt;/blockquote&gt;&lt;br /&gt;I guess this is one of the motivations behind SemAnt -- exploring how to integrate diverse resources on ants into a single framework.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115946088927927735?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115946088927927735/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115946088927927735' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115946088927927735'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115946088927927735'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/09/organizing-ant-internet-from-ant-room.html' title='Organizing the Ant Internet - from The Ant Room'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115918688990161219</id><published>2006-09-25T13:21:00.000+01:00</published><updated>2006-09-25T13:21:29.936+01:00</updated><title type='text'>More on Wikis</title><content type='html'>&lt;a href="http://dannyayers.com/2006/09/04/who-writes-wikipedia"&gt;Who writes Wikipedia?&lt;/a&gt;&lt;br /&gt;&lt;p&gt;That's the title of a longish, well-argued, very readable &lt;a href="http://www.aaronsw.com/weblog/whowriteswikipedia"&gt;piece&lt;/a&gt; by Aaron Swartz, the 
highlight for me being a conclusion he arrived at by mining the data:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;When you put it all together, the story become clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site -- the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But &lt;strong&gt;it's the outsiders who provide nearly all of the content&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;(My emphasis). This contrasts with the results of some shallower research done by Jimbo Wales, and this kind of thing is why I for one would like to see Aaron on the Wikipedia board (although I couldn't &lt;a href="http://en.wikipedia.org/wiki/User:AaronSw/Election"&gt;vote&lt;/a&gt; because I've done &amp;lt;400 edits).&amp;#160;&lt;/p&gt; [&lt;a href="http://dannyayers.com/"&gt;Raw&lt;/a&gt;]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115918688990161219?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115918688990161219/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115918688990161219' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115918688990161219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115918688990161219'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/09/more-on-wikis.html' title='More on Wikis'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115875892501067758</id><published>2006-09-20T14:12:00.000+01:00</published><updated>2006-09-20T14:28:45.023+01:00</updated><title type='text'>Adding triples using EditGrid</title><content type='html'>&lt;a href="http://www.evergreen.edu/ants/genera/acromyrmex/species/coronatus/INBIOCRI001284215_face.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 200px;" src="http://www.evergreen.edu/ants/genera/acromyrmex/species/coronatus/INBIOCRI001284215_face.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://darwin.zoology.gla.ac.uk/~vsmith/"&gt;Vince Smith&lt;/a&gt; has constantly been telling me that for many biologists, "database" means an Excel spreadsheet, and that a big problem is simply getting data into a form that can be used online. Bearing that in mind, and also mindful of how much data is kicking around that isn't in "real" databases, I've been playing with &lt;a href="http://www.editgrid.com/home"&gt;EditGrid&lt;/a&gt; as a tool for adding triples to a triple store. I've &lt;a href="http://iphylo.blogspot.com/2006/08/collaborative-data-matrices-using_29.html"&gt;commented on EditGrid&lt;/a&gt; elsewhere in the context of collaborative data matrices.&lt;br /&gt;&lt;br /&gt;So, here's the situation. In my triple store I have information on ant specimen &lt;a href="http://www.antweb.org/specimen.do?name=inbiocri001284215"&gt;INBIOCRI001284215&lt;/a&gt;, obtained from AntWeb. Now, AntWeb has no pictures of this specimen. 
However, John Longino's pages on &lt;a href="http://www.evergreen.edu/ants/genera/acromyrmex/species/coronatus/coronatus.html"  target="_new"&gt;&lt;i&gt;Acromyrmex coronatus&lt;/i&gt;&lt;/a&gt; include pictures of this specimen. How do I get that information into my triple store, without writing RDF?&lt;br /&gt;&lt;br /&gt;One approach is to create a spreadsheet with three columns (subject, predicate, object), and create the triples, one per row. Now, I could just do this on my computer using, say, Excel, but that's not nearly cool enough, so I'll use EditGrid. But seriously, I'm going to use EditGrid because:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;You can see it, whereas you can't see a file on my computer&lt;/li&gt;&lt;li&gt;You and I could collaborate on editing the data in EditGrid&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;a href="http://photos1.blogger.com/blogger/4123/605/1600/semant_editgrid.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/semant_editgrid.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;The spreadsheet contains triples, such as these:&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;subject&lt;/td&gt;&lt;td&gt;predicate&lt;/td&gt;&lt;td&gt;object&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;&lt;a href="http://www.evergreen.edu/ants/genera/acromyrmex/species/coronatus/INBIOCRI001284215_face_orig.jpg"&gt;http://www.evergreen.edu/ ... /INBIOCRI001284215_face_orig.jpg&lt;/a&gt;&lt;/td&gt;&lt;td&gt;foaf:depicts&lt;/td&gt;&lt;td&gt;&lt;a href="http://www.antweb.org/specimen.do?name=inbiocri001284215"&gt;http://www.antweb.org/ ... inbiocri001284215&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;In this case the subject and the object are represented by URIs (here they are URLs, but they could also be LSIDs or DOIs). 
You can see the complete spreadsheet &lt;a href="http://www.editgrid.com/user/rdmpage/SemAnt"  target="_new"&gt;here&lt;/a&gt;. The triples link the picture to the specimen, tell us that http://www.evergreen.edu/ants/genera/acromyrmex/species/coronatus/INBIOCRI001284215_face_orig.jpg is a picture (dc:type image), that the picture has a thumbnail, and that it is of &lt;i&gt;Acromyrmex coronatus&lt;/i&gt;. Armed with these triples, I can now find a picture of this ant in my triple store.&lt;br /&gt;&lt;br /&gt;Fine so far, but how do we get this into the triple store, I hear you ask? EditGrid's permalink feature can export the spreadsheet in a range of formats, including XML. So, what I do is grab the &lt;a href="http://www.editgrid.com/user/rdmpage/SemAnt.xml" target="_new"&gt;XML&lt;/a&gt;, apply an XSL style sheet to convert it to RDF, then import the resulting RDF into the triple store. The key thing is that once the data is in the spreadsheet, the rest is trivial. Here's the XSL style sheet. It has limitations, notably the assumption that URIs are URLs.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;?xml version='1.0' encoding='iso-8859-1'?&amp;gt;&lt;br /&gt;&amp;lt;xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&lt;br /&gt;    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" &lt;br /&gt;    xmlns:foaf="http://xmlns.com/foaf/0.1/" &lt;br /&gt;    xmlns:dc="http://purl.org/dc/elements/1.1/"&amp;gt;&lt;br /&gt;    &amp;lt;xsl:output method="xml" version="1.0" encoding="iso-8859-1" indent="yes"/&amp;gt;&lt;br /&gt;    &amp;lt;xsl:template match="workbook"&amp;gt;&lt;br /&gt;        &amp;lt;rdf:RDF&amp;gt;&lt;br /&gt;            &amp;lt;xsl:apply-templates select="//row"/&amp;gt;&lt;br /&gt;        &amp;lt;/rdf:RDF&amp;gt;&lt;br /&gt;    &amp;lt;/xsl:template&amp;gt;&lt;br /&gt;    &amp;lt;xsl:template match="row"&amp;gt;&lt;br /&gt;        &amp;lt;xsl:if test="@row != '0'"&amp;gt;&lt;br /&gt;            &amp;lt;xsl:element 
name="rdf:Description"&amp;gt;&lt;br /&gt;                &amp;lt;xsl:attribute name="rdf:about"&amp;gt;&lt;br /&gt;                    &amp;lt;xsl:value-of select="cell[1]/@input"/&amp;gt;&lt;br /&gt;                &amp;lt;/xsl:attribute&amp;gt;&lt;br /&gt;                &amp;lt;xsl:variable name="predicate" select="cell[2]/@input"/&amp;gt;&lt;br /&gt;                &amp;lt;xsl:variable name="object" select="cell[3]/@input"/&amp;gt;&lt;br /&gt;                &amp;lt;xsl:choose&amp;gt;&lt;br /&gt;                    &amp;lt;xsl:when test="starts-with($object, 'http://')"&amp;gt;&lt;br /&gt;                        &amp;lt;xsl:element name="{$predicate}"&amp;gt;&lt;br /&gt;                            &amp;lt;xsl:attribute name="rdf:resource"&amp;gt;&lt;br /&gt;                                &amp;lt;xsl:value-of select="$object"/&amp;gt;&lt;br /&gt;                            &amp;lt;/xsl:attribute&amp;gt;&lt;br /&gt;                        &amp;lt;/xsl:element&amp;gt;&lt;br /&gt;                    &amp;lt;/xsl:when&amp;gt;&lt;br /&gt;                    &amp;lt;xsl:otherwise&amp;gt;&lt;br /&gt;                        &amp;lt;xsl:element name="{$predicate}"&amp;gt;&lt;br /&gt;                            &amp;lt;xsl:value-of select="$object"/&amp;gt;&lt;br /&gt;                        &amp;lt;/xsl:element&amp;gt;&lt;br /&gt;                    &amp;lt;/xsl:otherwise&amp;gt;&lt;br /&gt;                &amp;lt;/xsl:choose&amp;gt;&lt;br /&gt;            &amp;lt;/xsl:element&amp;gt;&lt;br /&gt;        &amp;lt;/xsl:if&amp;gt;&lt;br /&gt;    &amp;lt;/xsl:template&amp;gt;&lt;br /&gt;&amp;lt;/xsl:stylesheet&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This particular spreadsheet makes some assumptions about the user, namely that they can figure out what is the subject and what is the object, and are comfortable choosing predicates. However, because the spreadsheet is collaborative, others could help out by editing it. 
Furthermore, one could create spreadsheets that aren't quite so complicated, and aren't geared towards the developer. For example, one basic kind of information I'd like to capture is geographic location, and there is probably a lot more information available in papers than in georeferenced museum collections. Hence, a spreadsheet like this&lt;br /&gt;&lt;table border="1"&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;observation&lt;/td&gt;&lt;td&gt;lat&lt;/td&gt;&lt;td&gt;long&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;locality&lt;/td&gt;&lt;td&gt;-34.0&lt;/td&gt;&lt;td&gt;156.26&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;could be used to capture locality information, and would require minimal effort to convert into RDF. We'd just have to modify the XSL style sheet shown above.&lt;br /&gt;&lt;br /&gt;The key point of all of this is that with minimal effort we can capture information that is not in the triple store, and we can make it easy(ish) for people with data to contribute. 
Given that EditGrid can import Excel files, somebody interested in sharing their data could do the grunt work in Excel on their own computer, then move everything to EditGrid, which makes it accessible to others.&lt;br /&gt;&lt;br /&gt;Simple and open wins...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115875892501067758?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115875892501067758/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115875892501067758' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115875892501067758'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115875892501067758'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/09/adding-triples-using-editgrid.html' title='Adding triples using EditGrid'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115764805990617088</id><published>2006-09-07T17:49:00.000+01:00</published><updated>2006-09-07T17:55:38.190+01:00</updated><title type='text'>Connotea tags</title><content type='html'>The following SPARQL query returns the "tags" for a Connotea reference using the DOI as the search term (in this case &lt;a href="http://dx.doi.org/10.1007/bf02224026"&gt;doi:10.1007/bf02224026&lt;/a&gt;):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;PREFIX dc: 
&amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;&lt;br /&gt;SELECT ?subject&lt;br /&gt;WHERE {&lt;br /&gt;  ?doi ?bnode 'doi:10.1007/bf02224026'&lt;br /&gt;. ?connoteaURI ?identifier ?doi &lt;br /&gt;. ?item ?connotea ?connoteaURI&lt;br /&gt;. ?item dc:subject ?subject&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The graph being queried is this &lt;a href="http://www.connotea.org/rss/uri/1f458e3a179449c1338bf62430892847"&gt;RSS file&lt;/a&gt;, which I've put in a triple store.&lt;br /&gt;This query is simply following the path in the RDF from the DOI &amp;lt;connotea:idValue&amp;gt;10.1007/bf02224026&amp;lt;/connotea:idValue&amp;gt;.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Note:&lt;/strong&gt; One potential "gotcha" is that DOIs are not case sensitive, but string literals in SPARQL queries are (uh oh).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115764805990617088?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115764805990617088/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115764805990617088' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115764805990617088'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115764805990617088'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/09/connotea-tags.html' title='Connotea tags'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115749938987943233</id><published>2006-09-06T00:27:00.000+01:00</published><updated>2006-09-06T00:46:15.200+01:00</updated><title type='text'>GenBank extras</title><content type='html'>Idly playing with ants, it is time to blog two things that come up a few times. The first is that GenBank has links to literature that could do with updating. For example, the sequence &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&amp;val=595456"&gt;U11912&lt;/a&gt; from the fungus "Atta mexicana symbiont JF-1" is listed as being published in &lt;br /&gt;&lt;pre&gt;&lt;br /&gt;  AUTHORS   Rehner,S.A., Chapela,I.H., Schultz,T.R. and Mueller,U.G.&lt;br /&gt;  TITLE     Evolutionary history of the symbiosis between fungus-growing ants&lt;br /&gt;            and their fungi&lt;br /&gt;  JOURNAL   Unpublished&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Well, this was published in &lt;i&gt;Science&lt;/i&gt; (&lt;a href="http://dx.doi.org/10.1126/science.266.5191.1691"&gt;doi:10.1126/science.266.5191.1691&lt;/a&gt;) in 1994. The DOI seems broken (sigh), so here is a &lt;a href="http://www.sciencemag.org/cgi/content/abstract/266/5191/1691"&gt;direct link&lt;/a&gt;. Ulrich Mueller's web site has a &lt;a href="http://www.biosci.utexas.edu/ib/faculty/mueller/pubs/evol_hist_fungus_ants.pdf"&gt;link to the PDF&lt;/a&gt;.&lt;br /&gt;The other point is that searching the nucleotide database for "Atta mexicana" turns up no ants, but the above mentioned fungus. 
We get the hit because there is a line in the GenBank record that lists the ant host.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;/specific_host="Atta mexicana"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;As I've mentioned over on the &lt;a href="http://ispecies.blogspot.com/2006/03/building-encyclopedia-of-life.html"&gt;iSpecies&lt;/a&gt; blog, GenBank records often contain this sort of useful information. Hence, we could search for ants and extract information about their fungal associates.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115749938987943233?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115749938987943233/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115749938987943233' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115749938987943233'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115749938987943233'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/09/genbank-extras.html' title='GenBank extras'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115705329646981037</id><published>2006-08-31T20:23:00.000+01:00</published><updated>2006-08-31T20:41:36.553+01:00</updated><title type='text'>LSIDs and DOIs for ant (and other hymenopteran) literature</title><content type='html'>I've now got LSIDs for literature served by the &lt;a 
href="http://atbi.biosci.ohio-state.edu:210/hymenoptera/nomenclator.home_page"&gt;Hymenoptera Name Server&lt;/a&gt; and &lt;a href="http://www.antbase.org"&gt;Antbase&lt;/a&gt;. As an added bonus, I serve DOIs where they exist. The RDF metadata for the LSIDs is generated on the fly by querying the Hymenoptera Name Server using a simple XML service Norm Johnson provided. Given a URL of the form &lt;a href="http://atbi.biosci.ohio-state.edu:210/hymenoptera/manage_lit.ris2xml?id="&gt;http://atbi.biosci.ohio-state.edu:210/hymenoptera/manage_lit.ris2xml?id=&lt;/a&gt; the service returns an XML document for the corresponding reference. I simply transform this into RDF.&lt;br /&gt;In addition, if the publication is an article I use Crossref's &lt;a href="http://www.crossref.org/02publishers/openurl_info.html"&gt;OpenURL resolver&lt;/a&gt; to see if a DOI exists for the publication -- if so, this gets added to the RDF as a &amp;lt;dc:identifier&amp;gt; element. &lt;br /&gt;Like most other LSIDs that I serve, the service is slower than it could be because everything is done on the fly, and in this case two calls may be needed to generate the metadata (see diagram below). The authority is &lt;strong&gt;antbase.org.lsid.zoology.gla.ac.uk&lt;/strong&gt;, and the namespace is &lt;strong&gt;pub&lt;/strong&gt;. 
Here is an example LSID, which you can resolve either by clicking on it in your browser (&lt;a href="http://lsidres.org/urn:lsid:antbase.org.lsid.zoology.gla.ac.uk:pub:3080"&gt;urn:lsid:antbase.org.lsid.zoology.gla.ac.uk:pub:3080&lt;/a&gt;) or by using Launchpad (&lt;a href="lsidres://urn:lsid:antbase.org.lsid.zoology.gla.ac.uk:pub:3080"&gt;urn:lsid:antbase.org.lsid.zoology.gla.ac.uk:pub:3080&lt;/a&gt;). &lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/HymenRDF.0.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/400/HymenRDF.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;One encouraging observation is that DOIs are not restricted to recent literature. For example, urn:lsid:antbase.org.lsid.zoology.gla.ac.uk:pub:3080 dates from 1952, but has a DOI (&lt;a href="http://dx.doi.org/10.2307/2422200"&gt;doi:10.2307/2422200&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115705329646981037?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115705329646981037/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115705329646981037' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115705329646981037'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115705329646981037'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/08/lsids-and-dois-for-ant-and-other.html' title='LSIDs and DOIs for ant (and other hymenopteran) literature'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115680290117881958</id><published>2006-08-28T23:05:00.001+01:00</published><updated>2006-09-22T08:13:58.143+01:00</updated><title type='text'>Wikis</title><content type='html'>Triple stores containing specimens, sequences, images, literature, etc. are all very well, but there is a lot of information that is not captured by such a system. For human users (as opposed to dumb computers), often a simple summary is more informative than a set of images and a map, especially if that summary mentions something interesting, such as why the "Google ant" was so named, or that the trap-jaw ant &lt;i&gt;Odontomachus bauri&lt;/i&gt; has incredibly fast jaws (&lt;a href="http://dx.doi.org/10.1073/pnas.0604290103"&gt;doi:10.1073/pnas.0604290103&lt;/a&gt;). There is also a lot of information that may one day be semantically encoded, but for now will only be captured as text (such as extensive commentaries on web sites). &lt;br /&gt;&lt;br /&gt;Lastly, there is the issue of getting people involved. One of the striking things about biodiversity web sites is the lack of community involvement. My question is "why is this the case?" Here are some thoughts:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;If feedback consists of sending an email to the person/organisation running the site, then there's little incentive to get involved. Why bother typing a detailed summary of why a particular piece of information is wrong, only to have that essay disappear into a black hole?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Annotations can be valuable, but there is an issue of trust. Why invest effort in a site that may be a short-lived toy? 
There are so many sites competing for attention.&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;One way to foster involvement might be through a Wiki, whereby anybody can contribute to writing a page about each organism. However, biodiversity Wikis such as Wikispecies have, in my opinion, been a spectacular failure, as evidenced by the number of missing pages or stubs. Perhaps part of the reason is the lack of content, which could be addressed by pre-populating each page with basic information from a database (such as name, any specimens, images, literature, etc.). In other words, each page would start with the level of detail of an &lt;a href="http://ispecies.org"&gt;iSpecies&lt;/a&gt; report (for background to iSpecies visit &lt;a href="http://ispecies.blogspot.com"&gt;my iSpecies blog&lt;/a&gt;). As Kevin Kelly commented at the recent Google Sci Foo camp, people are much more likely to edit existing content than create content &lt;i&gt;de novo&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Wikis are all very well, but my major worry is the potential to &lt;strong&gt;lose&lt;/strong&gt; information. Here's the problem. Suppose I generate a Wiki page for an ant, and include information on its distribution. What happens if the underlying distribution data change? Now that the page is in Wiki form, it will be out of date. Furthermore, I'm not sure I want users editing distribution records -- these should really be edited at the level of the source database, so that the changes propagate to other users of those data.&lt;br /&gt;&lt;br /&gt;One possibility is to use custom tags in the Wiki. When the HTML page is generated, Wiki tags are rendered in the usual way, but the custom tags are replaced by the results of a database call (for example, a SPARQL query). Hence, something like &lt;font face="Courier"&gt;%DISTRIBUTION&lt;/font&gt; would be replaced by a Google map of the specimens for that taxon. 
This would mean that the distribution map would always reflect the current database, and the user (assuming they don't delete the &lt;font face="Courier"&gt;%DISTRIBUTION&lt;/font&gt; tag itself) won't be able to alter that information. Of course, this means we need some mechanism for users to inform the curators of the source data of any potential errors. This is particularly important if we pre-populate the Wiki page with information that may be incorrect (such as images harvested from a search engine).&lt;br /&gt;&lt;br /&gt;We could also help things by encouraging standard ways of linking to other resources, or storing data. For example, say a user edits a page and adds a citation to a paper that isn't in the underlying triple store. Ideally we would get that new paper into the triple store (rather than have it languish in the Wiki text). There are some ways to do this, such as extracting metadata from DOIs, and using local links [need to think about this].&lt;br /&gt;&lt;br /&gt;Likewise, with images, if we have a convention that images get posted to, say, Flickr, then we have a means for storing metadata about those images directly in our triple store.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Credits&lt;/b&gt;&lt;br /&gt;These ideas have partly come out of conversations with Rebecca Shapley at Google, and Dave Thau at the  California Academy of Sciences.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115680290117881958?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115680290117881958/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115680290117881958' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115680290117881958'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115680290117881958'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/08/wikis_28.html' title='Wikis'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115522956158604154</id><published>2006-08-10T18:00:00.000+01:00</published><updated>2006-08-10T18:06:01.603+01:00</updated><title type='text'>So that's how they do it - adding a message to mailto</title><content type='html'>The href="mailto:" tag is useful, and I know how to add a subject, but today I learnt how to add a message as well. The example comes from the DOI web site:&lt;br /&gt;&lt;br /&gt;This bit of code:&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&amp;lt;p&amp;gt;If you believe you have requested a DOI name that should be found, you may report this error to &lt;br /&gt;&amp;lt;a href="mailto:doi-help@doi.org?subject=DOI Not Found&amp;body=The%20following%20DOI%20was%20not%20found:%0D%0A...f"&gt;doi-help@doi.org&amp;lt;/a&gt;.  
&lt;br /&gt;&amp;lt;b&amp;gt;Please include information regarding where you found the DOI in your message: http://www.journals.royalsoc.ac.uk/....pdf&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Gives this result:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;If you believe you have requested a DOI name that should be found, you may report this error to &lt;br /&gt;&lt;a href="mailto:doi-help@doi.org?subject=DOI Not Found&amp;body=The%20following%20DOI%20was%20not%20found:%0D%0A10.1098/rsbl.2006.0523.%20%0D%0AThe%20referring%20page%20was:%20%0D%0Ahttp://www.journals.royalsoc.ac.uk/media/d6lntlwywjn3hdu6ekdw/contributions/0/5/8/3/058352377848735w.pdf"&gt;doi-help@doi.org&lt;/a&gt;.  &lt;br /&gt;&lt;b&gt;Please include information regarding where you found the DOI in your message: http://www.journals.royalsoc.ac.uk/media/d6lntlwywjn3hdu6ekdw/contributions/0/5/8/3/058352377848735w.pdf&lt;/b&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;The &amp;amp;body= parameter gives the text of the email message. 
Neat.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115522956158604154?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115522956158604154/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115522956158604154' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115522956158604154'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115522956158604154'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/08/so-thats-how-they-do-it-adding-message.html' title='So that&apos;s how they do it - adding a message to mailto'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115306311790151607</id><published>2006-07-16T16:18:00.000+01:00</published><updated>2006-07-16T16:26:26.156+01:00</updated><title type='text'>Taxonomic treatments</title><content type='html'>For a demo for Donat Agosti, I've added some taxonomic treatments to iSpecies.org. I'd done this before, but after the server got hacked I didn't restore the treatments because they are served via a triple store, and I hadn't got that running. Now that the triple store is up, I looked at this again. Here's what is involved.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Treatments&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;TaxonX is an XML markup for taxonomic descriptions. 
The idea is to locate and mark blocks of text that describe a taxon. For more details see the &lt;a href="http://research.amnh.org/informatics/taxlit"&gt;AMNH's NSF Taxonomic Literature Projectpages&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Donat has been marking up various ant papers manually as proof of concept, but the process will be automated. I want to be able to serve up a taxonomic description of a name, e.g. "Proceratium google". Because I want everything to be in a triple store, I need to map TaxonX to RDF. Here's what I do:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;The URI of the paper is the link to the PDF in AntBase. This should really be something else (LSID, DOI, Handle, PURL), but it will do for now.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Each treatment is extracted from the TaxonX document using XPath. I use a Perl script to pull out each node matching &lt;strong&gt;//tax:treatment&lt;/strong&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Each treatment gets a URI, based on the URI of the paper containing the treatment, and the XPath to the treatment, e.g. &lt;strong&gt;http://antbase.org/ants/publications/8538_fisher//tax:treatment[1]&lt;/strong&gt;. The idea is that one could use the identifier to extract the relevant block of text from the TaxonX XML document (i.e., the identifier would be useful beyond my triple store). Although I worry that this is not semantically opaque, its seems a useful idea, and my worries eased when I discovered that &lt;a href="http://www.w3.org/2001/Annotea/"&gt;Annotea&lt;/a&gt; uses the same idea.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The actual treatment is stored as a block of &amp;lt;![CDATA[..]]&amp;gt;, so the original TaxonX markup is preserved.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Each treatment is linked to the containing paper by the Dublin core term &lt;strong&gt;&amp;lt;dcterms:isPartOf&amp;gt;&lt;/strong&gt;. 
I also have the inverse link &lt;strong&gt;&amp;lt;dcterms:hasPart&amp;gt;&lt;/strong&gt; to link the publication to the treatments it contains.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;I have some minimal metadata about the publication (title, format), and about each treatment (name of taxon stored in &lt;strong&gt;&amp;lt;dc:subject&amp;gt;&lt;/strong&gt;). This is extracted from what is in the TaxonX document - clear TaxonX needs more information on the source.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Each treatment is typed using &amp;lt;dc:type&amp;gt;treatment&amp;lt;dc:type&amp;gt;. I do this so that I can classify results for a query (as part of another project).&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;So, a publication is modelled like this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/publication.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/publication.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;And a treatment is modelled like this. &lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/treatment.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/treatment.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;SPARQL&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Currently iSpecies treatments are retrieved using RDQL, but SPARQL is rather nicer. 
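&lt;br /&gt;&lt;br /&gt;Before turning to the query, here is a rough Python sketch of the triples that the mapping above would produce for one treatment, using plain tuples rather than a real triple store. The prefixed predicate names follow the list above, and gla:treatment is the predicate the query in this post uses for the treatment text. The function names and the stand-in treatment text are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
PAPER = "http://antbase.org/ants/publications/8538_fisher"


def treatment_uri(paper_uri, index):
    """Build a treatment URI from the paper URI plus the XPath to the treatment."""
    return "{}//tax:treatment[{}]".format(paper_uri, index)


def treatment_triples(paper_uri, index, taxon_name, taxonx_block):
    """Return (subject, predicate, object) tuples for one treatment."""
    uri = treatment_uri(paper_uri, index)
    return [
        (uri, "dc:subject", taxon_name),
        (uri, "dc:type", "treatment"),
        (uri, "dcterms:isPartOf", paper_uri),
        (paper_uri, "dcterms:hasPart", uri),
        (uri, "gla:treatment", taxonx_block),
    ]


def find_treatments(triples, taxon_name):
    """Mimic the lookup: treatment text for every subject with this dc:subject."""
    subjects = {s for (s, p, o) in triples if p == "dc:subject" and o == taxon_name}
    return [o for (s, p, o) in triples if p == "gla:treatment" and s in subjects]


# Hypothetical stand-in for the CDATA-wrapped TaxonX markup.
triples = treatment_triples(PAPER, 1, "Proceratium google", "(TaxonX block)")
print(find_treatments(triples, "Proceratium google"))
```
&lt;/pre&gt;&lt;br /&gt;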
Finding the treatment for a taxon is a simple SPARQL query, e.g.:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;PREFIX gla: &amp;lt;urn:lsid:lsid.zoology.gla.ac.uk:predicates:&amp;gt;&lt;br /&gt;PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;&lt;br /&gt;PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;&lt;br /&gt;PREFIX tax: &amp;lt;http://research.amnh.org/informatics/taxlit/taxonx/taxonx1&amp;gt;&lt;br /&gt;SELECT ?uri ?publication ?title ?treatment&lt;br /&gt;WHERE {?uri dc:subject 'Proceratium google'&lt;br /&gt;   ?uri gla:treatment ?treatment .&lt;br /&gt;   ?publication dcterms:hasPart ?uri .&lt;br /&gt;   ?publication dc:title ?title&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Display&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;To display the results I take the SPARQL XML result, convert the encoded TaxonX block to XML mark up, then apply a simple XSLT style sheet. The results aren't pretty, but it works.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Future directions&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;As I've mentioned in an &lt;a href="http://semant.blogspot.com/2006/06/taxonomic-markup-and-guids.html"&gt;earlier post&lt;/a&gt;, what I'd really like is to have GUIDs for these publications sorted out, and more mark up. In particular, literature cited, specimens, and other taxonomic names should be marked up so that these links can be extracted. 
If this is done well, then we could do things like:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Generate distribution maps for papers that don't have maps&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Generate synonymies from lists of names&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Infer type status even if specimen databases don't have this information&lt;/li&gt;&lt;br /&gt;&lt;li&gt;etc.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;The trick will transforming TaxonX to RDF.&lt;br /&gt;&lt;br /&gt;Currently playing in iTunes: &lt;i&gt;Shelter&lt;/i&gt; by Ray LaMontagne&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115306311790151607?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115306311790151607/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115306311790151607' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115306311790151607'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115306311790151607'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/07/taxonomic-treatments.html' title='Taxonomic treatments'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115235363740854718</id><published>2006-07-08T11:13:00.000+01:00</published><updated>2006-07-08T11:13:57.410+01:00</updated><title type='text'>Disconnected databases</title><content type='html'>One 
consequence of having multiple databases is that they can get out of sync, that is, information in one database might not be updated to reflect changes in another. I've touched on this earlier when discussing unidentified ants in GenBank (&lt;a href="http://semant.blogspot.com/2006/06/discovering-new-things.html"&gt;Discovering new things&lt;/a&gt; and &lt;a href="http://iphylo.blogspot.com/2006/05/ants-rdf-and-triple-stores.html"&gt;Ants, RDF, and triple stores&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.antweb.org/images/casent0005630/casent0005630_p_1_med.jpg" align="right" width="128" hspace="4"/&gt;&lt;br /&gt;I've also come across cases where &lt;a href="http://www.antweb.org"&gt;AntWeb&lt;/a&gt; is out of date. For example, the ant &lt;i&gt;Strumigenys rubigus&lt;/i&gt; was described in 2000 by Brian Fisher. In the &lt;a href="http://research.amnh.org/informatics/taxlit"&gt;TaxonX&lt;/a&gt; marked up version of the original paper (available &lt;a href="http://antbase.org/ants/publications/ocr_xml/8538_fisher_taxonx.html"&gt;here&lt;/a&gt;), the holotype is listed as:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Holotype worker, Madagascar: Prov. Toamasina, F.C. Andrianantantely, 18 deg. 41.7 min. S, 48 deg. 48.8 min. E, 530 m 4-10.xii.1998, ex rotten log, rainforest, #49-2 (H.J. Ratsirarson) (MCZ).&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.antweb.org/images/casent0005630/casent0005630_l_1_low.jpg" align="left" hspace="4"/&gt;Now, in an ideal, joined-up world, we'd have a link from the Fisher paper to the actual specimen. A bit of fussing (i.e., searching for "Strumigenys rubigus" on AntWeb) reveals that the holotype is &lt;a href="http://www.antweb.org/specimen.do?name=casent0005630&amp;shot=p1&amp;project="&gt;casent0005630&lt;/a&gt;. 
The AntWeb page for this ant has no indication that this is the holotype, although there is a picture of the specimen labels that make it clear that this is what it is.&lt;br /&gt;&lt;br /&gt;Having multiple sources of information makes it harder to keep things up to date, which is another reason why I think RDF and triple stores (or distributed queries) will help. So long as we have metadata about the specimen and the publication, we can make the inference that casent0005630 is the holotype of &lt;i&gt;Strumigenys rubigus&lt;/i&gt;. This may ease the burden on individual databases. Rather than the curators of AntWeb having to update AntWeb manually every time a new name is published, a portal along the lines of my SemAnt toy could summarise this new information easily, if (and it's a big if) we have the links between publication and specimen.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115235363740854718?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115235363740854718/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115235363740854718' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115235363740854718'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115235363740854718'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/07/disconnected-databases.html' title='Disconnected databases'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115172092900893851</id><published>2006-07-01T03:28:00.000+01:00</published><updated>2006-07-01T03:28:49.056+01:00</updated><title type='text'>SPARQL query for classification</title><content type='html'>After spidering &lt;a href="http://www.ubio.org"&gt;uBio&lt;/a&gt; the next task is how to display the classification of a taxon. The following SPARQL query does the trick:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;PREFIX ubio: &amp;lt;urn:lsid:ubio.org:predicates:&amp;gt;&lt;br /&gt;PREFIX gla: &amp;lt;urn:lsid:lsid.zoology.gla.ac.uk:predicates:&amp;gt;&lt;br /&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;&lt;br /&gt;PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;&lt;br /&gt;&lt;br /&gt;SELECT DISTINCT ?node ?title ?rank ?name ?description&lt;br /&gt;&lt;br /&gt;WHERE {&lt;br /&gt;   &amp;lt;urn:lsid:ubio.org:namebank:2735665&amp;gt; gla:objectiveSynonym ?display .&lt;br /&gt;   ?class ubio:namebankIdentifier ?display .&lt;br /&gt;   ?class ubio:classificationName ?name .&lt;br /&gt;   ?class ubio:classificationDescription ?description .&lt;br /&gt;   ?class gla:lineage ?seq .&lt;br /&gt;   ?seq ?li ?node .&lt;br /&gt;   ?node dc:title ?title .&lt;br /&gt;   ?node gla:rank ?rank&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This query takes a canonical name in uBio, finds the display form, and from that the classification. 
The lines&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;   ?class gla:lineage ?seq .&lt;br /&gt;   ?seq ?li ?node .&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;fetch the lineage which is stored as a sequence:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;gla:lineage&amp;gt;&lt;br /&gt;&amp;lt;rdf:Seq&amp;gt;&lt;br /&gt;&amp;lt;rdf:li rdf:resource="urn:lsid:ubio.org:classificationbank:5178917"/&amp;gt;&lt;br /&gt;&amp;lt;rdf:li rdf:resource="urn:lsid:ubio.org:classificationbank:5095531"/&amp;gt;&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;&amp;lt;rdf:li rdf:resource="urn:lsid:ubio.org:classificationbank:5131593"/&amp;gt;&lt;br /&gt;&amp;lt;/rdf:Seq&amp;gt;&lt;br /&gt;&amp;lt;/gla:lineage&amp;gt; &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This method of describing a taxonomic lineage was described in my paper in &lt;a href="http://jbi.nhm.ku.edu/index.php/jbi/article/view/25"&gt;&lt;br /&gt;&lt;i&gt;Biodiversity Informatics&lt;/i&gt;&lt;br /&gt;&lt;/a&gt;. uBio serves the lineage from lower to higher taxon (i.e., bottom up), but I want to display it top down. I do all display using XSLT style sheets, so we use the &amp;lt;xsl:sort order="descending"&amp;gt; trick (see &lt;a href="http://www.velocityreviews.com/forums/t292299-xslt-select-nodes-in-reverse-order.html"&gt;here&lt;/a&gt; for an example). 
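&lt;br /&gt;&lt;br /&gt;The same reordering can be sketched outside XSLT. This Python fragment is only an illustration, not the style sheet used here: it takes a lineage in the bottom-up order that uBio serves, reverses it, and gives each node an indent of 18 units per level, mirroring the position()-based expression used for indenting in this post. The function name and the short example lineage are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
def top_down_with_indent(lineage_bottom_up, step=18):
    """Return (indent, name) pairs ordered from highest taxon to lowest."""
    top_down = list(reversed(lineage_bottom_up))
    # enumerate() starts at 0, which matches 18 * (position() - 1) in XSLT,
    # where position() starts at 1.
    return [(step * depth, name) for depth, name in enumerate(top_down)]


# Hypothetical lineage, listed from lower to higher taxon.
lineage = ["Melissotarsus", "Formicidae", "Hymenoptera"]
for indent, name in top_down_with_indent(lineage):
    print(indent, name)
```
&lt;/pre&gt;&lt;br /&gt;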
Then it's simply a case of indenting each node using &amp;lt;xsl:value-of select="18 * (position()-1)"/&amp;gt;, and borrowing uBio's end.png &lt;img src="http://names.ubio.org/browser/end.png"/&gt; to get the tree effect.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115172092900893851?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115172092900893851/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115172092900893851' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115172092900893851'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115172092900893851'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/07/sparql-query-for-classification.html' title='SPARQL query for classification'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115170623317599221</id><published>2006-06-30T23:23:00.000+01:00</published><updated>2006-06-30T23:26:50.820+01:00</updated><title type='text'>Classification</title><content type='html'>Displaying a name by itself isn't very useful, so I'm exploring adding a classification to the ant demo. The question is which one? 
As a trial I've decided to use ITIS, based on the October 12, 2005 dump used by uBio (you can see it &lt;a href="http://names.ubio.org/browser/classifications.php?conceptID=5095531&amp;namebankID="&gt;here&lt;/a&gt;). There are some &lt;a href="http://www.cbif.gc.ca/pls/itisca/taxachild?p_tsn=154193&amp;p_value=&amp;p_ifx=cbif&amp;p_lang="&gt;15,000 ant taxa&lt;/a&gt; in ITIS.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/ubio.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/ubio.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;The plan is to retrieve the classification by spidering the uBio site, starting with the RDF for Formicidae in the ITIS classification (&lt;a href="http://lsidres.org/urn:lsid:ubio.org:classificationbank:5095531"&gt;urn:lsid:ubio.org:classificationbank:5095531&lt;/a&gt;). By following the &amp;lt;ubio:hasChild&amp;gt; tags, we can traverse the complete tree.&lt;br /&gt;&lt;br /&gt;One issue is getting my head around uBio's name structure. I use their &lt;a href="http://names.ubio.org/soap/finditSOAP.php"&gt;FindIT SOAP&lt;/a&gt; service to get LSIDs for names from &lt;a href="http://www.ubio.org/index.php?pagename=namebank"&gt;NameBank&lt;/a&gt;. FindIT returns canonical name ids, but &lt;a href="http://www.ubio.org/index.php?pagename=classificationbank_home"&gt;ClassificationBank&lt;/a&gt; uses different LSIDs for the names (the "display name"). 
To give a concrete example, using FindIT to search on "Melissotarsus insularis" yields the LSID &lt;a href="http://lsidres.org/urn:lsid:ubio.org:namebank:2735665"&gt;urn:lsid:ubio.org:namebank:2735665&lt;/a&gt;, whereas in the ITIS classification, the ClassificationBank record (&lt;a href="http://lsidres.org/urn:lsid:ubio.org:classificationbank:5184916"&gt;urn:lsid:ubio.org:classificationbank:5184916&lt;/a&gt;) links to &lt;a href="http://lsidres.org/urn:lsid:ubio.org:namebank:560445"&gt;urn:lsid:ubio.org:namebank:560445&lt;/a&gt;.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/Classification.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/Classification.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115170623317599221?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115170623317599221/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115170623317599221' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115170623317599221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115170623317599221'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/classification.html' title='Classification'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115146555064918699</id><published>2006-06-28T04:32:00.000+01:00</published><updated>2006-06-28T04:32:30.653+01:00</updated><title type='text'>TreeBASE rocks</title><content type='html'>&lt;img src="http://www.treebase.org/treebase/icons/treebase.gif" align="right" /&gt;&lt;br /&gt;I gave a talk today ("Dude, where's my tree?") at the &lt;a href="http://www.sunysb.edu/sse2006"&gt;Evolution 2006&lt;/a&gt; meeting at Stony Brook. It was intended as a somewhat tongue-in-cheek overview of some issues concerning TreeBASE, and broader areas of biodiversity informatics, making use of ants as an example (see my &lt;a href="http://semant.blogspot.com"&gt;SemAnt&lt;/a&gt; project). &lt;br /&gt;&lt;a href="http://www.phylodiversity.net/donoghue/people/michael.html"&gt;Michael Donoghue&lt;/a&gt; took me aside after the talk and made some interesting points. He was a little tired &amp;#8212; understandably &amp;#8212; of hearing that "TreeBASE sucks" (e.g., my &lt;a href="http://iphylo.blogspot.com/2006/02/treebase-talk-at-cipres.html"&gt;CIPRES talk&lt;/a&gt;), and felt that my constantly saying this was counterproductive. It could also lead to people not putting their data in TreeBASE because they'd heard that it "sucks".&lt;br /&gt;There is an element of social responsibility here, I guess. I resolutely avoid politics. I don't mean this in a pejorative sense, it's just that I don't have the temperament or skill for it, unlike Michael himself (&lt;a href="http://www.aad.gov.au/default.asp?casid=11240#11241"&gt;Lee Belbin&lt;/a&gt; is another person in this area who strikes me as a very skilled manager).&lt;br /&gt;Now, my talk was intended to be fun, and I was taking the piss out of myself as much as anything. I also think the things we criticise are the things we value the most. 
But that said, let me make it clear that TreeBASE is very important. As editor of &lt;i&gt;Systematic Biology&lt;/i&gt; I've made authors submit data to it. I have a lot of respect for the work Michael, Bill Piel, and Mike Sanderson put into TreeBASE. If you have phylogenetic data &amp;#8212; submit it to TreeBASE. It's the best we have. It's just that, well, as a community we could do better.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115146555064918699?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115146555064918699/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115146555064918699' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115146555064918699'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115146555064918699'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/treebase-rocks.html' title='TreeBASE rocks'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115146279565907110</id><published>2006-06-28T03:46:00.000+01:00</published><updated>2006-06-28T03:46:36.303+01:00</updated><title type='text'>Taxonomic names, metadata, and the Semantic Web</title><content type='html'>My paper "Taxonomic names, metadata, and the Semantic Web" has appeared in &lt;a 
href="http://jbi.nhm.ku.edu/index.php/jbi/article/view/25"&gt;&lt;i&gt;Biodiversity Informatics&lt;/i&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Life Science Identifiers (LSIDs) offer an attractive solution to the problem of globally unique identifiers for digital objects in biology. However, I suggest that in the context of taxonomic names, the most compelling benefit of adopting these identifiers comes from the metadata associated with each LSID. By using existing vocabularies wherever possible, and using a simple vocabulary for taxonomy-specific concepts we can quickly capture the essential information about a taxonomic name in the Resource Description Framework (RDF) format. This opens up the prospect of using technologies developed for the Semantic Web to add "taxonomic intelligence" to biodiversity databases. This essay explores some of these ideas in the context of providing a taxonomic framework for the phylogenetic database TreeBASE.&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115146279565907110?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115146279565907110/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115146279565907110' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115146279565907110'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115146279565907110'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/taxonomic-names-metadata-and-semantic.html' title='Taxonomic names, metadata, and the Semantic Web'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115120428692861629</id><published>2006-06-25T03:58:00.000+01:00</published><updated>2006-06-25T03:58:07.516+01:00</updated><title type='text'>Publications via iTunes - kewl!</title><content type='html'>&lt;p&gt;&lt;img src="http://consequently.org/pictures/consequentlyorg_in_itunes.png" align="right" width="300"/&gt;Greg Restall has &lt;a href="http://consequently.org/news/2006/04/08/well_that_was_easy/"&gt;described&lt;/a&gt; how he put his papers into the iTunes music store. How cool is that! A nice demonstration of how RSS makes all sorts of interesting applications possible.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;(Via &lt;a href="http://allmyeye.blogspot.com/2006/05/same-debate-different-forum-self.html"&gt;All My Eye&lt;/a&gt;.)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115120428692861629?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115120428692861629/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115120428692861629' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115120428692861629'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115120428692861629'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/publications-via-itunes-kewl.html' title='Publications via iTunes - 
kewl!'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115084313477355307</id><published>2006-06-20T23:38:00.000+01:00</published><updated>2006-06-20T23:50:14.980+01:00</updated><title type='text'>3Store3</title><content type='html'>Building 3Store3 3.0.14 on Mac OS X is an absolute pain. It's pretty clear the developers haven't done so, because there are a slew of dependencies that aren't mentioned. It builds fine on Linux, so it's a case where the developers haven't realised that the assumptions they make on Linux don't always hold on other platforms (such as my beloved Mac).&lt;br /&gt;&lt;br /&gt;So, what happens after we type  &lt;font face="Courier"&gt;./configure&lt;/font&gt;?&lt;br /&gt;&lt;br /&gt;Firstly, we couldn't find rasqal (part of Redland)&lt;br /&gt;&lt;div style="font-family:Courier"&gt;&lt;br /&gt;configure: error: Package requirements (rasqal &gt;= 0.9.11) were not met:&lt;br /&gt;Consider adjusting the PKG_CONFIG_PATH environment variable if you&lt;br /&gt;installed software in a non-standard prefix.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;OK, so edit your .bash_profile to contain these lines:&lt;br /&gt;&lt;div style="font-family:Courier"&gt;&lt;br /&gt;PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig:/opt/local/lib/pkgconfig&lt;br /&gt;export PKG_CONFIG_PATH&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;(/usr/local/lib/pkgconfig is where Redland package config files are stored, /opt/local/lib/pkgconfig is used by Darwin Ports, see below.)&lt;br /&gt;&lt;img src="http://www.darwinports.org/img/dp.jpg" align="right" width="200"/&gt;&lt;br /&gt;Next, we don't have glib, so it's off to &lt;a 
href="http://www.darwinports.org/"&gt;Darwin Ports&lt;/a&gt;, which packages a lot of Open Source tools for Mac OS X. Install it, then at the Terminal type:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;sudo /opt/local/bin/port install glib2-devel&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(this may take a while...). Make sure you have &lt;font face="Courier"&gt;/opt/local/lib/pkgconfig&lt;/font&gt; in your &lt;font face="Courier"&gt; PKG_CONFIG_PATH &lt;/font&gt; variable (see above). Now, we get&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;configure: error: Cannot find Berkeley DB library version 4&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;img src="http://dev.sleepycat.com/img/logo_dz.gif" align="right"/&gt;&lt;br /&gt;Sigh. So, we grab &lt;a href="http://www.sleepycat.com/index.html"&gt;Berkeley DB 4&lt;/a&gt;, cd to the directory, and&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;cd build_unix&lt;br /&gt;../dist/configure&lt;br /&gt;make&lt;br /&gt;sudo make install&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So we're there, right? Not so fast, did you think this was meant to be easy? 3Store3 assumes Berkeley DB4 is somewhere it ain't.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;su&lt;br /&gt;mkdir /usr/include/db4&lt;br /&gt;cd /usr/include/db4&lt;br /&gt;ln -s /usr/local/BerkeleyDB.4.4/include/db.h&lt;br /&gt;./configure LDFLAGS=-L/usr/local/BerkeleyDB.4.4/lib &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Are we there &lt;strong&gt;yet&lt;/strong&gt;?! Yes.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;sudo make install&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;To set up the triple store:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;ts-setup&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then $£@!#, I discover I need MySQL version 4.1.&lt;i&gt;x&lt;/i&gt; (I'm running 4.0.21). To be fair, the documentation (what's that?) states this pretty clearly. OK, so we move the data safely out of the way, grab 4.1 from &lt;a href="http://www.mysql.com"&gt;www.mysql.com&lt;/a&gt;, and install it. 
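&lt;br /&gt;&lt;br /&gt;One way to avoid this kind of surprise is to compare the installed version with the requirement before building. The Python sketch below is only an illustration: the function names are made up, the 4.1 minimum is the requirement mentioned above, and in practice the version string would come from the local MySQL installation.&lt;br /&gt;&lt;pre&gt;
```python
def parse_version(text):
    """Turn a dotted version string such as '4.0.21' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum=(4, 1)):
    """Return True if the installed version is at least the minimum."""
    # Tuples compare element by element, so (4, 0, 21) sorts before (4, 1).
    return parse_version(installed) >= minimum


print(meets_minimum("4.0.21"))  # the version mentioned in this post
print(meets_minimum("4.1.20"))  # a hypothetical 4.1.x release
```
&lt;/pre&gt;&lt;br /&gt;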
I was running CompleteMySQL, so I need to remove that from my path, otherwise we get the wrong &lt;font face="Courier"&gt;mysql-config&lt;/font&gt; when rebuilding 3Store.&lt;br /&gt;&lt;br /&gt;So, why did I do all this? In a  word, &lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt;. Hope it's worth it...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115084313477355307?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115084313477355307/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115084313477355307' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115084313477355307'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115084313477355307'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/3store3.html' title='3Store3'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115075055580931420</id><published>2006-06-19T21:55:00.000+01:00</published><updated>2006-06-19T21:55:56.596+01:00</updated><title type='text'>Donat Agosti enters the blogsphere</title><content type='html'>&lt;img src="http://www.eowilson.org/images/templates/general/inner_logo_02.jpg" align="right"/&gt;Donat Agosti has three blogs, &lt;a href="http://biosyscontext.blogspot.com/"&gt; biosyscontext&lt;/a&gt;, &lt;a 
href="http://biodivcontext.blogspot.com/"&gt;biodivcontext&lt;/a&gt;, and &lt;a href="http://antbase.blogspot.com/"&gt;antbase&lt;/a&gt; (the latter is not populated yet). Not one to mince words, Donat has some pithy &lt;a href="http://biodivcontext.blogspot.com/2006/06/eo-wilson-biodiversity-foundation-it.html"&gt;things to say&lt;/a&gt; about the &lt;a href="http://www.eowilson.org/gen_news.htm"&gt;E. O. Wilson Biodiversity Foundation&lt;/a&gt;, which looks like another example of the triumph of hype over craft (for another approach see my &lt;a href="http://ispecies.org"&gt;iSpecies&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Donat is also &lt;a href="http://biosyscontext.blogspot.com/2006/06/just-question-of-style-recent.html"&gt;rather critical&lt;/a&gt; of the authors of a recent paper on ant phylogeny (&lt;a href="http://dx.doi.org/10.1126/science.1124891"&gt;doi:10.1126/science.1124891&lt;/a&gt;). Nobody can accuse ants of being dull!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115075055580931420?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115075055580931420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115075055580931420' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115075055580931420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115075055580931420'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/donat-agosti-enters-blogsphere.html' title='Donat Agosti enters the blogsphere'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-115014280256803147</id><published>2006-06-12T20:53:00.000+01:00</published><updated>2006-06-12T21:06:42.576+01:00</updated><title type='text'>Using links to rank specimens, sequences, etc.</title><content type='html'>The "abundance" problem arises when a search returns too many hits. How does the user decide which ones are relevant (other than wading through the list)? The classical example is web search, where potentially millions of web pages may be returned. So, the challenge is to rank the results so the user needs to look at only the top 10 or so (and is confident that what she is after is in the top 10).&lt;br /&gt;For biodiversity searches this is also relevant, especially if a search may return 100s of specimens. How do we rank these (assuming some are more interesting than others)? Well, one option is to adopt the same approach as Google -- rank things based on links. In the case of specimens, we could use links to sequences, images, and publications as evidence that a specimen "matters" (i.e., people have done work on it, and therefore it is likely to be of interest).&lt;br /&gt;I'd suggested something like this for phylogenies over on &lt;a href="http://iphylo.blogspot.com/2006/01/finding-good-phylogenies-using.html"&gt;iPhylo&lt;/a&gt;, but it dawned on me today that the same idea might make sense for the SemAnt project. 
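&lt;br /&gt;&lt;br /&gt;To make the idea concrete, here is a minimal sketch in Python that ranks specimens by simply counting the links attached to each one. It is plain link counting, not PageRank, and the specimen and link identifiers are invented for illustration (they are assumptions, not data from the triple store).&lt;br /&gt;
&lt;pre&gt;
```python
from collections import Counter

# Hypothetical (specimen, linked object) pairs; the identifiers are made up
# for illustration and are not taken from the triple store.
links = [
    ("specimen:A", "sequence:1"),
    ("specimen:A", "sequence:2"),
    ("specimen:A", "publication:1"),
    ("specimen:B", "image:1"),
    ("specimen:C", "sequence:3"),
    ("specimen:C", "image:2"),
]


def rank_by_links(pairs):
    """Return (specimen, link count) tuples, most-linked first."""
    counts = Counter(specimen for specimen, _ in pairs)
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_links(links)
for specimen, n_links in ranking:
    print(specimen, n_links)
```
&lt;/pre&gt;
A real ranking would presumably weight the kinds of link differently (a publication might count for more than an image), but even raw counts would push the well-studied specimens to the top of the list.&lt;br /&gt;&lt;br /&gt;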
As an aside, &lt;a href="http://www.klinewoods.com/"&gt;Ben Szekely&lt;/a&gt; (another participant at TDWG GUID2) and Elias Torres have a &lt;a href="http://www.klinewoods.com/papers/tagrank.pdf"&gt;cool paper&lt;/a&gt; on extending PageRank to tags.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-115014280256803147?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/115014280256803147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=115014280256803147' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115014280256803147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/115014280256803147'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/using-links-to-rank-specimens.html' title='Using links to rank specimens, sequences, etc.'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-114986072050038109</id><published>2006-06-09T14:45:00.000+01:00</published><updated>2006-06-09T17:51:17.140+01:00</updated><title type='text'>TDWG-GUID demo</title><content type='html'>Just in time for the second meeting of TDWG GUID, I've got a version of the ant triple store up and running &lt;a href="http://linnaeus.zoology.gla.ac.uk/~rpage/ants/"&gt;here&lt;/a&gt;. 
There are some technical details posted on the &lt;a href="http://wiki.gbif.org/guidwiki/wikka.php?wakka=LSIDBasedIntegrationAntsDemo"&gt;TDWG GUID wiki&lt;/a&gt;.&lt;br /&gt;&lt;strike&gt;Note that the first time you go to the site you get a warning from Google Maps that the API key is invalid. It is, there's an issue to do with the server having multiple IPs (a temporary situation while I restore another machine that got hacked). Just click on &lt;b&gt;OK&lt;/b&gt; and it should work fine&lt;/strike&gt;.  Figured it out. In my haste I'd left some redundant code in the file &lt;b&gt;index.php&lt;/b&gt;, and it contained the wrong Google API key. Simple really...duh!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-114986072050038109?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/114986072050038109/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=114986072050038109' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114986072050038109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114986072050038109'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/tdwg-guid-demo.html' title='TDWG-GUID demo'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-114978421498130042</id><published>2006-06-08T17:15:00.000+01:00</published><updated>2007-03-29T14:22:20.323+01:00</updated><title type='text'>Discovering new things</title><content type='html'>So, why bother with all this effort to aggregate information into a triple store, I hear you ask? Well, the expectation is that we can learn things we previously didn't know.&lt;br /&gt;For example, consider the ant specimen &lt;a href="http://www.antweb.org/specimen.do?name=casent0500379"&gt;casent0500379&lt;/a&gt;, which is recorded as the source of several sequences in GenBank. These sequences have been obtained by different research groups, and published in different papers. &lt;br /&gt;&lt;br /&gt;We can see this immediately if we construct a graph based on the RDF in the triple store. Each node in the graph is a subject. Two nodes, x and y are connected by an edge if there is a triple corresponding to either &lt;font face="Courier"&gt;(?x, ?pred, ?y)&lt;/font&gt; or &lt;font face="Courier"&gt;(?y, ?pred, ?x)&lt;/font&gt;. The neighbourhood of a node is all the nodes adjacent to that node. &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/53e5c32e58628a43a89f9042d189a10e.dot.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/53e5c32e58628a43a89f9042d189a10e.dot.png" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This graph represents the neighbourhood of specimen casent0500379. I've expanded the graph by finding the neighbours of all the neighbours of casent0500379. 
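&lt;br /&gt;&lt;br /&gt;The neighbourhood construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predicate names and the "example:other" node are placeholders I have made up, and the triples are a hand-picked subset rather than a dump of the store.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object). The predicate names and
# the "example:other" node are placeholders, not the store's vocabulary.
triples = [
    ("gi:87047406", "example:source", "casent0500379"),
    ("gi:89477179", "example:source", "casent0500379"),
    ("gi:87047406", "example:publishedIn", "pmid:16601190"),
    ("gi:89477179", "example:publishedIn", "pmid:16630727"),
    ("pmid:16630727", "example:relatedTo", "example:other"),
]


def build_graph(triples):
    """Undirected adjacency: x and y are linked if (x, p, y) or (y, p, x) exists."""
    adjacent = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacent[subject].add(obj)
        adjacent[obj].add(subject)
    return adjacent


def neighbourhood(adjacent, node, depth=1):
    """Nodes reachable from `node` in at most `depth` steps, excluding `node`."""
    seen = {node}
    frontier = {node}
    for _ in range(depth):
        # Step outwards once, keeping only nodes not visited before.
        frontier = {n for f in frontier for n in adjacent[f]} - seen
        seen |= frontier
    return seen - {node}


graph = build_graph(triples)
print(sorted(neighbourhood(graph, "casent0500379", depth=1)))
print(sorted(neighbourhood(graph, "casent0500379", depth=2)))
```
&lt;/pre&gt;
With depth=2 this gives the neighbours of the neighbours, which is how the figure above was expanded.&lt;br /&gt;&lt;br /&gt;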
Note that this graph shows that five sequences have been obtained from this specimen (identified by their "gi" numbers), and those five sequences are associated with three different taxonomic names in GenBank(!).&lt;br /&gt;&lt;br /&gt;Two labs have sequenced the 28S ribosomal RNA gene from this specimen, one with accession number DQ353560 (&lt;a href="http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&amp;val=87047406"&gt;gi:87047406&lt;/a&gt;), published by Moreau et al. in &lt;i&gt;Science&lt;/i&gt; (&lt;a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;query_hl=1&amp;list_uids=16601190"&gt;pmid:16601190&lt;/a&gt;, &lt;a href="http://dx.doi.org/10.1126/science.1124891"&gt;doi:10.1126/science.1124891&lt;/a&gt;) and one with accession number DQ401020 (&lt;a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=Nucleotide&amp;list_uids=89477179&amp;dopt=GenBank"&gt;gi:89477179&lt;/a&gt;), published by Ouellette et al. in &lt;i&gt;Molecular Phylogenetics and Evolution&lt;/i&gt; (&lt;a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=16630727&amp;query_hl=24&amp;itool=pubmed_docsum"&gt;pmid:16630727&lt;/a&gt;, &lt;a href="http://dx.doi.org/10.1016/j.ympev.2006.03.017"&gt;doi:10.1016/j.ympev.2006.03.017&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Now, the specimen has not been identified beyond being assigned to the genus &lt;i&gt;Proceratium&lt;/i&gt; (which includes the Google ant). These two research groups have given it different informal names, hence GenBank doesn't realise that these two taxa are the same.&lt;br /&gt;&lt;br /&gt;In fact, there is a third taxon, "&lt;i&gt;Proceratium&lt;/i&gt; sp. 
CSM-2006" for this same specimen, which has been sequenced for wingless.&lt;br /&gt;&lt;br /&gt;This is a small example of how aggregating and visualising links between multiple data sources can tell us something we didn't know before.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Postscript&lt;/b&gt;&lt;br /&gt;My triple store misses the sixth sequence (yet another 28S rRNA sequence) from this specimen (AY325951, &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=Nucleotide&amp;list_uids=34398469&amp;dopt=GenBank"&gt;gi:34398469&lt;/a&gt;), because the specimen is recorded in the "isolate" field, not the "specimen_voucher" field:&lt;br /&gt;&lt;pre&gt;     source          1..1233&lt;br /&gt;                     /organism="Proceratium sp. CS-2003-1"&lt;br /&gt;                     /mol_type="genomic DNA"&lt;br /&gt;                     /isolate="CASENT0500379"&lt;br /&gt;                     /db_xref="taxon:237763"&lt;br /&gt;                     /country="Madagascar"&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-114978421498130042?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/114978421498130042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=114978421498130042' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114978421498130042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114978421498130042'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/discovering-new-things.html' title='Discovering new things'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-114960343145625374</id><published>2006-06-06T15:17:00.000+01:00</published><updated>2006-06-06T15:17:11.530+01:00</updated><title type='text'>YeastHub</title><content type='html'>&lt;a href="http://yeasthub.gersteinlab.org"&gt;YeastHub&lt;/a&gt; is an interesting example of data integration in bioinformatics using RDF and a triple store. See Cheung et al. for details (&lt;a href="http://dx.doi.org/10.1093/bioinformatics/bti1026"&gt;doi:10.1093/bioinformatics/bti1026&lt;/a&gt;). To quote from the abstract:&lt;br /&gt;&lt;blockquote&gt;As the semantic web technology is maturing and the need for life sciences data integration over the web is growing, it is important to explore how data integration needs can be addressed by the semantic web. The main problem that we face in data integration is a lack of widely-accepted standards for expressing the syntax and semantics of the data. 
We address this problem by exploring the use of semantic web technologies—including resource description framework (RDF), RDF site summary (RSS), relational-database-to-RDF mapping (D2RQ) and native RDF data repository—to represent, store and query both metadata and data across life sciences datasets.&lt;/blockquote&gt;&lt;br /&gt;Pity the actual site seems broken...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-114960343145625374?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/114960343145625374/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=114960343145625374' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114960343145625374'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114960343145625374'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/yeasthub.html' title='YeastHub'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-114949822948157599</id><published>2006-06-05T09:59:00.000+01:00</published><updated>2006-06-05T21:28:29.710+01:00</updated><title type='text'>Taxonomic Markup and GUIDs</title><content type='html'>These notes were put together partly in response to discussions with Donat Agosti, but also as part of my experiments with storing ant data in a triple store. 
The idea is to mark up a taxonomic paper on ants with links to external sources of information (such as names, specimens, images, etc.).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/GoogleAnt.0.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/400/GoogleAnt.0.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;If you want some related inspiration, see Leigh Dodds' post on the scientific paper as a &lt;a href="http://www.ldodds.com/blog/archives/000264.html"&gt;modern palimpsest&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Firstly, I'm going to distinguish between &lt;b&gt;mark up&lt;/b&gt; and &lt;b&gt;metadata&lt;/b&gt;. I'm going to use mark up to mean tagging a manuscript to identify the relevant bits. For taxonomic literature this is largely after the fact, but for modern journals the article itself is represented in XML, which is then converted to a nice display using XSL. The BMC journals are a good example of this. What I'm interested in is how to mark up an article so that metadata about that article, its contents, and its relationships to other articles can be easily recovered (and output, presumably as RDF). Much of the mark up concerns the structure of a document, which in turn is important for presenting the document (say, in a web browser). I'm interested in just those bits relevant to metadata.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Identifiers&lt;/b&gt;&lt;br /&gt;I'm assuming that we have identifiers for the items of interest (i.e., URIs such as URLs, DOIs, Handles, LSIDs). Ideally, there is a way to extract metadata about the object the identifier refers to. LSIDs provide an explicit mechanism for doing this, and CrossRef provides a service to return an XML summary of metadata held for a given DOI. 
&lt;br /&gt;In my Taxonomic Search Engine I used the Hymenoptera Name Server's SEEK prototype to get metadata about a name; for example, &lt;a href="http://atbi.biosci.ohio-state.edu:210/hymenoptera/nomenclator.seek_demo?id=HNS195070"&gt;http://atbi.biosci.ohio-state.edu:210/hymenoptera/nomenclator.seek_demo?id=HNS195070&lt;/a&gt; returns an XML document about the "Google ant", &lt;i&gt;Proceratium google&lt;/i&gt;. This document contains these identifiers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;HNS153344: a taxonomic concept&lt;/li&gt;&lt;li&gt;HNS195070: a taxonomic name&lt;/li&gt;&lt;li&gt;pubHNS153344: a publication&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Only the name identifier (HNS195070) has metadata that can be easily accessed in XML, as far as I can see.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How to refer to identifiers&lt;/b&gt;&lt;br /&gt;Given the uncertainty about resolving identifiers (i.e., will LSIDs take off?), one might adopt the convention used by PubMed and the BMC journals and just include the "local" part of the identifier (see example below), rather than the full-blown identifier. Otherwise, a document marked up with the complete identifier will be rendered out of date if the resolution mechanism changes. In plain English, just use Hymenoptera Name Server ids, not LSIDs.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Literature cited&lt;/b&gt;&lt;br /&gt;Literature is perhaps the least problematic topic, because there are identifiers for many publications (e.g., DOIs), and tools for looking up identifiers for publications (e.g., CrossRef OpenURL, Google Scholar, PubMed, etc.). 
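&lt;br /&gt;&lt;br /&gt;Storing only the local part of an identifier, as suggested under "How to refer to identifiers", means that a resolver prefix has to be added when the link is displayed. A minimal sketch of that step in Python, assuming a small lookup table of resolvers: the table and the function name are hypothetical, and only the DOI prefix is one I actually use in these posts.&lt;br /&gt;
&lt;pre&gt;
```python
# Resolver prefixes keyed by identifier type. Only the DOI prefix comes from
# the links used in these posts; the table itself is a hypothetical example.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
}


def actionable(idtype, local_id):
    """Build a clickable URL from a stored (type, local id) pair, or None."""
    prefix = RESOLVERS.get(idtype)
    if prefix is None:
        return None  # no known resolver: leave the identifier as plain text
    return prefix + local_id


print(actionable("doi", "10.1126/science.1124891"))
print(actionable("pmid", "16601190"))
```
&lt;/pre&gt;
If the resolution mechanism changes, only the table needs editing; the marked-up documents stay as they are.&lt;br /&gt;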
&lt;br /&gt;BMC uses the following markup for a bibliography entry:&lt;br /&gt;&lt;div&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;bibl id="B21"&amp;gt;&lt;br /&gt; &amp;lt;title&amp;gt;&lt;br /&gt;  &amp;lt;p&amp;gt;Inter-familial relationships of the shorebirds &lt;br /&gt;  (Aves: Charadriiformes) based on nuclear DNA sequence data&amp;lt;/p&amp;gt;&lt;br /&gt; &amp;lt;/title&amp;gt;&lt;br /&gt; &amp;lt;aug&amp;gt;&lt;br /&gt;  &amp;lt;au&amp;gt;&lt;br /&gt;   &amp;lt;snm&amp;gt;Ericson&amp;lt;/snm&amp;gt;&lt;br /&gt;   &amp;lt;fnm&amp;gt;PGP&amp;lt;/fnm&amp;gt;&lt;br /&gt;  &amp;lt;/au&amp;gt;&lt;br /&gt;  .&lt;br /&gt;  .&lt;br /&gt;  .&lt;br /&gt;  &amp;lt;au&amp;gt;&lt;br /&gt;   &amp;lt;snm&amp;gt;Norman&amp;lt;/snm&amp;gt;&lt;br /&gt;   &amp;lt;fnm&amp;gt;JA&amp;lt;/fnm&amp;gt;&lt;br /&gt;  &amp;lt;/au&amp;gt;&lt;br /&gt; &amp;lt;/aug&amp;gt;&lt;br /&gt; &amp;lt;source&amp;gt;BMC Evol Biol&amp;lt;/source&amp;gt;&lt;br /&gt; &amp;lt;pubdate&amp;gt;2003&amp;lt;/pubdate&amp;gt;&lt;br /&gt; &amp;lt;volume&amp;gt;3&amp;lt;/volume&amp;gt;&lt;br /&gt; &amp;lt;fpage&amp;gt;16&amp;lt;/fpage&amp;gt;&lt;br /&gt; &amp;lt;xrefbib&amp;gt;&lt;br /&gt;  &amp;lt;pubidlist&amp;gt;&lt;br /&gt;   &amp;lt;pubid idtype="pmcid"&amp;gt;184354&amp;lt;/pubid&amp;gt;&lt;br /&gt;   &amp;lt;pubid idtype="pmpid" link="fulltext"&amp;gt;12875664&amp;lt;/pubid&amp;gt;&lt;br /&gt;   &amp;lt;pubid idtype="doi"&amp;gt;10.1186/1471-2148-3-16&amp;lt;/pubid&amp;gt;&lt;br /&gt;  &amp;lt;/pubidlist&amp;gt;&lt;br /&gt; &amp;lt;/xrefbib&amp;gt;&lt;br /&gt;&amp;lt;/bibl&amp;gt;&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;Note that individual elements of the item (such as volume, pagination, etc.) are identified, but more importantly, identifiers are provided (in this case from PubMed Central, PubMed, and CrossRef). BMC is better than PLoS in this respect, as PLoS don't embed the identifiers. &lt;br /&gt;&lt;br /&gt;Note that the mark up above embeds identifiers, not URLs (for example). URLs are fragile and can break. 
By just using identifiers, BMC avoids this problem, but it means that the user has to know how to make the identifier actionable.&lt;br /&gt;&lt;br /&gt;Marking up literature to this level of detail within even a single paper would be time-consuming, but as I've noted elsewhere on &lt;a href="http://ispecies.blogspot.com/2006/01/automatic-extraction-of-references.html"&gt;iSpecies&lt;/a&gt;, tools like &lt;a href="http://paracite.eprints.org/"&gt;ParaCite&lt;/a&gt; would make this tractable. ParaCite includes code to generate OpenURL requests, which means finding DOIs would be straightforward.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What's needed?&lt;/b&gt;&lt;br /&gt;Tools to extract citations from text and locate identifiers. ParaCite would help. Relying on CrossRef's OpenURL server limits us to those cases where CrossRef knows about the article (i.e., it has a DOI). It would be useful to have similar tools for searching PubMed, taxon-specific bibliographic databases such as &lt;a href="http://www.ars.usda.gov/research/docs.htm?docid=10003&amp;page=1"&gt;FORMIS&lt;/a&gt;, and the Hymenoptera Name Server. By tool I mean a Web API (it can be as simple as an HTTP GET interface). Google Scholar would also be useful, although there are issues with using it. There is also literature in DSpace repositories, such as the AMNH's wonderful collection of its scientific publications. How do we query this on the fly? In summary, having an OpenURL interface to taxonomic literature would greatly facilitate automated mark up.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Taxonomic names&lt;/b&gt;&lt;br /&gt;I subscribe to uBio's view (see &lt;a href="http://dx.doi.org/10.1080/10635150500541680"&gt;doi:10.1080/10635150500541680&lt;/a&gt;) that names by themselves are useful and should be indexed. A paper may mention a name with nothing to tell us what taxonomic concept is being used. 
For example, this is how the Google ant paper describes the site the ants were collected from:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Exotic vegetation dominates, most notably a scrub of strawberry guava (&lt;i&gt;Psidium cattleianum&lt;/i&gt;) and privet (&lt;i&gt;Ligustrum robustrum&lt;/i&gt;)—but grassland and &lt;i&gt;Eucalyptus&lt;/i&gt; plantations also occur.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;The paper is about ants, and for &lt;i&gt;Discothyrea berlita&lt;/i&gt; Fisher, &lt;i&gt;Proceratium avium&lt;/i&gt; Brown, &lt;i&gt;Proceratium avioide&lt;/i&gt; de Andrade, and &lt;i&gt;Proceratium google&lt;/i&gt; Fisher, we have a clear concept of what those names refer to (at a minimum, the specimens listed). For the other names (which include ants and plants), we have little to go on apart from the names. So, every occurrence of a taxonomic name in the document should be flagged. BioOne journals do this already. Any scientific name in the HTML is linked to ITIS. The linking is not intelligent as it is a search, not a link to an identifier (i.e., nobody actually checks that ITIS has the name).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What's needed?&lt;/b&gt;&lt;br /&gt;Names in the document should be linked to uBio namebank LSIDs. For ants we could also use the Hymenoptera Name Server.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Taxonomic concepts&lt;/b&gt;&lt;br /&gt;If we think of a concept as "what the name means", then this is most relevant to taxonomic papers describing names (e.g., where the author lists specimens, describes features of the taxon), argues that two taxa are synonyms, etc. uBio has a notion of a concept in the sense that a name may exist in multiple classifications, and each combination of name and classification has its own identifier. 
The Hymenoptera Name Server also has concepts (my understanding is that this is what one sees when one searches the Name Server through the web interface).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Treatment&lt;/b&gt;&lt;br /&gt;The taxonomic treatment is probably the same as the taxonomic concept (or perhaps, a treatment can be regarded as a detailed taxonomic concept). In my early experiments I assigned GUIDs to the taxonomic treatment, and used the Dublin Core tag &amp;lt;dcterms:isPartOf&amp;gt; to associate each treatment with the publication. As a quick hack, the identifier for each treatment within a paper included the XPath query that would locate that treatment in the larger document, e.g. &lt;b&gt;//tax:treatment[1]&lt;/b&gt;. Embedding this much meaning in an identifier is probably not wise, but it meant that given just the identifier, one could extract the treatment from the document.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Authors&lt;/b&gt;&lt;br /&gt;The lack of GUIDs for authors is a long-standing issue.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Images&lt;/b&gt;&lt;br /&gt;Most taxonomic images probably reside solely within the publication, but some may be stored in external databases (such as AntWeb). 
In the latter case, the mark up should link to the external source.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Specimens&lt;/b&gt;&lt;br /&gt;If the specimen has an electronic existence, then link to that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-114949822948157599?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/114949822948157599/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=114949822948157599' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114949822948157599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114949822948157599'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/06/taxonomic-markup-and-guids.html' title='Taxonomic Markup and GUIDs'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-114909703886177129</id><published>2006-05-31T18:37:00.000+01:00</published><updated>2006-05-31T18:37:18.866+01:00</updated><title type='text'>Truncating strings using XSL</title><content type='html'>&lt;b&gt;Problem&lt;/b&gt;&lt;br /&gt;I want to display an RSS feed from Connotea for papers &lt;a href="http://www.connotea.org/rss/recent/user/rdmpage?q=Formicidae"&gt;I've tagged with "Formicidae"&lt;/a&gt;. 
The titles of these papers can be long, e.g.:&lt;br /&gt;&lt;blockquote&gt;Dracula ant phylogeny as inferred by nuclear 28S rDNA sequences and implications for ant systematics (Hymenoptera: Formicidae: Amblyoponinae)&lt;/blockquote&gt;&lt;br /&gt;This takes up too much space when displayed in a panel on the web page, so I want to truncate the titles in a sensible way.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Solution&lt;/b&gt;&lt;br /&gt;&lt;a href="http://aaronland.info"&gt;aaronland.info&lt;/a&gt; has a nice &lt;a href="http://aaronland.info/xsl/string/truncate-phrase/"&gt;solution&lt;/a&gt; that truncates to a fixed length, optionally at a word boundary. Our verbose title becomes:&lt;br /&gt;&lt;blockquote&gt;Dracula ant phylogeny as inferred by nuclear 28S...&lt;/blockquote&gt;&lt;br /&gt;Much nicer.&lt;br /&gt;&lt;br /&gt;&lt;p&gt;(Via &lt;a href="http://www.google.com/search?q=xsl%20truncate%20string"&gt;Google: xsl truncate string&lt;/a&gt;.)&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;Currently playing in iTunes: &lt;i&gt;Dani California&lt;/i&gt; by Red Hot Chili Peppers&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-114909703886177129?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/114909703886177129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=114909703886177129' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114909703886177129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114909703886177129'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/05/truncating-strings-using-xsl.html' title='Truncating strings using XSL'/><author><name>Rod 
Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-114885412712348503</id><published>2006-05-28T23:08:00.000+01:00</published><updated>2006-05-29T00:56:37.913+01:00</updated><title type='text'>Australian ants online</title><content type='html'>&lt;img src="http://www.ento.csiro.au/science/ants/graphics/csiro_sm.gif" align="right" /&gt;&lt;br /&gt;The Australian National Insect Collection has an online resource of Australian ants that provides species maps and Google Earth files of specimen distributions. &lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/4123/605/1600/37501ausMap.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;" src="http://photos1.blogger.com/blogger/4123/605/320/37501ausMap.jpg" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;A potentially useful source of data, but the KML files don't include specimen codes, and there is an unfortunate disconnect between specimen codes and the ids used to link to them. For example, in the specimen list for &lt;i&gt;Myrmecia midas&lt;/i&gt;, the first specimen has a "MaterialID" of &lt;a href="http://anic.ento.csiro.au/database/specimen_details.asp?MaterialID=128315&amp;taxaName=Myrmecia%20midas%20Clark,%201951"&gt;128315&lt;/a&gt;, yet the ANIC accession code for this specimen is &lt;b&gt;32-012805&lt;/b&gt;. Pity this identifier is not made use of in the link to the data. &lt;br /&gt;&lt;br /&gt;The site also displays specimen metadata as an HTML table -- this information is also available as a CSV file. This is an example where a touch more effort would make it truly useful as a source of data. 
Something as simple as RDF with the specimen URL as the URI would be a big help, although this could be achieved by some judicious screen scraping...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Duh!&lt;/b&gt;&lt;br /&gt;Just realised, the data is &lt;a href="http://www.secretariat.gbif.net/portal/ecat_browser.jsp?taxonKey=349836"&gt;served up by GBIF&lt;/a&gt;, which makes life much easier.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-114885412712348503?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/114885412712348503/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=114885412712348503' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114885412712348503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114885412712348503'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/05/australian-ants-online.html' title='Australian ants online'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-27820238.post-114719753129613963</id><published>2006-05-09T18:58:00.000+01:00</published><updated>2006-05-29T00:54:42.930+01:00</updated><title type='text'>SemAnt?</title><content type='html'>Semantic Ants (geddit?).&lt;br /&gt;&lt;br /&gt;Initially I blogged about this project on &lt;a href="http://iphylo.blogspot.com"&gt;iPhylo&lt;/a&gt;, including an introduction to &lt;a 
href="http://iphylo.blogspot.com/2006/05/ants-rdf-and-triple-stores.html"&gt;populating the triple store&lt;/a&gt;, and thoughts on how to &lt;a href="http://iphylo.blogspot.com/2006/05/updating-ants.html"&gt;automatically add to its contents&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As the project develops I hope to describe further data sources, possible queries, user interface design, and benchmarks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/27820238-114719753129613963?l=semant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://semant.blogspot.com/feeds/114719753129613963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=27820238&amp;postID=114719753129613963' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114719753129613963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/27820238/posts/default/114719753129613963'/><link rel='alternate' type='text/html' href='http://semant.blogspot.com/2006/05/semant.html' title='SemAnt?'/><author><name>Rod Page</name><uri>http://www.blogger.com/profile/00269598293846172649</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://static.flickr.com/122/255060272_f0f249723d_m.jpg'/></author><thr:total>0</thr:total></entry></feed>
