Monday, June 12, 2006

Using links to rank specimens, sequences, etc.

The "abundance" problem arises when a search returns too many hits. How does the user decide which ones are relevant (other than by wading through the list)? The classic example is web search, where potentially millions of web pages may be returned. So the challenge is to rank the results so that the user needs to look at only the top 10 or so (and is confident that what she is after is in the top 10).
For biodiversity searches this is also relevant, especially if a search may return hundreds of specimens. How do we rank these (assuming some are more interesting than others)? Well, one option is to do what Google does: rank things based on links. In the case of specimens, we could use links to sequences, images, and publications as evidence that a specimen "matters" (i.e., people have done work on it, and therefore it is likely to be of interest).
I'd suggested something like this for phylogenies over on iPhylo, but it dawned on me today that the same idea might make sense for the SemAnt project. As an aside, Ben Szekely (another participant at TDWG GUID2) and Elias Torres have a cool paper on extending PageRank to tags.
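To make the specimen idea concrete, here is a minimal sketch in Python. It is not full PageRank, just a count of how many sequences, images, and publications link to each specimen, which is the simplest version of "links as evidence". The function name `rank_by_links`, the identifiers, and the link data are all made up for illustration.

```python
from collections import Counter


def rank_by_links(hits, links):
    """Order search hits by the number of objects linked to each one.

    hits  -- specimen identifiers returned by a search, in search order
    links -- (specimen, linked_object) pairs, e.g. a specimen and a sequence
    """
    hit_set = set(hits)
    # Count links only for specimens that are actually in the result set.
    counts = Counter(specimen for specimen, _ in links if specimen in hit_set)
    # sorted() is stable, so specimens with equal counts keep search order.
    return sorted(hits, key=lambda specimen: counts[specimen], reverse=True)


# Hypothetical search result and link data (not real records).
hits = ["specimen:A", "specimen:B", "specimen:C"]
links = [
    ("specimen:B", "sequence:1"),
    ("specimen:B", "image:1"),
    ("specimen:B", "publication:1"),
    ("specimen:C", "publication:1"),
    ("specimen:Z", "sequence:9"),  # not in the hits, so ignored
]

ranked = rank_by_links(hits, links)
print(ranked)
```

Here specimen B comes first (three links), then C (one), then A (none). A fuller version might weight links by type, or let importance flow through the link graph the way PageRank does, but a raw count is probably enough to see whether the ranking is useful.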

