RDF in Ruby revisited

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

Last weekend I had the fortune to run into Rich Kilmer at O’Reilly’s ‘Social Graph Foo Camp’ gathering. In addition to helping decorate my tent, Rich told me a bit more about the very impressive Semitar RDF and OWL work he’d done in Ruby, initially as part of the DAML programme. Matt Biddulph was also there, and we discussed again what it would take to include FOAF import into Dopplr. I’d be really happy to see that, both because of Matt’s long history of contributions to the Semantic Web scene, and because Dopplr and FOAF share a common purpose. I’ve long said that a purpose of FOAF is to engineer more coincidences in the world, and Dopplr comes from the same perspective: to increase serendipity.

Now, the thing about engineering serendipity is that it doesn’t work without good information flow. And the thing about good information flow is that it benefits from data models that don’t assume the world around us comes nicely parceled into cleanly distinct domains. Borrowing from American Splendor – “ordinary life is pretty complex stuff”. No single Web site, service, document format or startup is enough; the trick comes when you hook things together in unexpected combinations. And that’s just what we did in the RDF world: created a model for mixed-up, cross-domain data sharing.

Dopplr, Tripit, Fire Eagle and other travel and location services may know where you and others are. Social network sites (and there are more every day) know something of who you are, and something of who you care about. And the big G in the sky knows something of the parts of this story that are on the public record.

Data will always be spread around. RDF is a handy model for ad-hoc data merging from multiple sources. But you can’t do much without an RDF parser and a few other tools: minimally, an RDF/XML parser and a basic API for navigating the graph. There are many more things you could add. In my old RubyRdf work, I had in-memory and SQL-backed storage, with a Squish query interface to each. I had a donated RDF/XML parser (from Ruby4R) and a much-improved query engine (with support for optionals) from Damian Steer. But the code has rotted. I wrote it while I was learning Ruby, beginning seven years ago, and I think it is “one to throw away”. I’m really glad I took the time to declare that project “closed” so as to avoid discouraging others, but it is time to revisit Ruby and RDF now.

Other tools have other offerings: Dave Beckett’s Redland system (written in C) ships with a Ruby wrapper. Dave’s tools probably have the best RDF parsing facilities around and are fast, but they require native code. Rena is a pure-Ruby library which looked like a great start, but doesn’t appear to have been developed further in recent years.

I could continue going through the list of libraries, but Paul Stadig has already done a great job of this recently (see also his conclusions, which make perfect sense). There has been a lot of creative work around RDF/RDFS and OWL in Ruby, and collectively we clearly have a lot of talent and code here. But collectively we also lack a finished product. It is a real shame when even an RDF enthusiast like Matt Biddulph is not in a position to simply “gem install” enough RDF technology to get a simple job done. Let’s get this fixed. As I said above,

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

In six months’ time, I’d like to see at least one solid, well-rounded and modern RDF toolkit packaged as a Gem for the Ruby community. It should be able to parse RDF/XML flawlessly, and in addition to the usual unit tests, it should be wired up to the RDF Test Cases (see download) so we can all be assured it is robust. It should allow a fast C parser such as Raptor to be used if available, falling back on pure Ruby otherwise. There should be a basic API that allows me to navigate an RDF graph of properties and values using clear, idiomatic Ruby. Where available, it should hook up to external stores of data, and include at least a SPARQL protocol client, eventually a full SPARQL implementation. It should allow multiple graphs to be superimposed and disentangled. Some support for RDFS, OWL and rule languages would be a big plus. Support for other notations such as Turtle, RDFa, or XSLT-based GRDDL transforms would be useful, as would a plugin for microformat import. Transliterating Python code (such as the tiny Euler rule engine) should be considered. Divergence from existing APIs in Python (and Perl, Javascript, PHP etc.) should be minimised, and carefully balanced against the pull of the Ruby way. And (though I lack strong views here) it should be made available under a liberal open-source license that permits redistribution under the GPL. It should also be as I18N- and Unicode-friendly as is possible in Ruby these days.

I’m not saying that all RDF toolkits should be merged, or that collaboration is compulsory. But we are perilously fragmented right now, and collaboration can be fun. In six months’ time, people who simply want to use RDF from Ruby ought to be pleasantly surprised rather than frustrated when they take to the ‘net to see what’s out there. If it takes a year instead of six months, sure, whatever. But not seven years! It would be great to see some movement again towards a common library…

How hard can it be?

Open IDiomatic? Dada engine hosting as OpenID client app

All of us are dumber than some of us.

Various folk are concerned that OpenID has more provider apps than consumer apps, so here is my little website idea for an OpenID-facilitated collaborative thingy. I’ve loved the Dada Engine for years. The Dada Engine is the clever-clogs backend for grammar-driven nonsense generators such as the wonderful Postmodernism Generator.

Since there are many more people capable of writing funny prose to configure this machine than there are who can be bothered to do the webhosting sysadmin, I reckon it’d be worthwhile to make a general-purpose hosted version whereby users could create new content through a Web interface. And since everyone would forget their passwords, this seems a cute little project to get my hands dirty with OpenID. To date, all I’ve done server-side with OpenID is install patches to MediaWiki and WordPress. That went pretty smoothly. My new hacking expedition, however, hit a snag already: the PHP libraries on my EC2 sandbox server didn’t like authenticating against LiveJournal. I’m new to PHP, so when something stops working I panic and whine in IRC instead of getting to the bottom of the problem.

Back to the app idea: the Dada Engine basically eats little config files that define sentence structures, and spews nonsense. Here’s an NSFW SubGenius rant:

FlipFlip:scripts danbri$ pb < brag.pb
Fuck ‘em if they can’t take a joke! I’m *immune*! *Yip, yip, YEEEEEEE!*
*Backbone Blowout*! I do it for *fun*! YAH-HOOOO! Now give me some more of…

And here’s another:

FlipFlip:scripts danbri$ pb < brag.pb
I’m a fission reactor, I fart plutonium, power plants are fueled by the breath
of my brow; when they plug *me* in, the lights go out in Hell County! Now give
me some more of…

A fragment of the grammar:

"I say, `" slogan "'. By God, `" slogan "', I say! " |
"I am " entity ", I am " entity "! " |
"I'll drive a mile so as not to walk a foot; I am " entity "! " |
"Yes, I'm " entity "! " |
"I drank *" being "* under " number " tables, I am too " adjective " to die, I'm insured for acts o' God *and* Satan! " |
"I was shanghaied by " entities " and " entities " from " place ", and got away with their hubcaps! " |
"I *cannot* be tracked on radar! " |
"I wear nothing uniform, I wear *no* " emphatic " uniform! " | ....

To be crystal clear, this hosted app is total vapourware currently. And I’m not sure whether it would be a huge piece of work to make sure the Dada Engine didn’t introduce security holes. Certainly the version that uses the C preprocessor scares me, but the “pb” utility shown above doesn’t use that. Opinions on the safety of this are welcome. But never mind the details, smell the vision! It could always use Novelwriting (a Python “knock-off of the Dada Engine”) instead.

The simplest version of a hosting service for this kind of user-generated meta-content would basically be textarea editing and a per-user collection of files that are flagged public or private. But given the nature of the system, it would be great if the text generator could be run against grammars that multiple people could contribute to. Scope for infinite folly…

Commandline PHP for loading RDF URLs into ARC (and Twinkle for query UI)


#!/usr/bin/php
<?php
if ($argc != 2 || in_array($argv[1], array('--help', '-help', '-h', '-?'))) {
?>
This is a command line PHP script with one option: URL of RDF document to load
<?php
} else {

  $supersecret = "123rememberme"; # Security analysts recommend using date of birth + social security ID here
  # *** be careful with real mysql passwords ***

  include_once("../arc/ARC2.php");
  $config = array(
    'db_host' => 'localhost', 'db_name' => 'sg1', 'db_user' => 'sparql',
    'db_pwd' => $supersecret, 'store_name' => 'crawl',
  );
  $store = ARC2::getStore($config);
  if (!$store->isSetUp()) { $store->setUp(); }

  $profile = $argv[1];
  echo "Loading data from " . $profile . "\n";
  $store->query('DELETE FROM <' . $profile . '>'); # drop any earlier copy of this graph
  $store->query('LOAD <' . $profile . '>');        # fetch, parse and store as a named graph
}
?>

FWIW, this is what I’m using to (re)load data into an ARC store from the commandline. I’ll try wiring up my old RDF crawler to this when I get time. Each loaded source is stored as a named graph, with the URI it is loaded from being the named graph URI. ARC just needs the path to the unpacked PHP libraries, and connection details for a MySQL database, and comes with a handy SPARQL endpoint script too, which I’ve been testing with Twinkle.
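Saved as, say, load.php (my name for it, nothing ARC-specific) and made executable, reloading a graph is just ./load.php http://example.org/foaf.rdf.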

My public sandbox data is currently loaded up as follows. No promises it’ll stay there, but anyway, adding the following to Twinkle 2.0’s config file section for SPARQL endpoints works for me. The endpoint also offers a basic Web interface directly, with HTML, XML, JSON etc.


<http://sandbox.foaf-project.org/2008/foaf/ggg.php>
a sources:Endpoint; rdfs:label "FOAF Social Graph Aggregator sandbox".

Google Earth touring via KML

While I’m writing up old hacks, here’s one that I really enjoyed, even if it was a bit clunky. A couple of years ago Mikel Maron implemented (at my urging, in the irc.oftc.net #geo IRC channel :) a PHP-based Google Earth touring service, which interconnects a “tour guide” user with “tourists”.

This site facilitates collaborative, realtime exploration of Google Earth. As the “tour guide” navigates, “tourists” will automatically follow along.

When the tour guide’s Google Earth installation is at rest, a specially installed KML network link sends the server an HTTP request, showing the coordinates of the visible area of the globe. This same service is periodically polled (every second) by “tourists” whose Google Earth will dutifully fly to the appropriate spot.

The system seems to be offline currently, but it was quite evocative to use, even if tricky. You never quite knew what the other party could actually see, since the picture could load quite slowly when moving around a lot. And the implementation didn’t do anything about angle of view (although this became possible in later versions of KML). I had experimental tours of Dublin guided by Ina (Skyping at the same time), and of various places in Iran by Hamed Saber.

I expect in due course (if not already; I don’t track these things) Google Earth and similar products (Worldwind, or the Microsoft thingy) will offer social map browsing; it’s such a nice feature, though it really needs an audio channel open at the same time. Last week I tried to do the same without such a link, with my mum talking me through finding a small village in France. Much harder! “Take the road north out of Chabanais … past a small farm, past the swimming pool…”.

Here is Mikel’s “how it works” writeup:

The web interface generates KML files, which are loaded into Google Earth and create Network Links. The tour guide has a “View Based Refresh” Network Link, which sends the bounding box of the current view to the specified URL whenever the camera stops. That position is stored. Tourists receive a “Time Based Refresh” Network Link, which requests every 10 seconds and receives the last stored position of the guide.

Right now only location and altitude are transmitted. A future release of Google Earth may enable tilt and rotation. Integrated chat would be nice as well.
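To make that concrete, here is a rough server-side sketch of such a relay in PHP. To be clear, this is my guess at the shape of the thing, not Mikel’s code: the filename and the flat-file storage are invented, though BBOX=west,south,east,north is what a view-based refresh NetworkLink sends by default.

<?php
// position.php (hypothetical name): a one-file tour relay sketch.
// The guide's view-based-refresh NetworkLink calls this with ?BBOX=west,south,east,north
// whenever the camera stops; tourists' time-based-refresh links poll it and get
// KML that flies them to the centre of the guide's last view.
$file = '/tmp/guide-position.txt';

if (isset($_GET['BBOX'])) {           // the guide reporting in
    file_put_contents($file, $_GET['BBOX']);
    exit;
}

if (!file_exists($file)) { exit; }    // no guide position recorded yet
list($w, $s, $e, $n) = array_map('floatval', explode(',', file_get_contents($file)));
$lon = ($w + $e) / 2;
$lat = ($s + $n) / 2;

header('Content-Type: application/vnd.google-earth.kml+xml');
echo <<<KML
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.1">
  <Placemark>
    <name>Tour guide</name>
    <LookAt><longitude>$lon</longitude><latitude>$lat</latitude><range>5000</range></LookAt>
    <Point><coordinates>$lon,$lat,0</coordinates></Point>
  </Placemark>
</kml>
KML;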

The fact that they’ve hidden a full flight simulator within Google Earth might make this worth revisiting. And of course there is infinite fun to be had from playing with photos etc on the globe, although my last attempts in that direction (preparing for 3 months in Buenos Aires by studying geo-tagged photos instead of Spanish) tailed off. Everyone was putting pics on maps and I got a bit bored, even though it’s still a worthwhile area with much still to be done.

Some ideas are not meant to be combined though: who really needs a collaborative realtime photo-navigator implemented with Google Earth flight simulator? :)

OpenID plugin for WordPress

I’ve just installed Alan J Castonguay’s WordPress OpenID plugin on my blog, part of a cleanup that included nuking 11000+ comments in the moderation queue using the Spam Karma 2 plugin. Apologies if I zapped any real comments too. There are a few left, at least!

The OpenID thing appears to “just work”. By which I mean, I could log in via it and leave a comment. I’d be super-grateful if those of you with OpenIDs could take a minute to leave a comment on this post, to see if it works as well as it seems to. If it doesn’t, a bug report (to danbrickley@gmail.com) would be much appreciated. Those of you with LiveJournals or AOL/AIM accounts already have OpenID, even if you didn’t notice. See the HTML source for my homepage to see how I use “danbri.org” as an OpenID while delegating the hard work to LiveJournal. For more on OpenID, check out these tutorial slides (flash/pdf) from Simon Willison and David Recordon.
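For the curious, that delegation trick amounts to two link elements in the page head. The snippet below is illustrative rather than copied from my homepage; the LiveJournal endpoint URL is the one I remember, and the delegate URL is a placeholder:

<!-- illustrative OpenID 1.1 delegation; URLs are placeholders -->
<link rel="openid.server" href="http://www.livejournal.com/openid/server.bml" />
<link rel="openid.delegate" href="http://example.livejournal.com/" />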

Thinking about OpenID-mediated blog comments, the tempting thing then would be to do something with the accumulated URIs. The plugin keeps its data in nice SQL tables, presumably accessible to other WordPress plugins. It’s been a while since I made a WordPress plugin, but they seem to have a pretty good framework available to them now.

mysql> select user_id, url from wp_openid_identities;
+---------+--------------------+
| user_id | url                |
+---------+--------------------+
|      46 | http://danbri.org/ |
+---------+--------------------+
1 row in set (0.28 sec)

At the moment, it’s just me. It’d be fun to try scooping up RDF (FOAF, SKOS, SIOC, feeds…) from any OpenID URIs that accumulate there. Hmm I even wrote up that project idea a while back – SparqlPress. At the time I tried prototyping it in Redland + PHP, but nowadays I’d probably use Benjamin Nowack’s ARC library, which provides SPARQL query of a MySQL-backed RDF store, and is written in PHP. This gives it the same dependencies as WordPress, making it ideal for pluginization. If anyone’s looking for a modest-sized practical SemWeb project to hack on, that one could be a lot of fun.

There’s a lot of interesting and creative fuss about “social networking” site interop around lately, largely thanks to the social graph paper from Brad Fitzpatrick and David Recordon. I lean towards the “show me, don’t tell me” approach regarding buddylists and suchlike (as does Julian Bond with Ecademy), which is why FOAF has only ever had the mild-mannered “knows” relationship in the core vocabulary, rather than trying to over-formalise “bestest friend EVER” and other teenisms. So what I like about this WordPress plugin is that it gives some evidence-based raw material for decentralised social networking apps. Blog comments don’t tell the whole story; nothing tells the whole story. But rather than maintain a FOAF “knows” list (or blogroll, or blog-reader config) by hand, I’d prefer to be able to partially automate it by querying information about whose blogs I’ve commented on, and vice-versa. There’s a lot that could be built, intimidatingly much; it’s hard to know where to start. I suggest that everyone in the SemWeb scene having an OpenID with a FOAF file linked from it would be an interesting platform from which to start exploring…

Meanwhile, I’ll try generating an RDF blogroll from any URIs that show up in my OpenID WordPress table, so I can generate a planetplanet or chumpologica configuration automatically…
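A first cut could be tiny. This is only a sketch (assumed database credentials, hypothetical filename), using the foaf:openid property from the FOAF vocabulary, but it shows the shape of the thing:

<?php
// blogroll.php (hypothetical): emit a minimal FOAF document listing the
// OpenID URLs that commenters have left in the wp_openid_identities table.
$db = new mysqli('localhost', 'wp_user', 'secret', 'wordpress'); // assumed credentials

header('Content-Type: application/rdf+xml');
echo '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
     ' xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"',
     ' xmlns:foaf="http://xmlns.com/foaf/0.1/">', "\n";

$res = $db->query('SELECT url FROM wp_openid_identities');
while ($row = $res->fetch_assoc()) {
    $url = htmlspecialchars($row['url']);
    // Each commenter becomes a foaf:Person; a crawler (or a planetplanet
    // config generator) can follow the rdfs:seeAlso link from here.
    echo "  <foaf:Person>\n";
    echo "    <foaf:openid rdf:resource=\"$url\"/>\n";
    echo "    <rdfs:seeAlso rdf:resource=\"$url\"/>\n";
    echo "  </foaf:Person>\n";
}
echo "</rdf:RDF>\n";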

GIS and Spatial Extensions with MySQL

GIS and Spatial Extensions with MySQL.

MySQL 4.1 introduces spatial functionality in MySQL. This article describes some of the uses of spatial extensions in a relational database, how it can be implemented in a relational database, what features are present in MySQL and some simple examples.

I’m hoping to understand the commonalities between this and PostGIS. PostGIS follows the OpenGIS “Simple Features Specification for SQL”. As do the MySQL extensions, apparently. The MySQL pages summarise the extensions as follows:

Data types. There need to be data types to store the GIS information. This is best illustrated with an example: a POINT in a 2-dimensional system.

Operations. There must be additional operators to support the management of multi-dimensional objects. Again, this is best illustrated with an example: a function that computes the AREA of a polygon of any shape.

The ability to input and output GIS data. To make systems interoperable, OGC has specified how contents of GIS objects are represented in binary and text format.

Indexing of spatial data. To use the different operators, some means of indexing of GIS data is needed, or in technical terms, spatial indexing.
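As a concrete sketch of those four pieces driven from PHP (WordPress’s language), with assumed credentials and an invented table; note that in the MySQL 4.1/5.0 era a SPATIAL index wants a MyISAM table and a NOT NULL geometry column:

<?php
// Sketch only: assumed credentials and table, exercising the four features above.
$db = new mysqli('localhost', 'geo_user', 'secret', 'geo');

// 1. Data types + 4. spatial indexing: a POINT column with a SPATIAL index
$db->query("CREATE TABLE places (name VARCHAR(64), loc POINT NOT NULL,
            SPATIAL INDEX(loc)) ENGINE=MyISAM");

// 3. Input: OGC Well-Known Text, parsed by GeomFromText()
$db->query("INSERT INTO places VALUES ('Bristol', GeomFromText('POINT(-2.59 51.45)'))");

// 2. Operations (a bounding-box containment test) + 3. output back as WKT via AsText()
$res = $db->query("SELECT name, AsText(loc) AS wkt FROM places
                   WHERE MBRContains(
                     GeomFromText('POLYGON((-3 51,-2 51,-2 52,-3 52,-3 51))'), loc)");
while ($row = $res->fetch_assoc()) {
    echo $row['name'], ' is at ', $row['wkt'], "\n";
}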

I’m currently working on some ideas to prototype a new project (to fill the gap that the completion of SWAD-Europe leaves in my schedule). I’ll be revisiting my Gargonza plan to add a basic SemWeb RDF crawler to personal weblog installations, initially prototyping with Redland addons to WordPress. Ultimately, pure PHP would be better, unless Redland finds its way into the default PHP installation. Since WordPress requires MySQL anyway, it seems worth taking a look at these geo-related extensions. A more thorough investigation would take a look at reflecting GIS SQL concepts into RDF, perhaps exposing them in a SPARQL query environment. But that’s a bit ambitious for now.

What I hope to do for starters is use a blog as a personal SW crawler, scooping up RSS, FOAF, calendar, and photo descriptions from nearby Web sites. It isn’t clear yet exactly how photo metadata should most usefully be structured, but it is clear that we’ll find a way to harvest it into an RDF store. And if that metadata has mappable content, whether basic lat/long tags, richer GML, or something in between, we’ll harvest that too. My working hypothesis is that we’ll need something like MySQL spatial extensions or PostGIS to really make the most of that data, e.g. to expose location-specific, app-centric RSS, KML, etc. feeds such as those available from the flickr-derived geobloggers.com and brainoff flickr.proxy sites. See mapufacture.com for one possible client app; Google Earth as KML browser is another.

That’s the plan anyway. So the reading list grows. Fortunately, OGC’s GIS SQL spec at least has some nice diagrams…

[Figure: GIS datatype hierarchy]