Embedding queries in RDF – FOAF Group example

Is this crazy or useful? Am not sure yet.

This example uses the FOAF vocabulary for groups and OpenID. The basic structure is that Agents (including Persons) can have an :openid and can be a :member of a :Group.

From an OpenID-augmented WordPress, we get a list of all the OpenIDs my blog knows about. From an OpenID-augmented MediaWiki, we get a list of all the OpenIDs that contribute to the FOAF project wiki. I dumped each into a basic RDF file (not currently an automated process). But the point here is to explore describing groups with queries, as well as by enumerating their members.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://xmlns.com/foaf/0.1/">
<Group rdf:about='#both'>
<!-- enumerated membership -->
<member><Agent><openid rdf:resource='http://danbri.org/'/></Agent></member>
<member><Agent><openid rdf:resource='http://tommorris.org/'/></Agent></member>
<member><Agent><openid rdf:resource='http://kidehen.idehen.net/dataspace/person/kidehen'/></Agent></member>
<member><Agent><openid rdf:resource='http://www.wasab.dk/morten/'/></Agent></member>
<member><Agent><openid rdf:resource='http://kronkltd.net/'/></Agent></member>
<member><Agent><openid rdf:resource='http://www.kanzaki.com/'/></Agent></member>

<!-- rule-based membership -->

<constructor><![CDATA[
PREFIX : <http://xmlns.com/foaf/0.1/>
# Output: the group gets one member Agent per matching ?id
CONSTRUCT {
<http://danbri.org/yasns/danbri/both.rdf#thegroup> a :Group; :member [ a :Agent; :openid ?id ]
}
# ?id must appear as a group member's OpenID in both named graphs
WHERE {
GRAPH <http://wiki.foaf-project.org/_users.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] }
GRAPH <http://danbri.org/yasns/danbri/_group.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] }
}
]]></constructor>
</Group>
</rdf:RDF>

This RDF description does it both ways. It enumerates (for simple clients) the members of a group made up of those individuals who are both commentators on my blog and contributors to the FOAF wiki, at least to the extent they’re detectable via common use of OpenID URIs. But the RDF group description also embeds a SPARQL query of the CONSTRUCT kind, which generates RDF rather than an SQL-like result set. The query essentially regenerates the enumerated list, assuming it is run against an RDF dataset with the named data graphs appropriately populated.
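A “simple client” only needs the enumerated list. Here is a minimal sketch using just Python’s standard-library ElementTree, which pulls out the enumerated OpenIDs and the embedded query text. It is plain XML navigation, not an RDF/XML parser, so it assumes the exact Group/member/Agent/openid nesting used in the description above; the function name `read_group` and the trimmed sample document are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace URIs, as declared on the rdf:RDF element of the group description.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"


def read_group(rdfxml: str) -> Tuple[List[str], Optional[str]]:
    """Return (enumerated OpenID URIs, embedded constructor query or None).

    Assumes the Group/member/Agent/openid nesting used above; this is plain
    XML navigation, not a general RDF/XML parser.
    """
    root = ET.fromstring(rdfxml)
    group = root.find(f"{{{FOAF}}}Group")
    if group is None:
        return [], None
    # Enumerated membership: the rdf:resource of each member's openid.
    path = f"{{{FOAF}}}member/{{{FOAF}}}Agent/{{{FOAF}}}openid"
    openids = [el.get(f"{{{RDF}}}resource") for el in group.findall(path)]
    # Rule-based membership: the CDATA text inside <constructor>, if present.
    ctor = group.find(f"{{{FOAF}}}constructor")
    query = ctor.text.strip() if ctor is not None and ctor.text else None
    return openids, query


# A trimmed copy of the group description (two members, abridged query).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://xmlns.com/foaf/0.1/">
  <Group rdf:about="#both">
    <member><Agent><openid rdf:resource="http://danbri.org/"/></Agent></member>
    <member><Agent><openid rdf:resource="http://tommorris.org/"/></Agent></member>
    <constructor><![CDATA[
PREFIX : <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ... } WHERE { ... }
]]></constructor>
  </Group>
</rdf:RDF>"""

if __name__ == "__main__":
    ids, query = read_group(SAMPLE)
    print(ids)                    # the enumerated OpenIDs, in document order
    print(query.splitlines()[0])  # first line of the embedded query
```

A client that knows nothing about SPARQL can stop at the OpenID list; one with a suitably loaded store can hand the query text to it instead.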

Now I sorta like this, and I sorta don’t. It may be incredibly powerful, or it may be a bit too clever for its own good.

Certainly there’s scope overlap with the W3C RIF rules work, and with the capabilities of OWL. FOAF has long contained an experimental method for using OWL to do something similar, but it hasn’t found traction. My motivation for trying SPARQL here is that it has built-in machinery for talking about the provenance of data, so I could write a group description this way that says “members are anyone listed as a colleague in http://myworkplace.example.com/stafflist.rdf”. Or I could mix in arbitrary descriptive vocabularies: family tree stuff, XFN, language abilities (speaks-reads-writes), etc.
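The provenance point comes down to SPARQL’s GRAPH keyword: each source document is named explicitly in the query. As a hypothetical illustration, this helper writes a query of the same shape as the one above for any list of source documents, one GRAPH clause each. It only reuses the :Group/:member/:openid pattern from the example; a rule like the staff-list one would need whatever vocabulary that document actually uses. `group_query` and the example.org URIs in the test are made up.

```python
from typing import Sequence

FOAF_PREFIX = "PREFIX : <http://xmlns.com/foaf/0.1/>"
# The graph pattern from the example: some Group in the graph lists an
# Agent with this OpenID as a member.
MEMBER_PATTERN = "[ a :Group; :member [ a :Agent; :openid ?id ] ]"


def group_query(group_uri: str, source_graphs: Sequence[str]) -> str:
    """Build a CONSTRUCT query whose members are the OpenIDs listed as group
    members in *every* one of the named source graphs (hypothetical helper)."""
    if not source_graphs:
        raise ValueError("at least one source graph is needed")
    # One GRAPH clause per source; the shared ?id variable joins them.
    where = "\n".join(
        f"  GRAPH <{g}> {{ {MEMBER_PATTERN} }}" for g in source_graphs
    )
    return (
        f"{FOAF_PREFIX}\n"
        "CONSTRUCT {\n"
        f"  <{group_uri}> a :Group; :member [ a :Agent; :openid ?id ]\n"
        "}\n"
        f"WHERE {{\n{where}\n}}"
    )


if __name__ == "__main__":
    # Regenerates the query from the group description above.
    print(group_query(
        "http://danbri.org/yasns/danbri/both.rdf#thegroup",
        ["http://wiki.foaf-project.org/_users.rdf",
         "http://danbri.org/yasns/danbri/_group.rdf"],
    ))
```

Because every GRAPH clause shares the variable ?id, the clauses are joined, which gives “listed in all of these documents”; wrapping the clauses in UNION instead would give “listed in any of them”.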

Where I think this could fall down is in the complexity of the workflow. The queries need executing against some SPARQL installation with a configured dataset, and each query lists the URIs of its data graphs. But I doubt database admins will want to load any and every RDF file mentioned in these shared queries. Perhaps something like SparqlPress, attached to one’s weblog, combined with social filters that only load files named in queries from, e.g., friends? Also, authoring these kinds of query isn’t something non-geek users are going to do often, and the sorts of queries that will work will of course depend on the data actually available. Sure, I could write a query based on matching the OpenIDs of former colleagues, but the group will be empty unless data listing people as former colleagues is actually out there in the Web, and written in the terms anticipated by the query.
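One crude version of such a social filter: before loading anything, pull the data-graph URIs out of a shared query and keep only those hosted on sites you trust. This is a sketch under loud assumptions: “trusted” here just means a hypothetical allowlist of hostnames standing in for friends, and the regular expression is not a SPARQL parser (it would also match text inside string literals or comments).

```python
import re
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Crude match for IRIs in "GRAPH <...>", "FROM <...>" and "FROM NAMED <...>".
# Not a SPARQL parser: it would also match inside literals or comments.
GRAPH_IRI = re.compile(r"\b(?:GRAPH|FROM(?:\s+NAMED)?)\s*<([^>]+)>", re.IGNORECASE)


def graphs_to_load(query: str,
                   trusted_hosts: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split the data graphs a query mentions into (load, skip) lists,
    loading only those whose host is in the (hypothetical) allowlist."""
    trusted = {h.lower() for h in trusted_hosts}
    load: List[str] = []
    skip: List[str] = []
    # dict.fromkeys de-duplicates while keeping first-seen order.
    for iri in dict.fromkeys(GRAPH_IRI.findall(query)):
        host = urlparse(iri).hostname or ""  # .hostname is lower-cased
        if host in trusted:
            load.append(iri)
        else:
            skip.append(iri)
    return load, skip
```

Run over the example query with only danbri.org trusted, the blog’s group file would be loaded and the wiki’s user list skipped, which in turn means the “both” group comes out empty: the filter limits what gets loaded, but it also limits what the query can find.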

On the other hand, this mechanism does appeal, and could go way beyond FOAF group definitions. We could see a model where people post data in the Web but also post queries, e.g. revisiting the old work Libby and I explored around RSS query. On the other other hand, who wants to make their Web queries public? All that said, the same goes for the data being queried. And since this technique embeds queries inside ordinary RDF data, however we deal with the data visibility issue for RDF/FOAF should also work for the query stuff. Perhaps. Can’t blame me for trying…

I realise this isn’t the clearest of explanations. Let’s try again:

RDF is normally for publishing collections of simple claims about the world. This is an experiment in embedding data-generating queries amongst these claims, where each query is written to output more RDF claims (aka statements, triples, etc.), but only when executed against some appropriate body of RDF data. Since the query is written in SPARQL, the data-generation rules can mention interesting things, such as properties of the source of the data being queried.

This particular experiment is couched in terms of FOAF’s “Group” construct, but the technique is entirely general. The example above defines a group of agents called the “both” group, by saying that an Agent is in that group if its OpenID URI is listed in each of the two specified RDF documents, i.e. it is both a commentator on my blog and a contributor to the FOAF wiki. Other examples could be “(fe)male employees” or “family members sharing a blood type” or, in fact, any descriptive pattern that can match against the data to hand and be expressed in SPARQL.
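To spell out what the “both” rule computes, here is a plain-Python evaluation of it over two in-memory graphs, with no SPARQL engine involved. Triples are (subject, predicate, object) string tuples and blank nodes are strings starting with `_:`; the helper names and the example.org OpenIDs are hypothetical test data. One simplification: working with sets collapses duplicate matches, whereas a SPARQL engine mints a fresh blank node per solution and so could report the same OpenID under several blank nodes.

```python
from itertools import count
from typing import Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
Triple = Tuple[str, str, str]  # (subject, predicate, object); "_:" = blank node


def member_openids(graph: Set[Triple]) -> Set[str]:
    """?id values matching [ a :Group; :member [ a :Agent; :openid ?id ] ]."""
    ids: Set[str] = set()
    for group, pred, agent in graph:
        if (pred == FOAF + "member"
                and (group, RDF_TYPE, FOAF + "Group") in graph
                and (agent, RDF_TYPE, FOAF + "Agent") in graph):
            ids |= {o for s, p, o in graph if s == agent and p == FOAF + "openid"}
    return ids


def construct_group(group_uri: str, graphs: Iterable[Set[Triple]]) -> List[Triple]:
    """Agents whose OpenID matches in every graph (the join on ?id), emitted
    as the CONSTRUCT template would: one fresh blank node per member."""
    id_sets = [member_openids(g) for g in graphs]
    shared = set.intersection(*id_sets) if id_sets else set()
    if not shared:
        return []  # no solutions, so CONSTRUCT yields an empty graph
    out: List[Triple] = [(group_uri, RDF_TYPE, FOAF + "Group")]
    fresh = count(1)
    for oid in sorted(shared):
        node = f"_:m{next(fresh)}"
        out += [(group_uri, FOAF + "member", node),
                (node, RDF_TYPE, FOAF + "Agent"),
                (node, FOAF + "openid", oid)]
    return out


def group_graph(prefix: str, openids: Iterable[str]) -> Set[Triple]:
    """Hypothetical test data: one Group whose members have these OpenIDs."""
    group = f"_:{prefix}"
    triples = {(group, RDF_TYPE, FOAF + "Group")}
    for i, oid in enumerate(openids):
        agent = f"_:{prefix}{i}"
        triples |= {(group, FOAF + "member", agent),
                    (agent, RDF_TYPE, FOAF + "Agent"),
                    (agent, FOAF + "openid", oid)}
    return triples


blog = group_graph("blog", ["http://danbri.org/", "http://tommorris.org/",
                            "http://example.org/blog-only"])
wiki = group_graph("wiki", ["http://danbri.org/", "http://tommorris.org/",
                            "http://example.org/wiki-only"])
both = construct_group("http://danbri.org/yasns/danbri/both.rdf#thegroup",
                       [blog, wiki])
```

Here `both` names the two shared OpenIDs and leaves out each source’s extra one; swapping in other patterns (employees, blood types) changes `member_openids`, not the intersect-then-construct structure.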

Amazon EC2: My 131 cents

Early this afternoon, I got it into my head to try out Amazon’s “Elastic Compute Cloud” (EC2) service. I’m quite impressed.

The bill so far, after some playing around, is 1.31 USD. I’ve had the default “getting started” Fedora Linux box up and running for maybe 7 hours, as well as trying out machine images (AMIs) preconfigured for Virtuoso Data Spaces and for Ruby on Rails. Being familiar with neither, I’m impressed that I can rehydrate a pre-prepared Linux machine configured for these apps simply by clicking a button in the EC2 Firefox add-on. Nevertheless, I managed to get bogged down in both toolkits; I wasn’t in an RTFM mood, and even Rails can be fiddly if you’re me.

I’ve done nothing very compute- or bandwidth-intensive yet; I’m sure the costs could crank up a bit if I started web crawling and indexing. But it is an impressive setup, and one you can experiment with easily and cheaply, especially if you’ve already got an Amazon account. Also, being billed in USD is always cheering. The whole thing is controlled through Web service interfaces, which are hidden from me since I use either the Firefox add-on or the command-line tools, which work fine on Mac OS X once you’ve set up some paths and environment variables.

I can now just type “ec2-run-instances ami-e2ca2f8b -k sandbox-keypair” or similar (the -k sandbox-keypair part names the key pair whose public key gets installed on the server) to get a new Linux machine set up within a couple of minutes, pre-configured in this case with Virtuoso Data Spaces. And since the whole environment is therefore scriptable, virtual machines can beget virtual machines. Zero-click purchase – just add money :)
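Since the tools are just commands, scripting them is straightforward. A minimal Python sketch, assuming the EC2 command-line tools are on the PATH and are configured through the environment variables listed in `REQUIRED_ENV` (that list is an assumption about a typical setup, not something stated above). It only uses the `ec2-run-instances <AMI> -k <keypair>` form shown in the post, and by default does a dry run that returns the command instead of launching (and paying for) a machine.

```python
import os
import subprocess
from typing import List, Mapping, Optional

# ASSUMPTION: the EC2 command-line tools are configured via these variables
# (tool location, X.509 private key and certificate, Java runtime).
REQUIRED_ENV = ("EC2_HOME", "EC2_PRIVATE_KEY", "EC2_CERT", "JAVA_HOME")


def missing_env(env: Mapping[str, str]) -> List[str]:
    """Names from REQUIRED_ENV that are unset or empty in env."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def run_instances_argv(ami: str, keypair: str) -> List[str]:
    """The command from the post: ec2-run-instances <AMI> -k <keypair>."""
    return ["ec2-run-instances", ami, "-k", keypair]


def launch(ami: str, keypair: str, dry_run: bool = True,
           env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Check configuration, then run (or, by default, just return) the command."""
    env = dict(os.environ if env is None else env)
    missing = missing_env(env)
    if missing:
        raise RuntimeError("EC2 tools not configured; missing: " + ", ".join(missing))
    argv = run_instances_argv(ami, keypair)
    if not dry_run:
        # Starts a billable instance; raises CalledProcessError on failure.
        subprocess.run(argv, check=True, env=env)
    return argv
```

In a configured shell, `launch("ami-e2ca2f8b", "sandbox-keypair")` just hands back the command line; passing `dry_run=False` actually runs it. Making the expensive path opt-in seems the sensible default when every call costs money.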

So obviously I’m behind on the trends here, but hey, it’s never too late. The initial motivation for checking out EC2 was a mild frustration with the DreamHost setup I’ve been using for personal and FOAF stuff. No huge complaints, just that I was cheap and signed up for shared-box access, and it’s kind of hard not having root access after 12 years or so of having recourse to superpowers when needed. Also, DreamHost don’t do Java, which is mildly annoying. Some old FOAF stuff of Libby’s is in Java, and of course it’d be good to be able to make more use of Jena.

As I type this, I’m sat across the room from an ageing Linux box (with a broken CPU fan), taking up space and bringing a tangle of cables to my rather modest living room. I had been thinking about bringing it back to life as a dev box, since I’m otherwise working only on Mac and WinXP laptops. Having tried EC2, I see no reason to clutter my home with it. I’d like to see a stronger story from Amazon about backing up EC2 instances, but … well, I’d like to see a stronger story from me on that topic too :)