OpenSocial schema extraction: via Javascript to RDF/OWL

OpenSocial’s API reference describes a number of classes (‘Person’, ‘Name’, ‘Email’, ‘Phone’, ‘Url’, ‘Organization’, ‘Address’, ‘Message’, ‘Activity’, ‘MediaItem’, …), each of which has various properties whose values are either strings, references to instances of other classes, or enumerations. I’d like to make them usable beyond the confines of OpenSocial, so I’m making an RDF/OWL version. OpenSocial’s schema is an attempt to provide an overarching model for much of present-day mainstream ‘social networking’ functionality, including dating, jobs etc. Such a broad effort is inevitably somewhat open-ended, and so may benefit from being linked to data from other complementary sources.

With a bit of help from the shindig-dev list, #opensocial IRC, and Kevin Brown and Kevin Marks, I’ve tracked down the source files used to represent OpenSocial’s data schemas: they’re in the opensocial-resources SVN repository on code.google.com. There is also a downstream copy in the Apache Shindig SVN repo (I’m not very clear on how versioning and evolution are managed between the two). They’re Javascript files, structured so that documentation can be generated via javadoc. The Shindig-PHP schema diagram I posted recently is a representation of this schema.

So – my RDF version. At the moment it is merely a list of classes and their properties (expressed via rdfs:domain), written using RDFa/HTML. I don’t yet define rdfs:range for any of these, nor handle the enumerated values (opensocial.Enum.Smoker, opensocial.Enum.Drinker, opensocial.Enum.Gender, opensocial.Enum.LookingFor, opensocial.Enum.Presence) that are defined in enum.js.

The code is all in the FOAF SVN, and accessible via “svn co http://svn.foaf-project.org/foaftown/opensocial/vocab/”. I’ve also taken the liberty of including a copy of the OpenSocial *.js files, and Mozilla’s Rhino Javascript interpreter js.jar in there too, for self-containedness.

The code in schemarama.js will simply generate an RDFa/XHTML page describing the schema. This can be checked using the W3C validator, or converted to RDF/XML with the pyRDFa service at W3C.
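To make the shape of that output concrete, here’s a toy re-imagining in Ruby (not the actual schemarama.js, which is Javascript run under Rhino); the class names, property names and namespace below are illustrative placeholders rather than the real vocabulary URIs:

# Toy sketch: each class becomes an rdfs:Class block in RDFa, and each
# property is tied back to its class via rdfs:domain. Names are placeholders.
BASE = "http://example.org/opensocial#"

schema = {
  "Person"  => ["name", "thumbnailUrl"],
  "Address" => ["locality", "postalCode"]
}

puts '<div xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
puts '     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">'
schema.each do |klass, props|
  puts %(  <div about="#{BASE}#{klass}" typeof="rdfs:Class">)
  puts %(    <span property="rdfs:label">#{klass}</span>)
  puts %(  </div>)
  props.each do |prop|
    puts %(  <div about="#{BASE}#{prop}" typeof="rdf:Property">)
    puts %(    <span rel="rdfs:domain" resource="#{BASE}#{klass}">#{prop}</span>)
    puts %(  </div>)
  end
end
puts "</div>"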

I’ve tested the output using the OwlSight/pellet service from Clark & Parsia, and with Protege 4. It’s basic, but it seems OK and gives a foundation to build from. Here’s a screenshot of the output loaded into Protege (which btw finds 10 classes and 99 properties).

An example view from Protege, showing the class browser in one panel, and a few properties of Person in another.

OK so why might this be interesting?

  • Using OpenSocial-derived vocabulary and OpenSocial-exported data in other contexts
    • databases (queryable via SPARQL)
    • mixed with FOAF
    • mixed with Microformats
    • published directly in RDFa/HTML
  • Mapping OpenSocial terms to other contact and social-network schemas

This suggests some goals for continued exploration:

It should be possible to use “OpenSocial markup” in an ordinary homepage or blog (HTML or XHTML), drawing on any of the descriptive concepts they define, through using RDFa’s markup notation. As Mark Birbeck pointed out recently, RDFa is an empty vessel – it does not define any descriptive vocabulary. Instead, the RDF toolset offers an environment in which vocabulary from multiple independent sources can be mixed and merged quite freely. The hard work of the OpenSocial team in analysing social network schemas and finding commonalities, or of the Microformats scene in defining simple building-block vocabularies … these can hopefully be combined within a single environment.

Geographic Queries on Google App Engine

Much cleverness:

In this way, I was able to put together a geographic bounding box query, on top of Google App Engine, using a Geohash-like algorithm as a storage format, and use that query to power a FeatureServer Demo App Engine application, doing geographic queries of non-point features on top of App Engine/BigTable. Simply create a Geoindex object of the bounding box of your feature, and then use lower-left/upper-right points as bounds for your Geohash when querying.
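To make the trick concrete, here is a minimal Ruby sketch of the underlying idea (not the FeatureServer/Geoindex code from the post): encode points as Geohash strings, then use the hashes of a bounding box’s corners as lexical range bounds when querying, filtering out false positives afterwards. The coordinates are arbitrary examples.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision = 8)
  lat_range = [-90.0, 90.0]
  lon_range = [-180.0, 180.0]
  bits = []
  even = true    # even bit positions refine longitude, odd refine latitude
  while bits.length < precision * 5
    range, value = even ? [lon_range, lon] : [lat_range, lat]
    mid = (range[0] + range[1]) / 2.0
    if value >= mid
      bits << 1
      range[0] = mid
    else
      bits << 0
      range[1] = mid
    end
    even = !even
  end
  bits.each_slice(5).map { |chunk| BASE32[chunk.join.to_i(2)] }.join
end

# Hashes of points inside an axis-aligned box sort between the hashes of its
# lower-left and upper-right corners, so a lexical range filter over a stored
# geohash property yields candidate features (plus false positives that need
# a precise re-check afterwards). Coordinates are arbitrary examples.
lower = geohash(51.44, -2.62)   # lower-left corner
upper = geohash(51.47, -2.55)   # upper-right corner
puts "candidates: geohash >= '#{lower}' AND geohash <= '#{upper}'"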

Nokiana: the one about the CIA, Syria, and the N95

Matt Kane resurfaced on Bristol’s underscore mailing list with this intriguing snippet, after some travels around the Middle East: “… discovered N95s (not mine) cannot be taken into Syria”.

I asked for the backstory, which goes like this:

Quite a palaver. Got the train from Istanbul to Syria (amazing trip!). At the border they didn’t search the bags of “westerners” but asked us all to show our phones and cameras. They glanced at them all quickly, checking the brand (“Nikon, ok. SonyEricsson, ok”). One guy had an N95 and they led him off the train. His sister informed us that they’d said it wasn’t allowed in Syria, and that if she knew her brother he’d not give it up without a fight. Despite being on contract, he argued with them for an hour and a half, even calling the embassies in Damascus and Ankara. In the end he gave it up, with a promise that they’d send it on to the airport from where he was leaving. A few days later we’re chatting with a barman and spot his phone – an N95, and yes, he got it in Syria! A few days after that we found out the full story from our hotel owner in Damascus. Apparently the CIA gave a load of bugged N95s to high-ranking Kurdish officials in Iraq, many of which were then smuggled into Syria and given as gifts to various shady characters. After the Hezbollah guy was assassinated in Damascus a few months ago, the Syrians set about trying to root out spies, which led to this ban on bringing N95s into the country. Apparently.

This is the first I’ve heard of it, but searching throws up a few references to rigged N95s as “spy phones”.

Somewhat-unrelated aside: I don’t believe the relevant functionality is exposed in the N95’s widget APIs yet. I had trouble making it vibrate, let alone self-destruct after this message. But at least widget/gadget/app security is getting some attention lately. It can’t be too long before “spy widgets” on your phone become a real concern, particularly as phone APIs are increasingly exposed to 3rd-party apps in all sorts of creative combinations. I should be clear that, AFAIK, Nokia’s N95 widget platform is currently free of such vulnerabilities, and any “spy phone” mischief so far has been achieved through other kinds of interference. But it does make me glad to see a Widgets 1.0: Digital Signature spec moving along at W3C…

Lqraps! Reverse SPARQL

(update: files are now in svn; updated the link here)

I’ve just published a quick writeup (with running toy Ruby code) of a “reverse SPARQL” utility called lqraps, a tool for re-constructing RDF from tabular data.

The idea is that such a tool is passed a tab-separated (eventually, CSV etc.) file, such as might conventionally be loaded into Access, spreadsheets etc. The tool looks for a few lines of special annotation in “#”-comments at the top of the file, and uses these to (re)generate RDF. My simple implementation does this with text-munging of SPARQL construct notation into Turtle. As the name suggests, this could be done against the result tables we get from doing SPARQL queries; however it is more generally applicable to tabular data files.

A richer version might be able to output RDF/XML, but that would require use of libraries for SPARQL and RDF. This is of course close in spirit to D2RQ and SquirrelRDF and other relational/RDF mapping efforts, but the focus here is on something simple enough that we could include it in the top section of actual files.
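For flavour, here’s a minimal Ruby sketch of the general idea; the annotation syntax and the people.tsv input file are invented for this example, not the actual lqraps notation:

# A header might look something like (invented syntax):
#   #fields mbox name
#   # <?mbox> <http://xmlns.com/foaf/0.1/name> "?name" .
# Each data row then gets its column values substituted into the template.
fields, template, rows = [], [], []

File.foreach("people.tsv") do |line|
  line = line.chomp
  next if line.strip.empty?
  if line.start_with?("#fields ")
    fields = line.sub("#fields ", "").split
  elsif line.start_with?("#")
    template << line.sub(/^#\s?/, "")
  else
    rows << line.split("\t")
  end
end

rows.each do |row|
  triples = template.join("\n")
  fields.each_with_index { |name, i| triples = triples.gsub("?#{name}", row[i].to_s) }
  puts triples
end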

The correct pronunciation btw is “el craps”. As in the card game.

Graph URIs in SPARQL: Using UUIDs as named views

I’ve been using the SPARQL query language to access a very ad-hoc collection of personal and social graph data, and thanks to Bengee’s ARC system this can sit inside my otherwise ordinary WordPress installation. At the moment, everything in there is public, but lately I’ve been discussing oauth with a few folk as a way of mediating access to selected subsets of that data. Which means the data store will need some way of categorising the dozens of misc data source URIs. There are a few ways to do this; here I try a slightly non-obvious approach.

Every SPARQL store can have many graphs inside, named by URI, plus optionally a default graph. The way I manage my store is a kind of structured chaos, with files crawled from links in my own data and my friends’ data. One idea for indicating the structure of this chaos is to keep “table of contents” metadata in the default graph. For example, I might load up <http://danbri.org/foaf.rdf> into a SPARQL graph named with that URI. And I might load up <http://danbri.org/evilfoaf.rdf> into another graph, also using the retrieval URI to identify the data within my SPARQL store. Now, two points to make here: firstly, the SPARQL spec does not mandate that we do things this way. An application that, for example, wanted to keep historical versions of a FOAF or RSS schema document could keep the triples from each version in a different named graph. Perhaps these might be named with UUIDs, for example. The second point is that there are many different “meta” claims we want to store about our datasets, and mixing them all into the store-wide “default graph” could be rather limiting, especially if we might not want to unconditionally believe even those claims.

In the example above, I have data from running a PGP check against foaf.rdf (which passed) and evilfoaf.rdf (which doesn’t pass a check against my PGP identity). Now where do I store the results of this PGP checking? Perhaps the default graph, but maybe that’s messy. The idea I’m playing with here is that UUIDs are reasonable identifiers, and that perhaps we’ll find ourselves sharing common UUIDs across stores.

Go back to my sent-mail FOAF crawl example from yesterday. How far did I get? Well, the end result was a list of URLs, which I looped through and loaded into my big chaotic SPARQL store. If I run the following query, I get a list of all the data graphs loaded:

SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o . } }

This reveals 54 URLs, basically everything I’ve loaded into ARC in the last month or so. Only 30 of these came from yesterday’s hack, which used Google’s new Social Graph API to allow me to map from hashed mailbox IDs to crawlable data URIs. So today’s game is to help me disentangle the 30 from the 54, and superimpose them on each other, but not always mixed with every other bit of information in the store. In other words, I’m looking for a flexible, query-based way of defining views into my personal data chaos.

So, what I tried. I took the result of yesterday’s hack, a file of data URIs called urls.txt. Then I modified my commandline dataloader script (yeah yeah, this should be part of WordPress). My default data loader simply takes each URI, gets the data, and shoves it into the store under a graph name which is the URI used for retrieval. What I did today is, additionally, make a “table of contents” overview graph. To avoid worrying about names, I generated a UUID and used that. So there is a graph called <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> which contains simple assertions of the form:

<http://www.advogato.org/person/benadida/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> .
<http://www.w3c.es/Personal/Martin/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> .
<http://www.iandickinson.me.uk/rdf/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> . # etc

…for each of the 30 files my crawler loaded into the store.
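For the curious, here’s a hedged sketch of that extra step (my real loader script differs): read the crawler’s urls.txt and emit one foaf:Document assertion per URI, ready to be loaded into the UUID-named graph.

# Sketch only: generate a fresh UUID graph name and one triple per crawled URI.
require "securerandom"

toc_graph = "uuid:#{SecureRandom.uuid}"   # in the post above, a fixed UUID was used

File.readlines("urls.txt").map(&:strip).reject(&:empty?).each do |uri|
  puts "<#{uri}> a <http://xmlns.com/foaf/0.1/Document> ."
end
puts "# load the above triples into <#{toc_graph}>"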

This lets us use <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> as an indirection point for information related to this little mailbox crawler hack. I don’t have to “pollute” the single default graph with this data. And because the uuid: was previously meaningless, it is something we might decide makes sense to use across data visibility boundaries, ie. you might use the same UUID in your own SPARQL store, so we can share queries and app logic.

Here’s a simple query. It says, “ask the mailbox crawler table of contents graph (which we call uuid:420d9etc…) for all things it knows about that are a Document”. Then it says “ask each of those documents, for everything in it”. And then the SELECT clause returns all the property URIs. This gives a first-level overview of what’s in the Web of data files found by the crawl. The query was:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p WHERE {
GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled { ?s ?p ?o . }
}
ORDER BY ?p

I’ll just show the first page full of properties it found; for the rest, see the link to the complete set. Since W3C’s official SPARQL doesn’t have aggregates, we’d need to write application code (or use something like the SPARQL+ extensions) to get property usage counts; a small sketch of that follows the property list below. Here are some of the properties that were found in the data:

http://kota.s12.xrea.com/vocab/uranaibloodtype

http://purl.org/dc/elements/1.1/creator

http://purl.org/dc/elements/1.1/description

http://purl.org/dc/elements/1.1/format

http://purl.org/dc/elements/1.1/title

http://purl.org/dc/terms/created

http://purl.org/dc/terms/modifed

http://purl.org/dc/terms/modified

http://purl.org/net/inkel/rdf/schemas/lang/1.1#masters

http://purl.org/net/inkel/rdf/schemas/lang/1.1#reads

http://purl.org/net/inkel/rdf/schemas/lang/1.1/masters

http://purl.org/net/inkel/rdf/schemas/lang/1.1/reads

http://purl.org/net/inkel/rdf/schemas/lang/1.1/speaks

http://purl.org/net/schemas/quaffing/drankBeerWith

http://purl.org/net/schemas/quaffing/drankLagerWith

http://purl.org/net/vocab/2004/07/visit#caregion

http://purl.org/net/vocab/2004/07/visit#country

http://purl.org/net/vocab/2004/07/visit#usstate

http://purl.org/ontology/mo/hasTrack

http://purl.org/ontology/mo/myspace

http://purl.org/ontology/mo/performed
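Since the list above shows what turned up, here is a small sketch of the “application code” route to usage counts mentioned before the list: run the same ?p query and tally the bindings client-side. The endpoint URL is made up, and the result-format parameter varies from store to store.

require "net/http"
require "uri"
require "json"

query = <<SPARQL
SELECT ?p WHERE {
  GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a <http://xmlns.com/foaf/0.1/Document> . }
  GRAPH ?crawled { ?s ?p ?o . }
}
SPARQL

# Assumes the store can return SPARQL JSON results; hypothetical endpoint.
res = Net::HTTP.post_form(URI("http://localhost/sparql"),
                          "query" => query, "format" => "json")
counts = Hash.new(0)
JSON.parse(res.body)["results"]["bindings"].each { |b| counts[b["p"]["value"]] += 1 }
counts.sort_by { |_, n| -n }.each { |p, n| puts "#{n}\t#{p}" }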

So my little corner of the Web includes properties that extend FOAF documents to include blood types, countries that have been visited, language skills that people have, music information, and even drinking habits. But remember that this comes from my corner of the Web – people I’ve corresponded with – and probably isn’t indicative of the wider network. But that’s what grassroots decentralised data is all about. The folk who published this data didn’t need to ask permission of any committee to do so; they just mixed in what they wanted to say, alongside terms more widely used like foaf:Person, foaf:name. This is the way it should be: ask forgiveness, not permission, from the language lawyers and standardistas.

OK, so let’s dig deeper into the messy data I crawled up from my sent-mail contacts.

Here’s one that finds some photos, either using FOAF’s :img or :depiction properties:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT * WHERE {
GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled {
{ ?x :depiction ?y1 } UNION { ?x :img ?y2 } .
}
}

Here’s another that asks the crawl results for names and homepages it found:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT * WHERE { GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled { { [ :name ?n; :homepage ?h ] } }
}

To recap, the key point here is that social data in a SPARQL store will be rather chaotic. Information will often be missing, and often be extended. It will come from a variety of parties, some of whom you trust, some of whom you don’t know, and a few of whom will be actively malicious. Later down the line, subsets of the data will need different permissioning: if I export a family tree from ancestry.co.uk, I don’t want everyone to be able to do a SELECT for mother’s maiden name and my date of birth.

So what I suggest here, is that we can use UUID-named graphs as an organizing structure within an otherwise chaotic SPARQL environment. The demo here shows how one such graph can be used as a “table of contents” for other graphs associated with a particular app — in this case, the Google-mediated sentmail crawling app I made yesterday. Other named views might be: those data files from colleagues, those files that are plausibly PGP-signed, those that contain data structured according to some particular application need (eg. calendar, addressbook, photos, …).

Outbox FOAF crawl: using the Google SG API to bring context to email

OK, here’s a quick app I hacked up on my laptop, largely in the back of a car; it took ~1 hour to get from idea to dataset.

The idea is the usual FOAFy schtick of taking an evidential rather than purely claim-based approach to the ‘social graph’.

We take the list of people I’ve emailed, and we probe the social data Web to scoop up user profile etc descriptions of them. Whether this is useful will depend upon how much information we can scoop up, of course. Eventually oauth or similar technology should come into play, for accessing non-public information. For now, we’re limited to data shared in the public Web.

Here’s what I did:

  1. I took the ‘sent mail’ folder on my laptop (a Mozilla Thunderbird installation).
  2. I built a list of all the email addresses I had sent mail to (in Cc: or To: fields). Yes, I grepped.
    • working assumption: these are humans I’m interacting with (ie. that I “foaf:know”)
    • I could also have looked for X-FOAF: headers in mail from them, but this is not widely deployed
    • I threw away anything that matched some company-related mail hosts I wanted to exclude.
  3. For each email address, scrubbed of noise, I prepended mailto: and generated a SHA1 value (steps 3 and 4 are sketched in code after this list).
  4. The current Google SG API can be consulted about hashed mailboxes using URIs in the unregistered sgn: space, for example my hashed gmail address (danbrickley@gmail.com) gives sgn://mboxsha1/?pk=6e80d02de4cb3376605a34976e31188bb16180d0 … which can be dropped into the Google parameter playground. I’m talking with Brad Fitzpatrick about how best these URIs might be represented without inventing a new URI scheme, btw.
  5. Google returns a pile of information, indicating connections between URLs, mailboxes, hashes etc., whether verified or merely claimed.
  6. I crudely ignore all that, and simply parse out anything beginning with http:// to give me some starting points to crawl.
  7. I feed these to an RDF FOAF/XFN crawler attached to my SparqlPress-augmented WordPress blog.

With no optimisation, polish or testing, here (below) is the list of documents this process gave me. I have not yet investigated their contents.

The goal here is to find out more about the people who are filling my mailbox, so that better UI (eg. auto-sorting into folders, photos, task-clustering etc) could help make email more contextualised and sanity-preserving. The next step I think is to come up with a grouping and permissioning model for this crawled FOAF/RDF in the WordPress SPARQL store, ie. a way of asking queries across all social graph data in the store that came from this particular workflow. My store has other semi-personal data, such as the results of crawling the groups of people who have successfully commented in blogs and wikis that I run. I’m also looking at a Venn-diagram UI for presenting these different groups in a way that allows other groups (eg. “anyone I send mail to OR who commented in my wiki”) to be defined in terms of these evidence-driven primitive sets. But that’s another story.

FOAF files located with the help of the Google API:

 http://www.advogato.org/person/benadida/foaf.rdf

http://www.w3c.es/Personal/Martin/foaf.rdf

http://www.iandickinson.me.uk/rdf/foaf.rdf

http://people.tribe.net/xe0s242

http://www.geocities.com/rmarkwhite/foaf.xml

http://www.l3s.de/~diederich/foaf.rdf

http://www.w3.org/People/maxf/foaf

http://revyu.com/people/mhausenblas/about/rdf

http://www.cs.vu.nl/~laroyo/foaf.rdf

http://wwwis.win.tue.nl/~nstash/foaf.rdf

http://www.nimbustier.net/nimbustier/foaf.rdf

http://www.cubicgarden.com/webdav/profile/foaf.rdf

http://www.ibiblio.org/hhalpin/foaf.rdf

http://www.kjetil.kjernsmo.net/linkedin-contacts.rdf

http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=dino

http://www.kjetil.kjernsmo.net/friends.rdf

http://www.wikier.org/foaf.rdf

http://www.tribe.net/FOAF/4916e791-9793-40c9-875b-21cb25733a32

http://www.zonk.net/ghard/foaf.rdf

http://www.zonk.net/ghard/foaf.rdf

http://www.kjetil.kjernsmo.net/friends.rdf

http://www.kjetil.kjernsmo.net/linkedin-contacts.rdf

http://www.gnowsis.com/leo/foaf.xml

http://people.tribe.net/c09622fd-c817-4935-a039-5cb5d0d6bed8

http://www.few.vu.nl/~wrvhage/foaf.rdf

http://www.zonk.net/ghard/foaf.rdf

http://www.lri.fr/~pietriga/foaf.rdf

http://www.advogato.org/person/Pike/foaf.rdf

http://www.deri.ie/fileadmin/scripts/foaf.php?id=12

For a quick hack, I’m really pleased with this. The sent-mail folder I used isn’t even my only one. A rewrite of the script over IMAP (and proprietary mail host APIs) could be quite fun.

RDF in Ruby revisited

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

Last weekend I had the good fortune to run into Rich Kilmer at O’Reilly’s ‘Social Graph Foo Camp’ gathering. In addition to helping decorate my tent, Rich told me a bit more about the very impressive Semitar RDF and OWL work he’d done in Ruby, initially as part of the DAML programme. Matt Biddulph was also there, and we discussed again what it would take to include FOAF import into Dopplr. I’d be really happy to see that, both because of Matt’s long history of contributions to the Semantic Web scene, and because Dopplr and FOAF share a common purpose. I’ve long said that a purpose of FOAF is to engineer more coincidences in the world, and Dopplr comes from the same perspective: to increase serendipity.

Now, the thing about engineering serendipity is that it doesn’t work without good information flow. And the thing about good information flow is that it benefits from data models that don’t assume the world around us comes nicely parceled into cleanly distinct domains. Borrowing from American Splendor – “ordinary life is pretty complex stuff”. No single Web site, service, document format or startup is enough; the trick comes when you hook things together in unexpected combinations. And that’s just what we did in the RDF world: created a model for mixed-up, cross-domain data sharing.

Dopplr, Tripit, Fire Eagle and other travel and location services may know where you and others are. Social network sites (and there are more every day) know something of who you are, and something of who you care about. And the big G in the sky knows something of the parts of this story that are on the public record.

Data will always be spread around. RDF is a handy model for ad-hoc data merging from multiple sources. But you can’t do much without an RDF parser and a few other tools. Minimally, an RDF/XML parser and a basic API for navigating the graph. There are many more things you could add. In my old RubyRdf work, I had in-memory and SQL-backed storage, with a Squish query interface to each. I had a donated RDF/XML parser (from Ruby4R) and a much-improved query engine (with support for optionals) from Damian Steer. But the system is code-rotted. I wrote it while I was still learning Ruby, beginning 7 years ago, and I think it is “one to throw away”. I’m really glad I took the time to declare that project “closed” so as to avoid discouraging others, but it is time to revisit Ruby and RDF again now.

Other tools have other offerings: Dave Beckett’s Redland system (written in C) ships with a Ruby wrapper. Dave’s tools probably have the best RDF parsing facilities around, are fast, but require native code. Rena is a pure Ruby library, which looked like a great start but doesn’t appear to have been developed further in recent years.

I could continue going through the list of libraries, but Paul Stadig has already done a great job of this recently (see also his conclusions, which make perfect sense). There has been a lot of creative work around RDF/RDFS and OWL in Ruby, and collectively we clearly have a lot of talent and code here. But collectively we also lack a finished product. It is a real shame when even an RDF-enthusiast like Matt Biddulph is not in a position to simply “gem install” enough RDF technology to get a simple job done. Let’s get this fixed. As I said above,

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

In six months time, I’d like to see at least one solid, well rounded and modern RDF toolkit packaged as a Gem for the Ruby community. It should be able to parse RDF/XML flawlessly, and in addition to the usual unit tests, it should be wired up to the RDF Test Cases (see download) so we can all be assured it is robust. It should allow for a fast C parser such as Raptor to be used if available, falling back on pure Ruby otherwise. There should be a basic API that allows me to navigate an RDF graph of properties and values using clear, idiomatic Ruby. Where available, it should hook up to external stores of data, and include at least a SPARQL protocol client, eventually a full SPARQL implementation. It should allow multiple graphs to be super-imposed and disentangled. Some support for RDFS, OWL and rule languages would be a big plus. Support for other notations such as Turtle, RDFa, or XSLT-based GRDDL transforms would be useful, as would a plugin for microformat import. Transliterating Python code (such as the tiny Euler rule engine) should be considered. Divergence from existing APIs in Python (and Perl, Javascript, PHP etc) should be minimised, and carefully balanced against the pull of the Ruby way. And (though I lack strong views here) it should be made available under a liberal opensource license that permits redistribution under GPL. It should also be as I18N and Unicode-friendly as is possible in Ruby these days.
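To make the wish more concrete, here’s a purely hypothetical sketch of the kind of idiomatic Ruby I’d like to be able to write; the gem name, classes and methods below don’t exist in any current library, they just illustrate the list above.

require "rdf_toolkit"   # imaginary gem name

FOAF = RDFToolkit::Namespace.new("http://xmlns.com/foaf/0.1/")

graph = RDFToolkit::Graph.new
graph.parse(File.read("foaf.rdf"), :format => :rdfxml)   # Raptor if present, pure Ruby otherwise

# idiomatic navigation of properties and values
graph.subjects(:type => FOAF.Person).each do |person|
  puts "#{person[FOAF.name]} <#{person[FOAF.homepage]}>"
end

# SPARQL protocol client against a remote store
store = RDFToolkit::SPARQLClient.new("http://example.org/sparql")
store.select("SELECT ?name WHERE { ?x a <#{FOAF.Person}>; <#{FOAF.name}> ?name }") do |row|
  puts row[:name]
end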

I’m not saying that all RDF toolkits should be merged, or that collaboration is compulsory. But we are perilously fragmented right now, and collaboration can be fun. In six months time, people who simply want to use RDF from Ruby ought to be pleasantly surprised rather than frustrated when they take to the ‘net to see what’s out there. If it takes a year instead of six months, sure, whatever. But not seven years! It would be great to see some movement again towards a common library…

How hard can it be?

Embedding queries in RDF – FOAF Group example

Is this crazy or useful? I’m not sure yet.

This example uses FOAF vocabulary for groups and openid. So the basic structure here is that Agents (including persons) can have an :openid and can be a :member of a :Group.

From an openid-augmented WordPress, we get a list of all the openids my blog knows about. From an openid-augmented MediaWiki, we get a list of all the openids that contribute to the FOAF project wiki. I dumped each into a basic RDF file (not currently an automated process). But the point here is to explore enumerated groups using queries.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://xmlns.com/foaf/0.1/">
<Group rdf:about='#both'>
<!-- enumerated membership -->
<member><Agent><openid rdf:resource='http://danbri.org/'/></Agent></member>
<member><Agent><openid rdf:resource='http://tommorris.org/'/></Agent></member>
<member><Agent><openid rdf:resource='http://kidehen.idehen.net/dataspace/person/kidehen'/></Agent></member>
<member><Agent><openid rdf:resource='http://www.wasab.dk/morten/'/></Agent></member>
<member><Agent><openid rdf:resource='http://kronkltd.net/'/></Agent></member>
<member><Agent><openid rdf:resource='http://www.kanzaki.com/'/></Agent></member>

<!-- rule-based membership -->

<constructor><![CDATA[
PREFIX : <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
<http://danbri.org/yasns/danbri/both.rdf#thegroup> a :Group; :member [ a :Agent; :openid ?id ]
}
WHERE {
GRAPH <http://wiki.foaf-project.org/_users.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] }
GRAPH <http://danbri.org/yasns/danbri/_group.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] }
}
]]></constructor>
</Group>
</rdf:RDF>

This RDF description does it both ways. It enumerates (for simple clients) a list of members of a group whose members are those individuals that are both commentators on my blog, and contributors to the FOAF wiki. At least, to the extent they’re detectable via common use of OpenID URIs. But the RDF group description also embeds a SPARQL query, the kind which generates RDF rather than an SQL-like resultset. The RDF essentially regenerates the enumerated list, assuming the query is run against an RDF dataset with the data graphs appropriately populated.
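As a sketch of the consuming side (none of this tooling exists yet): a client could pull the embedded constructor query out of the group description and hand it to a SPARQL endpoint that already has the two data graphs loaded. The file name and endpoint URL below are made up for the example.

require "rexml/document"
require "net/http"
require "uri"

doc = REXML::Document.new(File.read("both.rdf"))
query = nil
doc.root.each_recursive { |el| query = el.text if el.name == "constructor" }

# POST the CONSTRUCT query to a (hypothetical) SPARQL protocol endpoint.
res = Net::HTTP.post_form(URI("http://localhost/sparql"), "query" => query)
puts res.body   # the CONSTRUCTed RDF, ie. the regenerated membership list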

Now I sorta like this, and I sorta don’t. It may be incredibly powerful, or it may be a bit too clever for its own good.

Certainly there’s scope overlap with the W3C RIF rules work, and with the capabilities of OWL. FOAF has long contained an experimental method for using OWL to do something similar, but it hasn’t found traction. The motivation for trying SPARQL here is that it has built-in machinery for talking about the provenance of data; so I could write a group description this way that says “members are anyone listed as a colleague in http://myworkplace.example.com/stafflist.rdf”. Or I could mix in arbitrary descriptive vocabularies; family tree stuff, XFN, language abilities (speaks-reads-writes) etc.

Where I think this could fall down is in the complexity of the workflow. The queries need executing against some SPARQL installation with a configured dataset, and the query lists URIs of data graphs. But I doubt database admins will want to randomly load any/every RDF file mentioned in these shared queries. Perhaps something like SparqlPress, attached to one’s weblog, with social filters that only load files mentioned in queries from friends? Also, authoring these kinds of queries isn’t something non-geek users are going to do often, and the sorts of queries that will work will of course depend on the data actually available. Sure, I could write a query based on matching the openids of former colleagues, but the group will be empty unless the data listing people as former colleagues is actually out there in the Web, and written in the terms anticipated by the query.

On the other hand, this mechanism does appeal, and could go way beyond FOAF group definitions. We could see a model where people post data in the Web but also post queries, eg. revisiting the old work Libby and I explored around RSS query. On the other other hand, who wants to make their Web queries public? All that said, the same goes for the data being queried. And since this technique embeds queries inside ordinary RDF data, however we deal with the data visibility issue for RDF/FOAF should also work for the query stuff. Perhaps. Can’t blame me for trying…

I realise this isn’t the clearest of explanations. Let’s try again:

RDF is normally for publishing collections of simple claims about the world. This is an experiment in embedding data-generating-queries amongst these claims, where the query is configured to output more RDF claims (aka statements, triples etc), but only when executed against some appropriate body of RDF data. Since the query is written in SPARQL, it allows the data-generation rules to mention interesting things, such as properties of the source of the data being queried.

This particular experiment is couched in terms of FOAF’s “Group” construct, but the technique is entirely general. The example above defines a group of agents called the “both” group, by saying that an Agent is in that group if its OpenID URI is listed in each of the two specified RDF documents, ie. it is both a commentator on my blog and a contributor to the FOAF Wiki. Other examples could be “(fe)male employees” or “family members sharing a blood type” or, in fact, any descriptive pattern that can match against the data to hand and be expressed in SPARQL.

Open IDiomatic? Dada engine hosting as OpenID client app

All of us are dumber than some of us.

Various folk are concerned that OpenID has more provider apps than consumer apps, so here is my little website idea for an OpenID-facilitated collaborative thingy. I’ve loved the Dada Engine for years. The Dada Engine is the clever-clogs backend for grammar-driven nonsense generators such as the wonderful Postmodernism Generator.

Since there are many more people capable of writing funny prose to configure this machine than there are who can be bothered to do the webhosting and sysadmin work, I reckon it’d be worthwhile to make a general-purpose hosted version whereby users could create new content through a Web interface. And since everyone would forget their passwords, this seems a cute little project to get my hands dirty with OpenID. To date, all I’ve done server-side with OpenID is install patches to MediaWiki and WordPress. That went pretty smoothly. My new hacking expedition, however, hit a snag already: the PHP libraries on my EC2 sandbox server didn’t like authenticating against LiveJournal. I’m new to PHP, so when something stops working I panic and whine in IRC instead of getting to the bottom of the problem.

Back to the app idea: the Dada Engine basically eats little config files that define sentence structures, and spews nonsense. Here’s an NSFW subgenial rant:

FlipFlip:scripts danbri$ pb < brag.pb
Fuck 'em if they can't take a joke! I'm *immune*! *Yip, yip, YEEEEEEE!*
*Backbone Blowout*! I do it for *fun*! YAH-HOOOO! Now give me some more of…

And here’s another:

FlipFlip:scripts danbri$ pb < brag.pb
I’m a fission reactor, I fart plutonium, power plants are fueled by the breath
of my brow; when they plug *me* in, the lights go out in Hell County! Now give
me some more of…

A fragment of the grammar:

"I say, `" slogan "'. By God, `" slogan "', I say! " |
"I am " entity ", I am " entity "! " |
"I'll drive a mile so as not to walk a foot; I am " entity "! " |
"Yes, I'm " entity "! " |
"I drank *" being "* under " number " tables, I am too " adjective " to die, I'm insured for acts o' God *and* Satan! " |
"I was shanghaied by " entities " and " entities " from " place ", and got away with their hubcaps! " |
"I *cannot* be tracked on radar! " |
"I wear nothing uniform, I wear *no* " emphatic " uniform! " | ....

To be crystal clear, this hosted app is total vapourware currently. And I’m not sure if it would be a huge piece of work to make sure the Dada Engine didn’t introduce security holes. Certainly the version that uses the C preprocessor scares me, but the “pb” utility shown above doesn’t use that. Opinions on the safety of this welcomed. But never mind the details, smell the vision! It could always use Novewriting (a Python “knock-off of the Dada Engine”) instead.

The simplest version of a hosting service for this kind of user generated meta-content would basically be textarea editing and a per-user collection of files that are flagged public or private. But given the nature of the system, it would be great if the text generator could be run against grammars that multiple people could contribute to. Scope for infinite folly…