Quick clarification on SPARQL extensions and “Lock-in”

It’s clear from discussion bouncing around IRC, Twitter, Skype and elsewhere that “Lock-in” isn’t a phrase to use lightly.

So I post this to make myself absolutely clear. A few days ago I mentioned in IRC a concern that newcomers to SPARQL and RDF databases might not appreciate which SPARQL extensions are widely implemented, and which are the specialist offerings of the system they happen to be using. I mentioned OpenLink’s Virtuoso in particular as a SPARQL implementation that had a rich and powerful set of extensions.

Since it seems there is some risk I might be misinterpreted as suggesting OpenLink are actively trying to “do a Microsoft” and trap users in some proprietary pseudo-SPARQL, I’ll state what I took to be obvious background knowledge: OpenLink is a company that owes its success to the promotion of cross-vendor database portability; they have been tireless advocates of a standards-based Semantic Web, and they’re active in proposing extensions to the W3C for standardisation. So – no criticism of OpenLink intended. None at all.

All I think we need here are a few utilities that help developers understand the nature of the various SPARQL dialects, and the potential costs and benefits of using them. Perhaps an online validator, alongside those for RDF/XML, RDFa, Turtle etc. Such a validator might usefully list the extensions used in a query, and give pointers (perhaps into a wiki) where the status of the various extension constructs can be discussed and documented.

Since SPARQL is such a young language, it lacks a lot of things that are taken for granted in the SQL world, so using rich custom extensions, where available, is for many developers a sensible choice. My only concern is that it must be a choice, and one entered into consciously.
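The dialect-checking utility floated above could start out very simply: scan a query for telltale markers of vendor extensions and report which system each one is associated with. Here is a minimal sketch in Python; the marker list is illustrative only (bif: is Virtuoso’s built-in function namespace, apf: is a Jena/ARQ property-function prefix), not an authoritative registry.

```python
# Sketch of a SPARQL dialect checker. The marker registry below is a
# hypothetical, illustrative sample -- a real tool would need a
# community-maintained list of extension constructs.
import re

EXTENSION_MARKERS = {
    r"\bbif:": "Virtuoso built-in function namespace",
    r"\bapf:": "ARQ property functions (Jena)",
    r"\bLET\b": "non-standard variable assignment",
}

def spot_extensions(query: str):
    """Return (pattern, note) pairs for extension markers found in the query."""
    found = []
    for pattern, note in EXTENSION_MARKERS.items():
        if re.search(pattern, query, re.IGNORECASE):
            found.append((pattern, note))
    return found

query = """
SELECT ?doc WHERE {
  ?doc ?p ?text .
  FILTER (bif:contains (?text, 'semantic'))
}
"""
for pattern, note in spot_extensions(query):
    print(pattern, "->", note)
```

A validator service could run something like this over submitted queries and link each reported marker to a wiki page documenting the construct’s status.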

The Time of Day

(a clock showing no time)

From my Skype logs [2008-06-19 Dan Brickley: 18:24:23]

So I had a drunken dream about online microcurrencies last night. Also about cats and water-slides but that’s another story. Idea was of a karma donation system based on one-off assignments from person to person of specified chunks of their lifetime; ‘giving the time of day’. you’re allowed to give any time of day taken from those days you’ve been alive so far. they’re not directly redistributable, nor necessarily related to what happened during the specified time. there’s no central banker, beyond the notion of ‘the public record’. The system naturally favours the old/experienced, but if someone gets drunk and gives all their time/karma to a porn site, at least in the morning they’ll have another 24h ‘in the bank’. Or they could retract/deny the gift, although doing so a lot would also be visible in the public record and doing so excessively would make one look a bit sketchy, one’s time gifts seem less valuable etc. Anchoring to a real world ‘good’ (time) is supposed to provide some control against runaway inflation, as is non-redistributability, but also the time thing is nice for visualizations and explanation. I’m not really sure if it makes sense but thought i’d write it down before i forget the idea…

One idea would be for the time gifts to be redeemable, but I think that pushes the metaphor too far, into being a real currency for a fictional world where hourly rates are flattened. Some LETS schemes probably work that way, I guess…

So I’ve been meaning to write this up, but in the absence of having done so, here’s the idea as it first struck me. I had been thinking a bit about online reputation services, and the kinds of information they might aggregate; Garlik’s QDOS and FOAF experiments are a good example of this kind of evidence aggregation. As OpenID, FOAF, microformats etc. take hold, I really think we’ll see a massive parting of the waves, Red Sea style, with the “public record” on one side and “private stuff” on the other.

And in the public record, we’ll be attaching information about the things we make and do to well-known identifiers for people (and their semi-detached aliases). Various websites have rating and karma mechanisms, but it is far from clear how they’ll look when shared in the public Web, nor whether something robust and not-too-gameable will come out of it. There are certainly various modelling idioms (eg. Advogato does its internal calculations, then puts everyone in one of several broad-brush groups; here’s my Advogato FOAF). See also my previous notes on representing expertise.

Now in some IRC channels, there are bots where you can dish out credit by typing things like:

edd++ # xtechy

…and have a bot add up the credits, as well as the comments. In small IRC communities these aren’t gamed except for fun. So I’ve been thinking: how can these kinds of habits ever work on the wider Web, where people are spread across websites (but nevertheless identifiable via OpenID and FOAF)? How could it not turn hideous? What limited resource do we each have a supply of? No, not kidneys. And in a hungover stupor I came to think that “the time of day” could be such a resource. It’s really just a metaphor, and I’m not at all sure that the quantifiable nature is a benefit. But I also quite like that we each have a neverending supply of the stuff, and that even a fleeting moment can count.

Update: here’s a post from Simon Lucy which has a very similar direction (it was Simon I was drinking with the night before writing this). Excerpt:

And what do you do with your positive balance? Need you do anything? I imagine those that care will publish their balance or compare it with others in similar way to company cars or hi fi tvs. There will always be envy and jealousy.

But no one can steal your balance, misuse it.

So who wants to host the Ego Bank.

The main difference compared to my suggested scheme is that “the Web”, and the public record it carries, are the “ego bank”, creating a playground for aggregators of karma, credibility and reputation information. “The time of day” would just be one such category of information…

Public Skype RDF presence service

OK, I don’t know how this works, or how it happens (other Asemantics people might know more), but for those who didn’t know:

At http://mystatus.skype.com/danbrickley.xml there is a public RDF/XML document reflecting my status in Skype. There seems to be one for every active account name in the system.

Example markup:

<Status rdf:about="urn:skype:skype.com:skypeweb/1.1">
<statusCode rdf:datatype="http://www.skype.com/go/skypeweb">5</statusCode>
<presence xml:lang="NUM">5</presence>
<presence xml:lang="en">Do Not Disturb</presence>
<presence xml:lang="fr">Ne pas déranger</presence>
<presence xml:lang="de">Beschäftigt</presence>
<presence xml:lang="ja">取り込み中</presence>
<presence xml:lang="zh-cn">请勿打扰</presence>
<presence xml:lang="zh-tw">請勿打擾</presence>
<presence xml:lang="pt">Ocupado</presence>
<presence xml:lang="pt-br">Ocupado</presence>
<presence xml:lang="it">Occupato</presence>
<presence xml:lang="es">Ocupado</presence>
<presence xml:lang="pl">Nie przeszkadzać</presence>
<presence xml:lang="se">Stör ej</presence>
</Status>

In general (expressed in FOAF terms), for any :OnlineAccount that has an :accountServiceHomepage of http://www.skype.com/ you can take the :accountName – let’s call it ?a and plug it into the URI Template http://mystatus.skype.com/{a}.xml to get presence information in curiously cross-cultural RDF. In other words, one’s Skype status is part of the public record on the Web, well beyond the closed P2P network of Skype IM clients.
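The lookup rule above is mechanical enough to sketch in a few lines of Python: build the URL from the foaf:accountName, then pick the presence labels out of the RDF/XML by language tag. Since this is a sketch, the parsing runs against a simplified inline stand-in for the real document (the namespace declarations here are assumptions; the live documents may differ) rather than a network fetch.

```python
# Sketch of the mystatus.skype.com lookup rule. SAMPLE is a simplified,
# hypothetical stand-in for the live status document -- namespaces are
# assumed, not taken from the real service.
import xml.etree.ElementTree as ET

def status_url(account_name: str) -> str:
    # The URI template from the post: http://mystatus.skype.com/{a}.xml
    return "http://mystatus.skype.com/%s.xml" % account_name

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.skype.com/go/skypeweb">
  <Status rdf:about="urn:skype:skype.com:skypeweb/1.1">
    <statusCode>5</statusCode>
    <presence xml:lang="en">Do Not Disturb</presence>
    <presence xml:lang="fr">Ne pas déranger</presence>
  </Status>
</rdf:RDF>"""

def presence_by_lang(doc: str) -> dict:
    """Map xml:lang values to presence strings in a status document."""
    root = ET.fromstring(doc)
    ns = "{http://www.skype.com/go/skypeweb}"
    xml_lang = "{http://www.w3.org/XML/1998/namespace}lang"
    return {el.get(xml_lang): el.text
            for el in root.iter(ns + "presence")}

print(status_url("danbrickley"))
print(presence_by_lang(SAMPLE)["en"])
```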

Thinking about RDF vocabulary design and document formats, the Skype representation is roughly akin to FOAF documents (such as those on LiveJournal currently) that don’t indicate explicitly that they’re a :PersonalProfileDocument, nor say who is the :primaryTopic or :maker of the document. Given the RDF/XML on its own, you don’t have enough context to know what it is telling you. Whereas, if you know the URI, and the URI template rule shown above, you have a better idea of the meaning of the markup. Still, it’s useful. I suspect it might be time to add foaf:skypeID as an inverse-functional (ie. uniquely identifying) property to the FOAF spec, to avoid longwinded markup and make it easier to bridge profile data and up-to-the-minute status data. Thoughts?
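To make the contrast concrete, here is a sketch in Turtle of the longwinded account markup alongside the proposed shorthand. Note that foaf:skypeID is, at this point, just the proposal above, not a spec-blessed term:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#me> a foaf:Person ;
    foaf:name "Dan Brickley" ;
    # longwinded form: a blank-node account description
    foaf:holdsAccount [
        a foaf:OnlineAccount ;
        foaf:accountServiceHomepage <http://www.skype.com/> ;
        foaf:accountName "danbrickley"
    ] ;
    # proposed shorthand (inverse-functional, so it uniquely
    # identifies the person):
    foaf:skypeID "danbrickley" .
```

With the shorthand in place, an aggregator could merge profile data and the mystatus.skype.com presence documents on the single skypeID value.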

Google boost Jabber + VOIP, Skype releases IM toolkit, Jabber for P2P SPARQL?

Interesting times for the personal Semantic Web: “Any client that supports Jabber/XMPP can connect to the Google Talk service” Google Talk and Open Communications. It does voice calls too, using “a custom XMPP-based signaling protocol and peer-to-peer communication mechanism. We will fully document this protocol. In the near future, we plan to support SIP signaling.”

Meanwhile, Skype (the P2P-based VOIP and messaging system) have apparently released developer tools for their IM system. From a ZDNet article:

“Skype wants to embrace the rest of the Internet,” Skype co-founder Janus Friis said during a recent interview.

He did offer hypothetical examples. Online gamers involved in massively multiplayer mayhem could use Skype IM to taunt rivals and discuss strategy with teammates. Skype’s IM features could be incorporated, Friis suggested, into software-based media players for personal computers, or into Web sites for dating, blogging or “eBay kinds of auctions”.

I spent some time recently looking at collaborative globe-browsing with Google Earth (ie. giving and taking of tours), and yesterday, revisiting Jabber/XMPP as a possible transport for SPARQL queries and responses between friends and FOAFs. Both apps could get a healthy boost from these developments in the industry. Skype is great but the technology could do with being more open; maybe the nudge from Google will help there. Jabber is great but … hardly used by the people I chat with (who are split across MSN, Yahoo, AIM, Skype and IRC).

For a long time I’ve wanted to do RDF queries in a P2P context (eg. see the book chapter I wrote with Rael Dornfest). Given Apple’s recent boost for Jabber, and now this from Google, the technology looks to have a healthy future. I want to try exposing desktop, laptop etc. RDF collections (addressbooks, calendars, music, photos) directly as SPARQL endpoints exposed via Jabber. There will be some fiddly details, but the basic idea is that Jabber users (including Google and Apple customers) could have some way to expose aspects of their local data for query by their friends and FOAFs, without having to upload it all to some central Web site.
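One of those fiddly details is how a SPARQL query would actually travel over Jabber. A plausible first cut is to wrap the query text in an XMPP iq stanza; the sketch below does that in Python with the standard library. The payload element name and its urn:example namespace are invented for illustration – no such protocol exists or is being proposed here as final.

```python
# Sketch of framing a SPARQL query for transport in an XMPP <iq/>
# stanza. The payload element and its namespace are hypothetical,
# invented for this illustration only.
import xml.etree.ElementTree as ET

def sparql_iq(to_jid: str, query: str, iq_id: str = "q1") -> str:
    """Serialize a SPARQL query as an XMPP iq-get stanza (as a string)."""
    iq = ET.Element("iq", {"type": "get", "to": to_jid, "id": iq_id})
    payload = ET.SubElement(
        iq, "query", {"xmlns": "urn:example:sparql-query"})
    payload.text = query
    return ET.tostring(iq, encoding="unicode")

stanza = sparql_iq(
    "libby@example.org",
    "SELECT ?name WHERE { ?p a foaf:Person ; foaf:name ?name }")
print(stanza)
```

The receiving client would parse the stanza, run the query against its local RDF store, and send the bindings back in a result stanza – with access control decided by who the requesting JID is (friend, FOAF, stranger).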

Next practical question: which Jabber software library to start hacking with? I was using Rich Kilmer’s Jabber4R but read that it was unmaintained, so I’m wondering about switching to Perl or Python…