Small databases, loosly joined

Over the last month or so, I’ve had a SPARQL store (just an ARC instance plus a loader script) running on the Amazon EC2 sandbox I set up for FOAF experiments. Yesterday I installed a fresh SparqlPress bundle into my own blog, which runs on another server. So how to get the data across? Since SparqlPress includes a scutter, which despite Urban Dictionary is just the FOAF term for RDF crawler, we can set it to crawl a Web of linked RDF. Here’s a SPARQL query on the former installation which sets up some pointers to each graph (scutters often follow rdfs:seeAlso, amongst other techniques). It’s interesting too because it summarises the data found in each graph, simply by listing properties.

CONSTRUCT {
<http://danbri.org/foaf.rdf#danbri> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?g .
?g <http://purl.org/net/scutter/vocabItem> ?p .
}
WHERE {
GRAPH ?g { ?s ?p ?o }
}

The sc:vocabItem property is a work of fiction; I’ve not searched around to see if similar things already exist. The idea of linking together medium-sized repositories of metadata is an old one. The Harvest crawling/indexing system did this with RDM/SOIF. In a similar vein was WHOIS++, a now-obsolete directory system that some of us tried using for resource discovery in the late 90s – collections of Web site descriptions federated through having each server syndicate a summary of its content (record types, field names, and tokenized field values). I hadn’t really thought of it this way until I wrote this query, but SparqlPress could in some ways support a reimplementation of the Harvest and WHOIS++ design, if we have gatherer and broker roles attached to a network of blogs. All these ideas are of course close to the P2P scene too: little datasets exchanging summaries of their content to aid query routing. The main point here is that we can run a pretty simple SPARQL query, and get a summary of the contents of various named data graphs; the results of the query in turn give a summary of a database that has copies of these graphs.

Public Skype RDF presence service

OK I don’t know how this works, or how it happens (other Asemantics people might know more), but for those who didn’t know:

At http://mystatus.skype.com/danbrickley.xml there is a public RDF/XML document reflecting my status in Skype. There seems to be one for every active account name in the system.

Example markup:

<rdf:RDF>
<Status rdf:about=”urn:skype:skype.com:skypeweb/1.1″>
<statusCode rdf:datatype=”http://www.skype.com/go/skypeweb”>5</statusCode>
<presence xml:lang=”NUM”>5</presence>
<presence xml:lang=”en”>Do Not Disturb</presence>
<presence xml:lang=”fr”>Ne pas déranger</presence>
<presence xml:lang=”de”>Beschäftigt</presence>
<presence xml:lang=”ja”>取り込み中</presence>
<presence xml:lang=”zh-cn”>請勿打擾</presence>
<presence xml:lang=”zh-tw”>请勿打扰</presence>
<presence xml:lang=”pt”>Ocupado</presence>
<presence xml:lang=”pt-br”>Ocupado</presence>
<presence xml:lang=”it”>Occupato</presence>
<presence xml:lang=”es”>Ocupado</presence>
<presence xml:lang=”pl”>Nie przeszkadzać</presence>
<presence xml:lang=”se”>Stör ej</presence>
</Status>
</rdf:RDF>

In general (expressed in FOAF terms), for any :OnlineAccount that has an :accountServiceHomepage of http://www.skype.com/ you can take the :accountName – let’s call it ?a and plug it into the URI Template http://mystatus.skype.com/{a}.xml to get presence information in curiously cross-cultural RDF. In other words, one’s Skype status is part of the public record on the Web, well beyond the closed P2P network of Skype IM clients.

Thinking about RDF vocabulary design and document formats, the Skype representation is roughly akin to FOAF documents (such as those on LiveJournal currently) that don’t indicate explicitly that they’re a :PersonalProfileDocument, nor say who is the :primaryTopic or :maker of the document. Passed the RDF/XML on its own, you don’t have enough context to know what it is telling you. Whereas, if you know the URI, and the URI template rule shown above, you have a better idea of the meaning of the markup. Still, it’s useful. I suspect it might be time to add foaf:skypeID as an inverse-functional (ie. uniquely identifying) property to the FOAF spec, to avoid longwinded markup and make it easier to bridge profile data and up-to-the-minute status data. Thoughts?

Google boost Jabber + VOIP, Skype releases IM toolkit, Jabber for P2P SPARQL?

Interesting times for the personal Semantic Web: “Any client that supports Jabber/XMPP can connect to the Google Talk service” Google Talk and Open Communications. It does voice calls too, using “a custom XMPP-based signaling protocol and peer-to-peer communication mechanism. We will fully document this protocol. In the near future, we plan to support SIP signaling.”

Meanwhile, Skype (the P2P-based VOIP and messaging system) have apparently released developer tools for their IM system. From a ZDNet article:

“Skype to wants to embrace the rest of Internet,” Skype co-founder Janus Friis said during a recent interview.

He did offer hypothetical examples. Online gamers involved in massive multiple player mayhem could use Skype IM to taunt rivals and discuss strategy with teammates. Skype’s IM features could be incorporated, Friss suggests, into software-based media players for personal computers, Web sites for dating, blogging or “eBay kinds of auctions,” Friis said.

I spent some time recently looking at collaborative globe-browsing with Google Earth (ie. giving and taking of tours), and yesterday, revisiting Jabber/XMPP as a possible transport for SPARQL queries and responses between friends and FOAFs. Both apps could get a healthy boost from these developments in the industry. Skype is great but the technology could do with being more open; maybe the nudge from Google will help there. Jabber is great but … hardly used by the people I chat with (who are split across MSN, Yahoo, AIM, Skype and IRC).

For a long time I’ve wanted to do RDF queries in a P2P context (eg. see book chapter I wrote with Rael Dornfest). Given Apple’s recent boost for Jabber, and now this from Google, the technology looks to have a healthy future. I want to try exposing desktop, laptop etc RDF collections (addressbooks, calendars, music, photos) directly as SPARQL endpoints exposed via Jabber. There will be some fiddly details, but the basic idea is that Jabber users (including Google and Apple customers) could have some way to expose aspects of their local data for query by their friends and FOAFs, without having to upload it all to some central Web site.

Next practical question: which Jabber software library to start hacking with? I was using Rich Kilmer’s Jabber4R but read that it wasn’t unmaintained, so wondering about switching to Perl or Python…