Lqraps! Reverse SPARQL

(update: files are now in svn; updated the link here)

I’ve just published a quick writeup (with running toy Ruby code) of a “reverse SPARQL” utility called lqraps, a tool for re-constructing RDF from tabular data.

The idea is that such a tool is passed a tab-separated (eventually, CSV etc.) file, such as might conventionally be loaded into Access, spreadsheets etc. The tool looks for a few lines of special annotation in “#”-comments at the top of the file, and uses these to (re)generate RDF. My simple implementation does this with text-munging of SPARQL construct notation into Turtle. As the name suggests, this could be done against the result tables we get from doing SPARQL queries; however it is more generally applicable to tabular data files.
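The actual lqraps tool is a Ruby script (see the writeup); as a rough illustration of the idea only, here is a Python sketch. The `# template:` annotation syntax is invented for this example, and the substitution is the same naive text-munging of placeholders into Turtle, with none of the escaping a real tool would need.

```python
import io

def lqraps(tsv_text):
    """Rebuild Turtle from a tab-separated file whose '#'-comment header
    carries a CONSTRUCT-style triple template with ?column placeholders.
    (The '# template:' syntax here is invented for illustration.)"""
    template_lines, rows, header = [], [], None
    for line in io.StringIO(tsv_text):
        line = line.rstrip("\n")
        if line.startswith("# template:"):
            template_lines.append(line[len("# template:"):].strip())
        elif line.startswith("#") or not line:
            continue  # other comments / blank lines
        elif header is None:
            header = line.split("\t")  # first data line names the columns
        else:
            rows.append(dict(zip(header, line.split("\t"))))
    turtle = []
    for row in rows:
        for tmpl in template_lines:
            out = tmpl
            for col, val in row.items():
                out = out.replace("?" + col, val)  # naive text substitution
            turtle.append(out)
    return "\n".join(turtle)

example = "\n".join([
    '# template: <http://example.org/?id> <http://xmlns.com/foaf/0.1/name> "?name" .',
    "id\tname",
    "danbri\tDan Brickley",
])
print(lqraps(example))
```

Run against the toy input above, this emits one Turtle triple per data row per template line.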

A richer version might be able to output RDF/XML, but that would require use of libraries for SPARQL and RDF. This is of course close in spirit to D2RQ and SquirrelRDF and other relational/RDF mapping efforts, but the focus here is on something simple enough that we could include it in the top section of actual files.

The correct pronunciation btw is “el craps”. As in the card game.

Blood money

It’s in the Daily Mail, so it must be true:

Motorists will be targeted by a new generation of road cameras which work out how many people are in a car by measuring the amount of bodily fluid it contains.

The latest snooping device on the nation’s roads aims to penalise lone drivers who abuse car-sharing lanes, and is part of a Government effort to combat congestion at busy times.

The cameras work by sending an infrared beam through the windscreen of vehicles which detects the unique make-up of blood and water content in human skin.

Coincidentally enough, today’s hackery:

 danbri> add ds danbri http://danbri.org/words/sparqlpress/sparql

 jamendo> store added

 danbri> add template “FIND BLOOD DONORS” “select ?n ?bt where { ?x <http://xmlns.com/foaf/0.1/nick> ?n . ?x <http://kota.s12.xrea.com/vocab/uranaibloodtype> ?bt . }”

 danbri> FIND BLOOD DONORS

 jamendo> “danbri” “A+”

This makes use of the Uranai FOAF extension for recording one’s blood type.

Trying xOperator via Yves’s installation; the above dialog was conducted purely through Jabber IM. Nice to see XMPP+SPARQL explorations gaining traction. I really like the extra value-adding layers provided by the xOperator text chat UI, and the fact that you can wire in HTTP-based SPARQL endpoints too. Similar functionality in IRC comes from Bengee’s sparqlbot (see the intro doc). I think the latter has a slightly fancier templating system, but I haven’t really explored all that xOperator can do. I did get the ‘blood’ example working with sparqlbot in IRC too; it seems most people don’t have their blood type in their FOAF file. So much for finding handy donors ;)

Blood and XMPP aside, there’s a lot going on here technically. The fact that I can ask Yves’s installation of xOperator to attach to my SparqlPress installation’s database is interesting. At the moment everything in there is public, so ACL is easy. But we could imagine that picture changing. Perhaps I decide that healthcare-related info goes into a special set of named graphs. How could permissioning for this be handled so that Yves’s bot can still provide me with UI? OAuth is probably part of the answer here, although that means bouncing out into HTTP/HTML UI to some extent at least. And there’s more to say on SPARQL ACL too. Much more, but not right now.

The other technically rich area is these natural language templates. It is historically rare for public SQL endpoints to be made available, even read-only. Some of the reasons for this (local obscure schemas, closed-world logic) have gone away with SPARQL, but the biggest reason – the ease of writing very expensive queries – is very much still present. Guha and McCool’s Alpiri/TAP work argued that we needed another, lighter data interface; they proposed ‘GetData‘. The templates system we see in xOperator and sparqlbot (see also julie, wh4 and foafbot) suggests another take on this problem. Rather than expose all of my personal datastore to random ‘SELECT * WHERE { ?s ?p ?o }’ crawlers, these bots can provide a wrapping layer, allowing in only queries that are much more tightly constrained, as well as more human-oriented. Hmm, I’m suddenly reminded of Metalog; I wonder if Massimo is still working on that area.
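The wrapping-layer idea can be sketched in a few lines. This is not xOperator’s or sparqlbot’s actual code, just a toy Python illustration of the constraint: only registered phrase-to-query templates ever reach the store, and the endpoint here is a fake stand-in, not a real SPARQL client.

```python
# Toy sketch of a template-constrained query bot. Nothing here is from
# the real xOperator/sparqlbot codebases; names are illustrative.
class TemplateBot:
    def __init__(self, run_query):
        self.templates = {}          # phrase -> SPARQL text
        self.run_query = run_query   # e.g. an HTTP SPARQL endpoint client

    def add_template(self, phrase, sparql):
        self.templates[phrase.upper()] = sparql

    def ask(self, phrase):
        sparql = self.templates.get(phrase.upper())
        if sparql is None:
            return "unknown command"  # free-form SELECTs never reach the store
        return self.run_query(sparql)

# Fake endpoint standing in for a real SPARQL store:
def fake_endpoint(q):
    return [("danbri", "A+")] if "uranaibloodtype" in q else []

bot = TemplateBot(fake_endpoint)
bot.add_template(
    "FIND BLOOD DONORS",
    "select ?n ?bt where { ?x <http://xmlns.com/foaf/0.1/nick> ?n . "
    "?x <http://kota.s12.xrea.com/vocab/uranaibloodtype> ?bt . }")
print(bot.ask("find blood donors"))
```

The point is simply that the bot owner, not the remote asker, decides which (bounded-cost) queries exist.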

A nice step here might be if chat-oriented query templates could be shared between xOperator and sparqlbot. Wonder what would be needed to make that happen…

A friend by any other namespace

Homer goes to Moe’s. He buys Barney a beer, calling him “soulmate,” but Barney says that he’s “really more of a chum,” not his soulmate. Lenny describes himself as a “crony”; Carl, an “acquaintance”; Larry, a “colleague”; Sam, a “sympathizer”; Bumblebee Man, a “compadre”; Kearney, an “associate”; and Hibbert, a “contemporary”.

I’m a well-wisher, in that I don’t wish you any specific harm. — Moe, on his relationship to Homer, “El Viaje Misterioso de Nuestro Homer” [3F24]

See also: wordnet

Friend lists on Facebook – “all lists are private”

Friend Lists are here.

 Now you can easily organize your friends into convenient lists for messaging, invites, and more. You can create whatever kinds of lists you want; all lists are private. Click “Make a New List” to get started.

Just noticed this message on Facebook as I logged in. It was rumoured a little while ago, and in general it fits with my goals for FOAF, use of Group markup etc. I just expect that “all lists are private” isn’t the end of the privacy story here. Will friend list data be exposed to 3rd-party apps? There are lots of reasons it would be great to do so (and foaf:Group provides a notation for sharing this data with downstream standards-based apps). On the other hand, we don’t want to see any more “Find out what your friends really think of you” ugliness. I wonder which way they’ll jump.

JQbus: social graph query with XMPP/SPARQL

Righto, it’s about time I wrote this one up. One of my last deeds at W3C before leaving at the end of 2005 was to begin the specification of an XMPP binding of the SPARQL querying protocol. For the acronym-averse, a quick recap: XMPP is the name the IETF gives to the Jabber messaging technology, and SPARQL is W3C’s RDF-based approach to querying mixed-up Web data. SPARQL defines a textual query language, an XML result-set format, and a JSON version for good measure. There is also a protocol for interacting with SPARQL databases; this defines an abstract interface, and a binding to HTTP. There is as yet no official binding to XMPP/Jabber, and existing explorations are flawed. But, I’ll argue here, the work is well worth completing.

jqbus diagram

So what do we have so far? Back in 2005, I was working in Java, Chris Schmidt in Python, and Steve Harris in Perl. Chris has a nice writeup of one of the original variants I’d proposed, which came out of my discussions with Peter Saint-Andre. Chris also beat me in the race to have a working implementation, though I’ll attribute this to Python’s advantages over Java ;)

I won’t get bogged down in the protocol details here, except to note that Peter advised us to use IQ stanzas. The existing work offers a few slight variants on the idea of sending a SPARQL query in one IQ packet and returning all the results within another; this isn’t quite deployable as-is. When the result set is too big, we can run into practical (rather than spec-mandated) limits at the server-to-server layers. For example, Peter mentioned that jabber.org had a 65k packet limit. At SGFoo last week, someone suggested sending the results as an attachment instead; apparently this is one of the uncountably many extension specs produced by the energetic Jabber community. The 2005 work was also somewhat partial, and didn’t work out the detail of having a full binding (eg. dealing with default graphs, named graphs etc.).
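Since there is no official binding, the wire format below is purely illustrative: a Python sketch of what a query-carrying IQ stanza might look like. The `query` payload element and its namespace URI are invented for this example, not any agreed protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the SPARQL-over-XMPP payload; no official
# binding exists, so this URI is made up for illustration.
SPARQL_NS = "http://example.org/protocol/sparql"

def make_query_iq(iq_id, to_jid, sparql):
    """Build an XMPP IQ 'get' stanza carrying a SPARQL query as its payload."""
    iq = ET.Element("iq", {"type": "get", "id": iq_id, "to": to_jid})
    q = ET.SubElement(iq, "{%s}query" % SPARQL_NS)
    q.text = sparql
    return ET.tostring(iq, encoding="unicode")

stanza = make_query_iq(
    "q1", "danbri@example.org",
    "SELECT ?n WHERE { ?x <http://xmlns.com/foaf/0.1/nick> ?n }")
print(stanza)
```

A responder would reply with a matching `result`-type IQ carrying the XML result set; the 65k-style packet limits mentioned above are exactly why large result sets would need chunking or an out-of-band transfer.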

That said, I think we’re onto something good. I’ll talk through the Java stuff I worked on, since I know it best. The code uses Ignite Realtime’s Smack API. I have published rough Java code that can communicate with instances of itself across Jabber. This was last updated July 2007, when I fixed it up to use more recent versions of Smack and Jena. I forget if the code to parse out query results from the responses was completed, but it does at least send SPARQL XML results back through the XMPP network.

sparql jabber interaction

So why is this interesting?

  • SPARQLing over XMPP can cut through firewalls/NAT and talk to data on the desktop
  • SPARQLing over XMPP happens in a social environment; queries are sent from and to Jabber addresses, while roster information is available which could be used for access control at various levels of granularity
  • XMPP is well suited for async interaction; stored queries could return results days or weeks later (eg. job search)
  • The DISO project is integrating some PHP XMPP code with WordPress; SparqlPress is doing same with SPARQL

Both SPARQL and XMPP have mechanisms for batched results; it isn’t clear which, if either, to use here.

XMPP also has some service discovery mechanisms; I hope we’ll wire up a way to inspect each party on a buddylist roster, to see who has SPARQL data available. I made a diagram of this last summer, but no code to go with it yet. There is also much work yet to do on access control systems for SPARQL data, eg. using OAuth. It is far from clear how to integrate SPARQL-wide ideas on that with the specific possibilities offered within an XMPP binding. One idea is for SPARQL-defined FOAF groups to be used to manage buddylist rosters, “friend groups”.

Where are we with code? I have a longer page in FOAF SVN for the Jqbus Java stuff, and a variant on this writeup (includes better alt text for the images and more detail). The Java code is available for download. Chris’s Python code is still up on his site. I doubt any of these can currently talk to each other properly, but they show how to deal with XMPP in different languages, which is useful in itself. For Perl people, I’ve uploaded a copy of Steve’s code.

The Java stuff has a nice GUI for debugging, thanks to Smack. I just tried a copy. Basically I can run a client and a server instance from the same filetree, passing them my LiveJournal and Google Talk Jabber account details. The screenshot here shows the client on the left displaying the XML-encoded SPARQL results, and the server on the right displaying the query that arrived. That’s about it really. Nothing else ought to be that different from normal SPARQL querying, except that it is being done in an infrastructure that is more socially-grounded and decentralised than the classic HTTP Web service model.

JQbus debug

Small databases, loosely joined

Over the last month or so, I’ve had a SPARQL store (just an ARC instance plus a loader script) running on the Amazon EC2 sandbox I set up for FOAF experiments. Yesterday I installed a fresh SparqlPress bundle into my own blog, which runs on another server. So how to get the data across? Since SparqlPress includes a scutter, which despite Urban Dictionary is just the FOAF term for RDF crawler, we can set it to crawl a Web of linked RDF. Here’s a SPARQL query on the former installation which sets up some pointers to each graph (scutters often follow rdfs:seeAlso, amongst other techniques). It’s interesting too because it summarises the data found in each graph, simply by listing properties.

CONSTRUCT {
<http://danbri.org/foaf.rdf#danbri> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?g .
?g <http://purl.org/net/scutter/vocabItem> ?p .
}
WHERE {
GRAPH ?g { ?s ?p ?o }
}

The sc:vocabItem property is a work of fiction; I’ve not searched around to see if similar things already exist. The idea of linking together medium-sized repositories of metadata is an old one. The Harvest crawling/indexing system did this with RDM/SOIF. In a similar vein was WHOIS++, a now-obsolete directory system that some of us tried using for resource discovery in the late 90s – collections of Web site descriptions federated through having each server syndicate a summary of its content (record types, field names, and tokenized field values). I hadn’t really thought of it this way until I wrote this query, but SparqlPress could in some ways support a reimplementation of the Harvest and WHOIS++ design, if we have gatherer and broker roles attached to a network of blogs. All these ideas are of course close to the P2P scene too: little datasets exchanging summaries of their content to aid query routing. The main point here is that we can run a pretty simple SPARQL query, and get a summary of the contents of various named data graphs; the results of the query in turn give a summary of a database that has copies of these graphs.
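For illustration, what that CONSTRUCT produces can be mimicked in plain Python over a toy in-memory set of named graphs (a stand-in for the ARC store; the graph names and data below are invented, and sc:vocabItem is, as noted, fictional).

```python
# Toy in-memory stand-in for the SPARQL store: graph name -> triples.
graphs = {
    "http://example.org/g1": [
        ("_:a", "http://xmlns.com/foaf/0.1/name", "Dan"),
        ("_:a", "http://xmlns.com/foaf/0.1/nick", "danbri"),
    ],
    "http://example.org/g2": [
        ("_:b", "http://xmlns.com/foaf/0.1/name", "Libby"),
    ],
}

ME = "<http://danbri.org/foaf.rdf#danbri>"
SEEALSO = "<http://www.w3.org/2000/01/rdf-schema#seeAlso>"
VOCABITEM = "<http://purl.org/net/scutter/vocabItem>"  # the invented property

def summarise(graphs):
    """Emit the triples the CONSTRUCT above would: one seeAlso pointer per
    graph, plus one vocabItem triple per distinct property used in it."""
    out = []
    for g, triples in sorted(graphs.items()):
        out.append(f"{ME} {SEEALSO} <{g}> .")
        for p in sorted({p for (_, p, _) in triples}):
            out.append(f"<{g}> {VOCABITEM} <{p}> .")
    return out

for line in summarise(graphs):
    print(line)
```

A scutter following the seeAlso links gets both the pointers to each graph and a cheap property-level summary of what it will find there.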

Venn diagrams for Groups UI

Venn groups diagram

This is the result of feeding a relatively small list of groups and their members to the VennMaster java tool. I’ve been looking for swooshy automatic layout tools that might help with interactive visualisation of ‘social graph’ data where people can be clustered by their membership of various groups. I also wanted to explore the possibility of using such a tool as a way of authoring filters from raw evidence, using groups such as “have sent mail to”, “have accepted blog/wiki comment from”.

My gut reaction from this quick experiment is that the UI space is very easily overwhelmed. I used here just a quick hand-coded list of people, in fairly ad-hoc groups (cities, current-and-former workplaces etc.). Real data has more people and more groups. Still, I think there may be something worth investigating here. The venn tool I found was really designed for lifesci data, not people. If anyone knows of other possible software to try here, do let me know. To try this tool, simply run the Java app from the commandline, and use “File >> OpenList” on a file such as people.list.

One other thing I noticed in creating these ad-hoc groups (more or less ‘people tags’) is that representing what people have done felt as intuitively important as what they’re doing right now. For example, places people once lived or worked. This gives another axis of complexity that might need visualising. I’d like the underlying data to know who currently works/lives somewhere, versus “used to”, but in some views the two might appropriately be folded together. Tricky.