Lqraps! Reverse SPARQL

(update: files are now in svn; updated the link here)

I’ve just published a quick writeup (with running toy Ruby code) of a “reverse SPARQL” utility called lqraps, a tool for re-constructing RDF from tabular data.

The idea is that such a tool is passed a tab-separated (eventually, CSV etc.) file, such as might conventionally be loaded into Access, spreadsheets etc. The tool looks for a few lines of special annotation in “#”-comments at the top of the file, and uses these to (re)generate RDF. My simple implementation does this with text-munging of SPARQL construct notation into Turtle. As the name suggests, this could be done against the result tables we get from doing SPARQL queries; however it is more generally applicable to tabular data files.
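
For a flavour of the approach, here’s a toy sketch in Ruby. It is not the real lqraps code, and the “#prefix”/“#template” annotation syntax below is invented for illustration only; the real tool munges SPARQL CONSTRUCT notation, but the row-substitution idea is the same.

#!/usr/bin/env ruby
# Toy "reverse SPARQL": NOT the real lqraps; the annotation syntax here is invented.
# It expects a tab-separated file whose "#"-comment header carries a Turtle-ish
# template with {1}, {2}... placeholders for the columns, eg.:
#
#   #prefix foaf: <http://xmlns.com/foaf/0.1/>
#   #template [ a foaf:Person; foaf:name "{1}"; foaf:mbox <mailto:{2}> ] .
#   Dan Brickley<TAB>danbri@example.org
#
prefixes, template = [], nil

ARGF.each_line do |line|
  line = line.chomp
  if line =~ /^#prefix\s+(.*)$/
    prefixes << "@prefix #{$1} ."
  elsif line =~ /^#template\s+(.*)$/
    template = $1
  elsif line !~ /^#/ && !line.strip.empty? && template
    values = line.split("\t")
    puts prefixes.join("\n") unless prefixes.empty?
    prefixes = []                      # emit the prefixes once only
    puts template.gsub(/\{(\d+)\}/) { values[$1.to_i - 1].to_s }
  end
end

Run against the example file it would print a small chunk of Turtle, one blank-node description per data row.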

A richer version might be able to output RDF/XML, but that would require use of libraries for SPARQL and RDF. This is of course close in spirit to D2RQ and SquirrelRDF and other relational/RDF mapping efforts, but the focus here is on something simple enough that we could include it in the top section of actual files.

The correct pronunciation, btw, is “el craps”, as in the dice game.

JQbus: social graph query with XMPP/SPARQL

Righto, it’s about time I wrote this one up. One of my last deeds at W3C before leaving at the end of 2005 was to begin the specification of an XMPP binding of the SPARQL querying protocol. For the acronym-averse, a quick recap: XMPP is the name the IETF give to the Jabber messaging technology, and SPARQL is W3C’s RDF-based approach to querying mixed-up Web data. SPARQL defines a textual query language, an XML result-set format, and a JSON version for good measure. There is also a protocol for interacting with SPARQL databases; this defines an abstract interface and a binding to HTTP. There is as yet no official binding to XMPP/Jabber, and the existing explorations are flawed. But, as I’ll argue here, the work is well worth completing.

jqbus diagram

So what do we have so far? Back in 2005, I was working in Java, Chris Schmidt in Python, and Steve Harris in Perl. Chris has a nice writeup of one of the original variants I’d proposed, which came out of my discussions with Peter St Andre. Chris also beat me in the race to have a working implementation, though I’ll attribute this to Python’s advantages over Java ;)

I won’t get bogged down in the protocol details here, except to note that Peter advised us to use IQ stanzas. The existing work explores a few slight variants on the idea of sending a SPARQL query in one IQ packet and returning all the results within another, but this isn’t quite deployable as-is. When the result set is too big, we can run into practical (rather than spec-mandated) limits at the server-to-server layer. For example, Peter mentioned that jabber.org had a 65k packet limit. At SGFoo last week, someone suggested sending the results as an attachment instead; apparently this is one of the uncountably many extension specs produced by the energetic Jabber community. The 2005 work was also somewhat partial, and didn’t work out the details of a full binding (eg. dealing with default graphs, named graphs etc.).
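
To make the IQ-stanza idea a bit more concrete, here is a rough Ruby/REXML sketch of the kind of payload involved. The element names and the namespace URI are purely illustrative (there is no official binding, and the 2005 experiments each differed slightly); the sketch only shows the payload shape, not the XMPP session handling.

require 'rexml/document'

# Illustrative only: element names and the namespace below are made up, since
# no official XMPP binding of the SPARQL protocol exists yet.
query = 'SELECT ?name WHERE { ?x <http://xmlns.com/foaf/0.1/name> ?name } LIMIT 10'

iq = REXML::Element.new('iq')
iq.add_attributes('type' => 'get', 'to' => 'sparql.example.org', 'id' => 'q1')
q = iq.add_element('query', 'xmlns' => 'http://example.org/ns/sparql-xmpp#')
q.add_element('text').add_text(query)

puts iq   # the stanza a client would push into its XMPP stream

# A responding service would reply with an <iq type="result"> stanza whose
# payload carries the SPARQL XML result set -- which is exactly where the
# server-to-server packet-size limits mentioned above start to bite.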

That said, I think we’re onto something good. I’ll talk through the Java stuff I worked on, since I know it best. The code uses Ignite Realtime’s Smack API. I have published rough Java code that can communicate with instances of itself across Jabber. It was last updated in July 2007, when I fixed it up to use more recent versions of Smack and Jena. I forget whether the code to parse query results out of the responses was completed, but it does at least send SPARQL XML results back through the XMPP network.

sparql jabber interaction

So why is this interesting?

  • SPARQLing over XMPP can cut through firewalls/NAT and talk to data on the desktop
  • SPARQLing over XMPP happens in a social environment; queries are sent from and to Jabber addresses, while roster information is available which could be used for access control at various levels of granularity
  • XMPP is well suited for async interaction; stored queries could return results days or weeks later (eg. job search)
  • The DISO project is integrating some PHP XMPP code with WordPress; SparqlPress is doing the same with SPARQL

Both SPARQL and XMPP have mechanisms for batched results; it isn’t clear which, if either, to use here.

XMPP also has some service discovery mechanisms; I hope we’ll wire up a way to inspect each party on a buddylist roster, to see who has SPARQL data available. I made a diagram of this last summer, but no code to go with it yet. There is also much work yet to do on access control systems for SPARQL data, eg. using OAuth. It is far from clear how to integrate SPARQL-wide ideas on that with the specific possibilities offered within an XMPP binding. One idea is for SPARQL-defined FOAF groups to be used to manage buddylist rosters (“friend groups”).

Where are we with code? I have a longer page in FOAF SVN for the Jqbus Java stuff, and a variant on this writeup (includes better alt text for the images and more detail). The Java code is available for download. Chris’s Python code is still up on his site. I doubt any of these can currently talk to each other properly, but they show how to deal with XMPP in different languages, which is useful in itself. For Perl people, I’ve uploaded a copy of Steve’s code.

The Java stuff has a nice GUI for debugging, thanks to Smack. I just tried a copy. Basically I can run a client and a server instance from the same filetree, passing them my LiveJournal and Google Talk jabber account details. The screenshot here shows the client on the left displaying the XML-encoded SPARQL results, and the server on the right displaying the query that arrived. That’s about it really. Nothing else ought to be that different from normal SPARQL querying, except that it is being done in an infrastructure that is more socially-grounded and decentralised than the classic HTTP Web service model.

JQbus debug

Public Skype RDF presence service

OK I don’t know how this works, or how it happens (other Asemantics people might know more), but for those who didn’t know:

At http://mystatus.skype.com/danbrickley.xml there is a public RDF/XML document reflecting my status in Skype. There seems to be one for every active account name in the system.

Example markup:

<rdf:RDF>
<Status rdf:about="urn:skype:skype.com:skypeweb/1.1">
<statusCode rdf:datatype="http://www.skype.com/go/skypeweb">5</statusCode>
<presence xml:lang="NUM">5</presence>
<presence xml:lang="en">Do Not Disturb</presence>
<presence xml:lang="fr">Ne pas déranger</presence>
<presence xml:lang="de">Beschäftigt</presence>
<presence xml:lang="ja">取り込み中</presence>
<presence xml:lang="zh-cn">請勿打擾</presence>
<presence xml:lang="zh-tw">请勿打扰</presence>
<presence xml:lang="pt">Ocupado</presence>
<presence xml:lang="pt-br">Ocupado</presence>
<presence xml:lang="it">Occupato</presence>
<presence xml:lang="es">Ocupado</presence>
<presence xml:lang="pl">Nie przeszkadzać</presence>
<presence xml:lang="se">Stör ej</presence>
</Status>
</rdf:RDF>

In general (expressed in FOAF terms), for any :OnlineAccount that has an :accountServiceHomepage of http://www.skype.com/ you can take the :accountName – let’s call it ?a and plug it into the URI Template http://mystatus.skype.com/{a}.xml to get presence information in curiously cross-cultural RDF. In other words, one’s Skype status is part of the public record on the Web, well beyond the closed P2P network of Skype IM clients.
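
A quick Ruby sketch of using that URI template (nothing Skype-specific beyond the URL pattern just described; it fetches the document over plain HTTP and pulls out the English presence string with REXML):

require 'net/http'
require 'uri'
require 'rexml/document'

# Fetch the public Skype presence RDF/XML for an account name, using the
# http://mystatus.skype.com/{accountName}.xml pattern described above.
def skype_presence(account_name)
  url = URI.parse("http://mystatus.skype.com/#{account_name}.xml")
  doc = REXML::Document.new(Net::HTTP.get(url))
  doc.elements.each('//presence') do |p|
    return p.text if p.attributes['xml:lang'] == 'en'
  end
  nil
end

puts skype_presence('danbrickley')   # eg. "Do Not Disturb"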

Thinking about RDF vocabulary design and document formats, the Skype representation is roughly akin to FOAF documents (such as those on LiveJournal currently) that don’t indicate explicitly that they’re a :PersonalProfileDocument, nor say who is the :primaryTopic or :maker of the document. Passed the RDF/XML on its own, you don’t have enough context to know what it is telling you. Whereas, if you know the URI, and the URI template rule shown above, you have a better idea of the meaning of the markup. Still, it’s useful. I suspect it might be time to add foaf:skypeID as an inverse-functional (ie. uniquely identifying) property to the FOAF spec, to avoid longwinded markup and make it easier to bridge profile data and up-to-the-minute status data. Thoughts?

Outbox FOAF crawl: using the Google SG API to bring context to email

OK here’s a quick app I hacked up on my laptop largely in the back of a car, ie. took ~1 hour to get from idea to dataset.

The idea is the usual FOAFy schtick about taking an evidential rather than purely claim-based approach to the ‘social graph’.

We take the list of people I’ve emailed, and we probe the social data Web to scoop up user profile etc descriptions of them. Whether this is useful will depend upon how much information we can scoop up, of course. Eventually oauth or similar technology should come into play, for accessing non-public information. For now, we’re limited to data shared in the public Web.

Here’s what I did:

  1. I took the ‘sent mail’ folder on my laptop (a Mozilla Thunderbird installation).
  2. I built a list of all the email addresses I had sent mail to (in Cc: or To: fields). Yes, I grepped.
    • working assumption: these are humans I’m interacting with (ie. that I “foaf:know”)
    • I could also have looked for X-FOAF: headers in mail from them, but this is not widely deployed
    • I threw away anything that matched some company-related mail hosts I wanted to exclude.
  3. For each email address, scrubbed of noise, I prepended mailto: and generated a SHA1 value (see the sketch after this list).
  4. The current Google SG API can be consulted about hashed mailboxes using URIs in the unregistered sgn: space, for example my hashed gmail address (danbrickley@gmail.com) gives sgn://mboxsha1/?pk=6e80d02de4cb3376605a34976e31188bb16180d0 … which can be dropped into the Google parameter playground. I’m talking with Brad Fitzpatrick about how best these URIs might be represented without inventing a new URI scheme, btw.
  5. Google returns a pile of information, indicating connections between URLs, mailboxes, hashes etc., whether verified or merely claimed.
  6. I crudely ignore all that, and simply parse out anything beginning with http:// to give me some starting points to crawl.
  7. I feed these to an RDF FOAF/XFN crawler attached to my SparqlPress-augmented WordPress blog.

With no optimisation, polish or testing, here (below) is the list of documents this process gave me. I have not yet investigated their contents.

The goal here is to find out more about the people who are filling my mailbox, so that better UI (eg. auto-sorting into folders, photos, task-clustering etc.) could help make email more contextualised and sanity-preserving. The next step, I think, is to come up with a grouping and permissioning model for this crawled FOAF/RDF in the WordPress SPARQL store, ie. a way of asking queries across all social graph data in the store that came from this particular workflow. My store has other semi-personal data, such as the results of crawling the groups of people who have successfully commented in blogs and wikis that I run. I’m also looking at a Venn-diagram UI for presenting these different groups in a way that allows other groups (eg. “anyone I send mail to OR who commented in my wiki”) to be defined in terms of these evidence-driven primitive sets. But that’s another story.

FOAF files located with the help of the Google API:

http://www.advogato.org/person/benadida/foaf.rdf
http://www.w3c.es/Personal/Martin/foaf.rdf
http://www.iandickinson.me.uk/rdf/foaf.rdf
http://people.tribe.net/xe0s242
http://www.geocities.com/rmarkwhite/foaf.xml
http://www.l3s.de/~diederich/foaf.rdf
http://www.w3.org/People/maxf/foaf
http://revyu.com/people/mhausenblas/about/rdf
http://www.cs.vu.nl/~laroyo/foaf.rdf
http://wwwis.win.tue.nl/~nstash/foaf.rdf
http://www.nimbustier.net/nimbustier/foaf.rdf
http://www.cubicgarden.com/webdav/profile/foaf.rdf
http://www.ibiblio.org/hhalpin/foaf.rdf
http://www.kjetil.kjernsmo.net/linkedin-contacts.rdf
http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=dino
http://www.kjetil.kjernsmo.net/friends.rdf
http://www.wikier.org/foaf.rdf
http://www.tribe.net/FOAF/4916e791-9793-40c9-875b-21cb25733a32
http://www.zonk.net/ghard/foaf.rdf
http://www.zonk.net/ghard/foaf.rdf
http://www.kjetil.kjernsmo.net/friends.rdf
http://www.kjetil.kjernsmo.net/linkedin-contacts.rdf
http://www.gnowsis.com/leo/foaf.xml
http://people.tribe.net/c09622fd-c817-4935-a039-5cb5d0d6bed8
http://www.few.vu.nl/~wrvhage/foaf.rdf
http://www.zonk.net/ghard/foaf.rdf
http://www.lri.fr/~pietriga/foaf.rdf
http://www.advogato.org/person/Pike/foaf.rdf
http://www.deri.ie/fileadmin/scripts/foaf.php?id=12

For a quick hack, I’m really pleased with this. The sent-mail folder I used isn’t even my only one. A rewrite of the script over IMAP (and proprietary mail host APIs) could be quite fun.

RDF in Ruby revisited

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

Last weekend I had the fortune to run into Rich Kilmer at O’Reilly’s ‘Social graph Foo Camp‘ gathering. In addition to helping decorate my tent, Rich told me a bit more about the very impressive Semitar RDF and OWL work he’d done in Ruby, initially as part of the DAML programme. Matt Biddulph was also there, and we discussed again what it would take to include FOAF import into Dopplr. I’d be really happy to see that, both because of Matt’s long history of contributions to the Semantic Web scene, but also because Dopplr and FOAF share a common purpose. I’ve long said that a purpose of FOAF is to engineer more coincidences in the world, and Dopplr comes from the same perspective: to increase serendipity.

Now, the thing about engineering serendipity is that it doesn’t work without good information flow. And the thing about good information flow is that it benefits from data models that don’t assume the world around us comes nicely parceled into cleanly distinct domains. Borrowing from American Splendor – “ordinary life is pretty complex stuff”. No single Web site, service, document format or startup is enough; the trick comes when you hook things together in unexpected combinations. And that’s just what we did in the RDF world: created a model for mixed up, cross-domain data sharing.

Dopplr, Tripit, Fire Eagle and other travel and location services may know where you and others are. Social network sites (and there are more every day) know something of who you are, and something of who you care about. And the big G in the sky knows something of the parts of this story that are on the public record.

Data will always be spread around. RDF is a handy model for ad-hoc data merging from multiple sources. But you can’t do much without an RDF parser and a few other tools. Minimally, an RDF/XML parser and a basic API for navigating the graph. There are many more things you could add. In my old RubyRdf work, I had in-memory and SQL-backed storage, with a Squish query interface to each. I had a donated RDF/XML parser (from Ruby4R) and a much-improved query engine (with support for optionals) from Damian Steer. But the system is code-rotted. I wrote it when I was first learning Ruby, seven years ago, and I think it is “one to throw away”. I’m really glad I took the time to declare that project “closed” so as to avoid discouraging others, but it is time to revisit Ruby and RDF again now.

Other tools have other offerings: Dave Beckett’s Redland system (written in C) ships with a Ruby wrapper. Dave’s tools probably have the best RDF parsing facilities around, are fast, but require native code. Rena is a pure Ruby library, which looked like a great start but doesn’t appear to have been developed further in recent years.

I could continue going through the list of libraries, but Paul Stadig has already done a great job of this recently (see also his conclusions, which make perfect sense). There has been a lot of creative work around RDF/RDFS and OWL in Ruby, and collectively we clearly have a lot of talent and code here. But collectively we also lack a finished product. It is a real shame when even an RDF-enthusiast like Matt Biddulph is not in a position to simply “gem install” enough RDF technology to get a simple job done. Let’s get this fixed. As I said above,

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

In six months time, I’d like to see at least one solid, well rounded and modern RDF toolkit packaged as a Gem for the Ruby community. It should be able to parse RDF/XML flawlessly, and in addition to the usual unit tests, it should be wired up to the RDF Test Cases (see download) so we can all be assured it is robust. It should allow for a fast C parser such as Raptor to be used if available, falling back on pure Ruby otherwise. There should be a basic API that allows me to navigate an RDF graph of properties and values using clear, idiomatic Ruby. Where available, it should hook up to external stores of data, and include at least a SPARQL protocol client, eventually a full SPARQL implementation. It should allow multiple graphs to be super-imposed and disentangled. Some support for RDFS, OWL and rule languages would be a big plus. Support for other notations such as Turtle, RDFa, or XSLT-based GRDDL transforms would be useful, as would a plugin for microformat import. Transliterating Python code (such as the tiny Euler rule engine) should be considered. Divergence from existing APIs in Python (and Perl, Javascript, PHP etc) should be minimised, and carefully balanced against the pull of the Ruby way. And (though I lack strong views here) it should be made available under a liberal opensource license that permits redistribution under GPL. It should also be as I18N and Unicode-friendly as is possible in Ruby these days.
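
To make the “clear, idiomatic Ruby” wish a little more concrete, here is the kind of API I have in mind. None of this exists: the gem name, classes and methods below are entirely hypothetical, a sketch of the target rather than code you can run today.

require 'rubygems'
require 'rdf_toolkit'   # hypothetical gem -- wishlist only, nothing to install yet

FOAF  = RDFToolkit::Namespace.new('http://xmlns.com/foaf/0.1/')

graph = RDFToolkit::Graph.new
graph.load('http://danbri.org/foaf.rdf')        # RDF/XML, via Raptor or pure Ruby
graph.load('http://example.org/people.ttl')     # other notations as a bonus

# Navigate properties and values with plain Ruby idioms
graph.subjects_of(FOAF.knows).each do |person|
  puts graph.first_value(person, FOAF.name)
end

# SPARQL protocol client against a remote store
store = RDFToolkit::SPARQLClient.new('http://example.org/sparql')
store.select('SELECT ?n WHERE { [] <http://xmlns.com/foaf/0.1/name> ?n }') do |row|
  puts row['n']
end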

I’m not saying that all RDF toolkits should be merged, or that collaboration is compulsory. But we are perilously fragmented right now, and collaboration can be fun. In six months time, people who simply want to use RDF from Ruby ought to be pleasantly surprised rather than frustrated when they take to the ‘net to see what’s out there. If it takes a year instead of six months, sure whatever. But not seven years! It would be great to see some movement again towards a common library…

How hard can it be?

Google Social Graph API, privacy and the public record

I’m digesting some of the reactions to Google’s recently announced Social Graph API. ReadWriteWeb ask whether this is a creeping privacy violation, and danah boyd has a thoughtful post raising concerns about whether the privileged tech elite have any right to experiment in this way with the online lives of those who lack status or knowledge of these obscure technologies, and who may be amongst the more vulnerable users of the social Web.

While I tend to agree with Tim O’Reilly that privacy by obscurity is dead, I’m not of the “privacy is dead, get over it” school of thought. Tim argues,

The counter-argument is that all this data is available anyway, and that by making it more visible, we raise people’s awareness and ultimately their behavior. I’m in the latter camp. It’s a lot like the evolutionary value of pain. Search creates feedback loops that allow us to learn from and modify our behavior. A false sense of security helps bad actors more than tools that make information more visible.

There’s a danger here of technologists seeming to blame those we’re causing pain for. As danah says, “Think about whistle blowers, women or queer folk in repressive societies, journalists, etc.”. Not everyone knows their DTD from their TCP, or understands anything of how search engines, HTML or hyperlinks work. And many folk have more urgent things to focus on than learning such obscurities, let alone understanding the practical privacy, safety and reputation-related implications of their technology-mediated deeds.

Web technologists have responsibilities to the users of the Web, and while media education and literacy are important, those who are shaping and re-shaping the Web ought to be spending serious time on a daily basis struggling to come up with better ways of allowing humans to act and interact online without other parties snooping. The end of privacy by obscurity should not mean the death of privacy.

Privacy is not dead, and we will not get over it.

But it does need to be understood in the context of the public record. The reason I am enthusiastic about the Google work is that it shines a big bright light on the things people are currently putting into the public record. And it does so in a way that should allow people to build better online environments for those who do want their public actions visible, while providing immediate – and sometimes painful – feedback to those who have over-exposed themselves in the Web, and wish to backpedal.

I hope Google can put a user support mechanism on this. I know from our experience in the FOAF community, even with small scale and obscure aggregators, people will find themselves and demand to be “taken down”. While any particular aggregator can remove or hide such data, unless the data is tracked back to its source, it’ll crop up elsewhere in the Web.

I think the argument that FOAF and XFN are particularly special here is a big mistake. Web technologies used correctly (posh – “plain old semantic html” in microformats-speak) already facilitate such techniques. And Google is far from the only search engine in existence. Short of obfuscating all text inside images, personal data from these sites is readily harvestable.

ReadWriteWeb comment:

None the less, apparently the absence of XFN/FOAF data in your social network is no assurance that it won’t be pulled into the new Google API, either. The Google API page says “we currently index the public Web for XHTML Friends Network (XFN), Friend of a Friend (FOAF) markup and other publicly declared connections.” In other words, it’s not opt-in by even publishers – they aren’t required to make their information available in marked-up code.

The Web itself is built from marked-up code, and this is a thing of huge benefit to humanity. Both microformats and the Semantic Web community share the perspective that the Web’s core technologies (HTML, XHTML, XML, URIs) are properly consumed both by machines and by humans, and that any efforts to create documents that are usable only by (certain fortunate) humans is anti-social and discriminatory.

The Web Accessibility movement have worked incredibly hard over many years to encourage Web designers to create well marked up pages, where the meaning of the content is as mechanically evident as possible. The more evident the meaning of a document, the easier it is to repurpose it or present it through alternate means. This goal of device-independent, well marked up Web content is one that unites the accessibility, Mobile Web, Web 2.0, microformat and Semantic Web efforts. Perhaps the most obvious case is for blind and partially sighted users, but good markup can also benefit those who cannot use a mouse or keyboard. Beyond accessibility, many millions of Web users (many poor, and in poor countries) will have access to the Web only via mobile phones. My former employer W3C has just published a draft document, “Experiences Shared by People with Disabilities and by People Using Mobile Devices”. Last month in Bangalore, W3C held a Workshop on the Mobile Web in Developing Countries (see executive summary).

I read both Tim’s post, and danah’s post, and I agree with large parts of what they’re both saying. But not quite with either of them, so all I can think to do is spell out some of my perhaps previously unarticulated assumptions.

  • There is no huge difference in principle between “normal” HTML Web pages and XFN or FOAF. Textual markup is what the Web is built from.
  • FOAF and XFN take some of the guesswork out of interpreting markup. But other technologies (javascript, perl, XSLT/GRDDL) can also transform vague markup into more machine-friendly markup. FOAF/XFN simply make this process easier and less heuristic, less error prone.
  • Google was not the first search engine, it is not the only search engine, and it will not be the last search engine. To obsess on Google’s behaviour here is to mistake Google for the Web.
  • Deeds that are on the public record in the Web may come to light months or years later; Google’s opening up of the (already public, but fragmented) Usenet historical record is a good example here.
  • Arguing against good markup practice on the Web (accessible, device independent markup) is something that may hurt underprivileged users (with disabilities, or limited access via mobile, high bandwidth costs etc).
  • Good markup allows content to be automatically summarised and re-presented to suit a variety of means of interaction and navigation (eg. voice browsers, screen readers, small screens, non-mouse navigation etc).
  • Good markup also makes it possible for search engines, crawlers and aggregators to offer richer services.

The difference between Google crawling FOAF/XFN from LiveJournal, versus extracting similar information via custom scripts from MySpace, is interesting and important solely to geeks. Mainstream users have no idea of such distinctions. When LiveJournal originally launched their FOAF files in 2004, the rule they followed was a pretty sensible one: if the information was there in the HTML pages, they’d also expose it in FOAF.

We need to be careful of taking a ruthless “you can’t make an omelette without breaking eggs” line here. Whatever we do, people will suffer. If the Web is made inaccessible, with information hidden inside image files or otherwise obfuscated, we exclude a huge constituency of users. If we shine a light on the public record, as Google have done, we’ll embarrass, expose and even potentially risk harm to the people described by these interlinked documents. And if we stick our head in the sand and pretend that these folk aren’t exposed, I predict this will come back to bite us in the butt in a few months or years, since all that data is out there, being crawled, indexed and analysed by parties other than Google. Parties with less to lose, and more to gain.

So what to do? I think several activities need to happen in parallel:

  • Best practice codes for those who expose, and those who aggregate, social Web data
  • Improved media literacy education for those who are unwittingly exposing too much of themselves online
  • Technology development around decentralised, non-public record communication and community tools (eg. via Jabber/XMPP)

Any search engine at all, today, is capable of supporting the following bit of mischief:

Take as a starting point a collection of user profiles on a public site. Extract all the usernames. Find the ones that appear in the Web fewer than, say, 10,000 times, and on other sites. Assume these are unique userIDs and crawl the pages they appear in, do some heuristic name matching, … and you’ll have a pile of smushed identities, perhaps linking professional and dating sites, or drunken college photos to a respectable new life. No FOAF needed.

The answer I think isn’t to beat up on the aggregators, it’s to improve the Web experience such that people can have real privacy when they need it, rather than the misleading illusion of privacy. This isn’t going to be easy, but I don’t see a credible alternative.

Flickr (Yahoo) upcoming support for OpenID

According to Simon Willison, Flickr look set to support OpenID by allowing your photostream URL (eg. for me, http://www.flickr.com/photos/danbri/) to serve as an OpenID, ie. something you can type wherever you see “login using OpenID” and be bounced to Flickr/Yahoo to provide credentials instead of remembering yet another password. This is rather good news.

For the portability-minded, it’s worth remembering that OpenID lets you put markup in your own Web page to devolve to such services. So my main OpenID is “danbri.org”, which is a document I control, on a domain that I own. In the HTML header I have the following markup:



<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://danbri.org/foaf.rdf" />
<link rel="openid.server" href="http://www.livejournal.com/openid/server.bml" />
<link rel="openid.delegate" href="http://danbri.livejournal.com/" />

…which is enough to defer the details of being an OpenID provider to LiveJournal (thanks, LiveJournal!). Flickr are about to join the group of sites you can use in this way, it seems.

As an aside, this means that the security of our own websites becomes yet more important. Last summer, DreamHost (my webhosting provider) were compromised, and my own homepage was briefly decorated with viagra spam. Fortunately they didn’t seem to touch the OpenID markup, but you can see the risk. That’s the price of portability here. As Simon points out, we’ll probably all have several active OpenIDs, and there’s no need to host your own, just as there’s no need for people who want to publish online to buy and host their own domains or HTML sites.

The Flickr implementation, coupled with their existing API, means we could all offer things like “log into my personal site for family (or friends)” and defer buddylist – and FOAF – management to the well-designed Flickr site, assuming all your friends or family have Flickr accounts. Implementing this in a way that works with other providers (eg. LJ) is left as an exercise for the reader ;)

Ruby client for querying SPARQL REST services

I’ve started a Ruby conversion of Ivan Herman’s Python SPARQL client, itself inspired by Lee Feigenbaum’s Javascript library. These are tools which simply transmit a SPARQL query across the ‘net to a SPARQL-protocol database endpoint, and handle the unpacking of the results. These queries can result in yes/no responses, variable-to-value bindings (rather like in SQL), or in chunks of RDF data. The default resultset notation is a simple XML format; JSON results are also widely available.
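
The protocol interaction itself is just HTTP. Independently of Ivan’s API (which wraps this more comfortably), a bare-bones round-trip looks roughly like this, assuming an endpoint that answers GET requests with the SPARQL XML results format:

require 'net/http'
require 'uri'
require 'cgi'
require 'rexml/document'

# Send a SELECT query to a SPARQL protocol endpoint and walk the result
# bindings. The endpoint URL is a placeholder.
endpoint = URI.parse('http://example.org/sparql')
query = 'SELECT ?name WHERE { ?x <http://xmlns.com/foaf/0.1/name> ?name } LIMIT 5'

response = Net::HTTP.get_response(endpoint.host, endpoint.path + '?query=' + CGI.escape(query))
doc = REXML::Document.new(response.body)

doc.elements.each('//result') do |result|
  result.elements.each('binding') do |b|
    puts "#{b.attributes['name']} = #{b.elements[1].text}"   # <uri>, <literal> or <bnode> child
  end
end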

All I’ve done so far, is to sit down with the core Python file from Ivan’s package, and slog through converting it brainlessly into rather similar Ruby. Take out “self” and “:” from method definitions, write “nil” instead of “None”, write “end” at the end of each method and class, use Ruby’s iteration idioms, and you’re most of the way there. Of course the fiddly detail is where we use external libraries: for URIs, network access, JSON and XML parsing. I’ve cobbled something together which could be the basis for reconstructing all the original functionality in Ivan’s code. Currently, it’s more proof of concept, but enough to be worth posting.

Files are in SVN and browsable online; there’s also a bundled up download for the curious, brave or helpful. The test script shows all we have at the moment: ability to choose XML or JSON results, and access result set binding rows. Other result formats (notably RDF; there’s no RDF/XML parser currently) aren’t handled, and various bits of the code conversion are incomplete. I’d be super happy if someone came along and helped finish this!

Update: Ruby’s REXML parser is now clumsily wired in; you can get a REXML Document object (see the xml.com writeup) as a way of navigating the resultset.

Commandline PHP for loading RDF URLs into ARC (and Twinkle for query UI)


#!/usr/bin/php
<?php
if ($argc != 2 || in_array($argv[1], array('--help', '-help', '-h', '-?'))) {
?>
This is a command line PHP script with one option: URL of RDF document to load
<?php
} else {

$supersecret = "123rememberme"; # Security analysts recommend using date of birth + social security ID here
# *** be careful with real MySQL passwords ***

include_once("../arc/ARC2.php");
$config = array( 'db_host' => 'localhost', 'db_name' => 'sg1', 'db_user' => 'sparql',
'db_pwd' => $supersecret, 'store_name' => 'crawl', );
$store = ARC2::getStore($config);
if (!$store->isSetUp()) { $store->setUp(); }
$profile = $argv[1];
echo "Loading data from " . $profile ;
$store->query('DELETE FROM <'.$profile.'>');
$store->query('LOAD <'.$profile.'>');
}
?>

FWIW, this is what I’m using to (re)load data into an ARC store from the commandline. I’ll try wiring up my old RDF crawler to this when I get time. Each loaded source is stored as a named graph, with the URI it is loaded from being the named graph URI. ARC just needs the path to the unpacked PHP libraries, and connection details for a MySQL database, and comes with a handy SPARQL endpoint script too, which I’ve been testing with Twinkle.

My public sandbox data is currently loaded up as follows. No promises it’ll stay there, but anyway, adding the following to Twinkle 2.0’s config file section for SPARQL endpoints works for me. The endpoint also directly offers a basic Web interface, with HTML, XML, JSON etc.


<http://sandbox.foaf-project.org/2008/foaf/ggg.php>
a sources:Endpoint; rdfs:label "FOAF Social Graph Aggregator sandbox".