coding


Matt Kane resurfaced on Bristol’s underscore mailing list  with this intriguing snippet, after some travels around the middle-east: ” … discovered N95s (not mine) cannot be taken into Syria”.

I asked for the backstory, which goes like this:

Quite a palaver. Got the train from Istanbul to Syria (amazing trip!). At the border they didn’t search the bags of “westerners” but asked us all to show our phones and cameras. They glanced at them all quickly, checking the brand (”Nikon, ok. SonyEricsson, ok”). One guy had an N95 and they led him off the train. His sister informed us that they’d said it wasn’t allowed in Syria, and that if she knew her brother he’d not give it up without a fight. Despite being on contract, he argued with them for an hour and a half, even calling the embassies in Damascus and Ankara. In the end he gave it up, with a promise that they’d send it on to the airport from where he was leaving. A few days later we’re chatting with a barman and spot his phone - an N95, and yes, he got it in Syria! A few days after that we found out the full story from our hotel owner in Damascus. Apparently the CIA gave a load of bugged N95s to high-ranking Kurdish officials in Iraq, many of which were then smuggled into Syria and given as gifts to various shady characters. After the Hezbollah guy was assassinated in Damascus a few months ago, the Syrians set about trying to root out spies, which led to this ban on bringing N95s into the country. Apparently.

This is the first I’ve heard of it, but searching throws up a few references to rigged N95s as “spy phones”.

Somewhat-unrelated aside: I don’t believe the relevant functionality is exposed in the N95’s widget APIs yet. I had trouble making it vibrate, let alone self-destruct after this message. But at least widget/gadget/app security is getting some attention lately. It can’t be too long before “spy widgets” on your phone become a real concern, particularly since the exposure of phone APIs to 3rd party apps is such a creative combination. I should be clear that AFAIK, Nokia’s N95 widget platform is free of such vulnerabilities currently, and any “spy phone” mischief so far has been achieved through other kinds of interference. But it does make me glad to see a Widgets 1.0: Digital Signature spec moving along at W3C…

I’ve just published a quick writeup (with running toy Ruby code) of a “reverse SPARQL” utility called lqraps, a tool for re-constructing RDF from tabular data.

The idea is that such a tool is passed a tab-separated (eventually, CSV etc.) file, such as might conventionally be loaded into Access, spreadsheets etc. The tool looks for a few lines of special annotation in “#”-comments at the top of the file, and uses these to (re)generate RDF. My simple implementation does this with text-munging of SPARQL construct notation into Turtle. As the name suggests, this could be done against the result tables we get from doing SPARQL queries; however it is more generally applicable to tabular data files.

A richer version might be able to output RDF/XML, but that would require use of libraries for SPARQL and RDF. This is of course close in spirit to D2RQ and SquirrelRDF and other relational/RDF mapping efforts, but the focus here is on something simple enough that we could include it in the top section of actual files.

The correct pronunciation btw is “el craps”. As in the card game

I’ve been using the SPARQL query language to access a very ad-hoc collection of personal and social graph data, and thanks to Bengee’s ARC system this can sit inside my otherwise ordinary Wordpress installation. At the moment, everything in there is public, but lately I’ve been discussing oauth with a few folk as a way of mediating access to selected subsets of that data. Which means the data store will need some way of categorising the dozens of misc data source URIs. There are a few ways to do this; here I try a slightly non-obvious approach.

Every SPARQL store can have many graphs inside, named by URI, plus optionally a default graph. The way I manage my store is a kind of structured chaos, with files crawled from links in my own data and my friends. One idea for indicating the structure of this chaos is to keep “table of contents” metadata in the default graph. For example, I might load up <http://danbri.org/foaf.rdf> into a SPARQL graph named with that URI. And I might load up <http://danbri.org/evilfoaf.rdf> into another graph, also using the retrieval URI to identify the data within my SPARQL store. Now, two points to make here: firstly, that the SPARQL spec does not mandate that we do things this way. An application that for example wanted to keep historical versions of a FOAF or RSS of schema document, could keep the triples from each version in a different named graph. Perhaps these might be named with UUIDs, for example. The second point, is that there are many different “meta” claims we want to store about our datasets. And that mixing them all into the store-wide “default graph” could be rather limiting, especially if we mightn’t want to unconditionally believe even those claims.

In the example above for example, I have data from running a PGP check against foaf.rdf (which passed) and evilfoaf.rdf (which doesn’t pass a check against my pgp identity). Now where do I store the results of this PGP checking? Perhaps the default graph, but maybe that’s messy. The idea I’m playing with here is that UUIDs are reasonable identifiers, and that perhaps we’ll find ourselves sharing common UUIDs across stores.

Go back to my sent-mail FOAF crawl example from yesterday. How far did I get? Well the end result was a list of URLs which I looped through, and loaded into my big chaotic SPARQL store. If I run the following query, I get a list of all the data graphs loaded:

SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o . } }

This reveals 54 URLs, basically everything I’ve loaded into ARC in the last month or so. Only 30 of these came from yesterday’s hack, which used Google’s new Social Graph API to allow me to map from hashed mailbox IDs to crawlable data URIs. So today’s game is to help me disentangle the 30 from the 54, and superimpose them on each other, but not always mixed with every other bit of information in the store. In other words, I’m looking for a flexible, query-based way of defining views into my personal data chaos.

So, what I tried. I took the result of yesterday’s hack, a file of data URIs called urls.txt. Then I modified my commandline dataloader script (yeah yeah this should be part of wordpress). My default data loader simply takes each URI, gets the data, and shoves it into the store under a graph name which was the URI used for retrieval. What I did today is, additionally, make a “table of contents” overview graph. To avoid worrying about names, I generated a UUID and used that. So there is a graph called <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> which contains simple asserts of the form:

<http://www.advogato.org/person/benadida/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> .
<http://www.w3c.es/Personal/Martin/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> .
<http://www.iandickinson.me.uk/rdf/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> . # etc

…for each of the 30 files my crawler loaded into the store.

This lets us use <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> as an indirection point for information related to this little mailbox crawler hack. I don’t have to “pollute” the single default graph with this data. And because the uuid: was previously meaningless, it is something we might decided makes sense to use across data visibility boundaries, ie. you might use the same UUID in your own SPARQL store, so we can share queries and app logic.

Here’s a simple query. It says, “ask the mailbox crawler table of contents graph (which we call uuid:320d9etc…) for all things it knows about that are a Document”. Then it says “ask each of those documents, for everything in it”. And then the SELECT clause returns all the property URIs. This gives a first level overview of what’s in the Web of data files found by the crawl. Query was:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p WHERE {
GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled { ?s ?p ?o . }
}

ORDER BY ?p

I’ll just show the first page full of properties it found; for the rest see link to the complete set. Since W3C’s official SPARQL doesn’t have aggregates, we’d need to write application code (or use something like the SPARQL+ extensions) to get property usage counts. Here are some of the properties that were found in the data:

  http://kota.s12.xrea.com/vocab/uranaibloodtype
http://purl.org/dc/elements/1.1/creator
http://purl.org/dc/elements/1.1/description
http://purl.org/dc/elements/1.1/format
http://purl.org/dc/elements/1.1/title
http://purl.org/dc/terms/created
http://purl.org/dc/terms/modifed
http://purl.org/dc/terms/modified
http://purl.org/net/inkel/rdf/schemas/lang/1.1#masters
http://purl.org/net/inkel/rdf/schemas/lang/1.1#reads
http://purl.org/net/inkel/rdf/schemas/lang/1.1/masters
http://purl.org/net/inkel/rdf/schemas/lang/1.1/reads
http://purl.org/net/inkel/rdf/schemas/lang/1.1/speaks
http://purl.org/net/schemas/quaffing/drankBeerWith
http://purl.org/net/schemas/quaffing/drankLagerWith
http://purl.org/net/vocab/2004/07/visit#caregion
http://purl.org/net/vocab/2004/07/visit#country
http://purl.org/net/vocab/2004/07/visit#usstate
http://purl.org/ontology/mo/hasTrack
http://purl.org/ontology/mo/myspace
http://purl.org/ontology/mo/performed

So my little corner of the Web includes properties that extend FOAF documents to include blood types, countries that have been visited, language skills that people have, music information, and even drinking habits. But remember that this comes from my corner of the Web - people I’ve corresponded with - and probably isn’t indicative of the wider network. But that’s what grassroots decentralised data is all about. The folk who published this data didn’t need to ask permission of any committe to do so, they just mixed in what they wanted to say, alongside terms more widely used like foaf:Person, foaf:name. This is the way it should be: ask forgiveness, not permission, from the language lawyers and standardistas.

Ok, so let’s dig deeper into the messy data I crawled up from my sentmail contacts?

Here’s one that finds some photos, either using FOAF’s :img or :depiction properties:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT * WHERE {
GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled {
{ ?x :depiction ?y1 } UNION { ?x :img ?y2 } .
}
}

Here’s another that asks the crawl results for names and homepages it found:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT * WHERE { GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled { { [ :name ?n; :homepage ?h ] } }
}

To recap, the key point here is that social data in a SPARQL store will be rather chaotic. Information will often be missing, and often be extended. It will come from a variety of parties, some of whom you trust, some of whom you don’t know, and a few of whom will be actively malicious. Later down the line, subsets of the data will need different permissioning: if I export a family tree from ancestry.co.uk, I don’t want everyone to be able to do a SELECT for mother’s maiden name and my date of birth.

So what I suggest here, is that we can use UUID-named graphs as an organizing structure within an otherwise chaotic SPARQL environment. The demo here shows how one such graph can be used as a “table of contents” for other graphs associated with a particular app — in this case, the Google-mediated sentmail crawling app I made yesterday. Other named views might be: those data files from colleagues, those files that are plausibly PGP-signed, those that contain data structured according to some particular application need (eg. calendar, addressbook, photos, …).

OK here’s a quick app I hacked up on my laptop largely in the back of a car, ie. took ~1 hour to get from idea to dataset.

Idea is the usual FOAFy schtick about taking an evidential rather than purely claim-based approach to the ’social graph’.

We take the list of people I’ve emailed, and we probe the social data Web to scoop up user profile etc descriptions of them. Whether this is useful will depend upon how much information we can scoop up, of course. Eventually oauth or similar technology should come into play, for accessing non-public information. For now, we’re limited to data shared in the public Web.

Here’s what I did:

  1. I took the ’sent mail’ folder on my laptop (a Mozilla Thunderbird installation).
  2. I built a list of all the email addresses I had sent mail to (in Cc: or To: fields). Yes, I grepped.
    • working assumption: these are humans I’m interacting with (ie. that I “foaf:know”)
    • I could also have looked for X-FOAF: headers in mail from them, but this is not widely deployed
    • I threw away anything that matched some company-related mail hosts I wanted to exclude.
  3. For each email address, scrubbed of noise, I prepend mailto: and generate a SHA1 value.
  4. The current Google SG API can be consulted about hashed mailboxes using URIs in the unregistered sgn: space, for example my hashed gmail address (danbrickley@gmail.com) gives sgn://mboxsha1/?pk=6e80d02de4cb3376605a34976e31188bb16180d0 … which can be dropped into the Google parameter playground. I’m talking with Brad Fitzpatrick about how best these URIs might be represented without inventing a new URI scheme, btw.
  5. Google returns a pile of information, indicating connections between URLs, mailboxes, hashes etc., whether verified or merely claimed.
  6. I crudely ignore all that, and simply parse out anything beginning with http:// to give me some starting points to crawl.
  7. I feed these to an RDF FOAF/XFN crawler attached to my SparqlPress-augmented Wordpress blog.

With no optimisation, polish or testing, here (below) is the list of documents this process gave me. I have not yet investigated their contents.

The goal here is to find out more about the people who are filling my mailbox, so that better UI (eg. auto-sorting into folders, photos, task-clustering etc) could help make email more contextualised and sanity-preserving.  The next step I think is to come up with a grouping and permissioning model for this crawled FOAF/RDF in the Wordpress SPARQL store, ie. a way of asking queries across all social graph data in the store that came from this particular workflow. My store has other semi-personal data, such as the results of crawling the groups of people who have successfuly commented in blogs and wikis that I run. I’m also looking at a Venn-diagram UI for presenting these different groups in a way that allows other groups (eg. “anyone I send mail to OR who commented in my wiki”) to be defined in terms of these evidence-driven primitive sets. But that’s another story.

FOAF files located with the help of the Google API:

 http://www.advogato.org/person/benadida/foaf.rdf
http://www.w3c.es/Personal/Martin/foaf.rdf
http://www.iandickinson.me.uk/rdf/foaf.rdf
http://people.tribe.net/xe0s242
http://www.geocities.com/rmarkwhite/foaf.xml
http://www.l3s.de/~diederich/foaf.rdf
http://www.w3.org/People/maxf/foaf
http://revyu.com/people/mhausenblas/about/rdf
http://www.cs.vu.nl/~laroyo/foaf.rdf
http://wwwis.win.tue.nl/~nstash/foaf.rdf
http://www.nimbustier.net/nimbustier/foaf.rdf
http://www.cubicgarden.com/webdav/profile/foaf.rdf
http://www.ibiblio.org/hhalpin/foaf.rdf
http://www.kjetil.kjernsmo.net/linkedin-contacts.rdf
http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user\u003ddino
http://www.kjetil.kjernsmo.net/friends.rdf
http://www.wikier.org/foaf.rdf
http://www.tribe.net/FOAF/4916e791-9793-40c9-875b-21cb25733a32
http://www.zonk.net/ghard/foaf.rdf
http://www.zonk.net/ghard/foaf.rdf
http://www.kjetil.kjernsmo.net/friends.rdf
http://www.kjetil.kjernsmo.net/linkedin-contacts.rdf
http://www.gnowsis.com/leo/foaf.xml
http://people.tribe.net/c09622fd-c817-4935-a039-5cb5d0d6bed8
http://www.few.vu.nl/~wrvhage/foaf.rdf
http://www.zonk.net/ghard/foaf.rdf
http://www.lri.fr/~pietriga/foaf.rdf
http://www.advogato.org/person/Pike/foaf.rdf
http://www.deri.ie/fileadmin/scripts/foaf.php?id\u003d12

For a quick hack, I’m really pleased with this. The sent-mail folder I used isn’t even my only one. A rewrite of the script over IMAP (and proprietary mail host APIs) could be quite fun.

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

Last weekend I had the fortune to run into Rich Kilmer at O’Reilly’s ‘Social graph Foo Camp‘ gathering. In addition to helping decorate my tent, Rich told me a bit more about the very impressive Semitar RDF and OWL work he’d done in Ruby, initially as part of the DAML programme. Matt Biddulph was also there, and we discussed again what it would take to include FOAF import into Dopplr. I’d be really happy to see that, both because of Matt’s long history of contributions to the Semantic Web scene, but also because Dopplr and FOAF share a common purpose. I’ve long said that a purpose of FOAF is to engineer more coincidences in the world, and Dopplr comes from the same perspective: to increase serendipity.

Now, the thing about engineering serendipity, is that it doesn’t work without good information flow. And the thing about good information flow, is that it benefits from data models that don’t assume the world around us comes nicely parceled into cleanly distinct domains. Borrowing from American Splendor - “ordinary life is pretty complex stuff“. No single Web site, service, document format or startup is enough; the trick comes when you hook things together in unexpected combinations. And that’s just what we did in the RDF world: created a model for mixed up, cross-domain data sharing.

Dopplr, Tripit, Fire Eagle and other travel and location services may know where you and others are. Social network sites (and there are more every day) knows something of who you are, and something of who you care about. And the big G in the sky knows something of the parts of this story that are on the public record.

Data will always be spread around. RDF is a handy model for ad-hoc data merging from multiple sources. But you can’t do much without an RDF parser and a few other tools. Minimally, an RDF/XML parser and a basic API for navigating the graph. There are many more things you could add. In my old RubyRdf work, I had in-memory and SQL-backed storage, with a Squish query interface to each. I had a donated RDF/XML parser (from Ruby4R) and a much-improved query engine (with support for optionals) from Damian Steer. But the system is code-rotted. I wrote it when I was learning Ruby beginning 7 years ago, and I think it is “one to throw away”. I’m really glad I took the time to declare that project “closed” so as to avoid discouraging others, but it is time to revisit Ruby and RDF again now.

Other tools have other offerings: Dave Beckett’s Redland system (written in C) ships with a Ruby wrapper. Dave’s tools probably have the best RDF parsing facilities around, are fast, but require native code. Rena is a pure Ruby library, which looked like a great start but doesn’t appear to have been developed further in recent years.

I could continue going through the list of libraries, but Paul Stadig has already done a great job of this recently (see also his conclusions, which make perfect sense). There has been a lot of creative work around RDF/RDFS and OWL in Ruby, and collectively we clearly have a lot of talent and code here. But collectively we also lack a finished product. It is a real shame when even an RDF-enthusiast like Matt Biddulph is not in a position to simply “gem install” enough RDF technology to get a simple job done. Let’s get this fixed. As I said above,

If you’re interested in collaborating on Ruby tools for RDF, please join the public-rdf-ruby@w3.org mailing list at W3C. Just send a note to public-rdf-ruby-request@w3.org with a subject line of “subscribe”.

In six months time, I’d like to see at least one solid, well rounded and modern RDF toolkit packaged as a Gem for the Ruby community. It should be able to parse RDF/XML flawlessy, and in addition to the usual unit tests, it should be wired up to the RDF Test Cases (see download) so we can all be assured it is robust. It should allow for a fast C parser such as Raptor to be used if available, falling back on pure Ruby otherwise. There should be a basic API that allows me to navigate an RDF graph of properties and values using clear, idiomatic Ruby. Where available, it should hook up to external stores of data, and include at least a SPARQL protocol client, eventually a full SPARQL implementation. It should allow multiple graphs to be super-imposed and disentangled. Some support for RDFS, OWL and rule languages would be a big plus. Support for other notations such as Turtle, RDFa, or XSLT-based GRDDL transforms would be useful, as would a plugin for microformat import. Transliterating Python code (such as the tiny Euler rule engine) should be considered. Divergence from existing APIs in Python (and Perl, Javascript, PHP etc) should be minimised, and carefully balanced against the pull of the Ruby way. And (thought I lack strong views here) it should be made available under a liberal opensource license that permits redistribution under GPL. It should also be as I18N and Unicode-friendly as is possible in Ruby these days.

I’m not saying that all RDF toolkits should be merged, or that collaboration is compulsory. But we are perilously fragmented right now, and collaboration can be fun. In six months time, people who simply want to use RDF from Ruby ought to be pleasantly suprised rather than frustrated when they take to the ‘net to see what’s out there. If it takes a year instead of six months, sure whatever. But not seven years! It would be great to see some movement again towards a common library…

How hard can it be?

Is this crazy or useful? Am not sure yet.

This example uses FOAF vocabulary for groups and openid. So the basic structure here is that Agents (including persons) can have an :openid and can be a :member of a :Group.

From an openid-augmented Wordpress, we get a list of all the openids my blog knows about. From an openid-augmented MediaWiki, we get a list of all the openids that contribute to the FOAF project wiki. I dumped each into a basic RDF file (not currently an automated process). But the point here is to explore enumerated groups using queries.

<rdf:RDF xmlns:rdf=”http://www.w3.org/1999/02/22-rdf-syntax-ns#” xmlns=”http://xmlns.com/foaf/0.1/”>
<Group rdf:about=’#both’>
<!– enumerated membership –>
<member><Agent><openid rdf:resource=’http://danbri.org/’/></Agent></member>
<member><Agent><openid rdf:resource=’http://tommorris.org/’/></Agent></member>
<member><Agent><openid rdf:resource=’http://kidehen.idehen.net/dataspace/person/kidehen’/></Agent></member>
<member><Agent><openid rdf:resource=’http://www.wasab.dk/morten/’/></Agent></member>
<member><Agent><openid rdf:resource=’http://kronkltd.net/’/></Agent></member>
<member><Agent><openid rdf:resource=’http://www.kanzaki.com/’/></Agent></member>

<!– rule-based membership –>

<constructor><![CDATA[
PREFIX : <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
<http://danbri.org/yasns/danbri/both.rdf#thegroup> a :Group; :member [ a :Agent; :openid ?id ]
}
WHERE {
GRAPH <http://wiki.foaf-project.org/_users.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ]. ] }
GRAPH <http://danbri.org/yasns/danbri/_group.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ]. ] }
}
]]></constructor>
</Group>
</rdf:RDF>

This RDF description does it both ways. It enumerates (for simple clients) a list of members of a group whose members are those individuals that are both commentators on my blog, and contributors to the FOAF wiki. At least, to the extent they’re detectable via common use of OpenID URIs. But the RDF group description also embeds a SPARQL query, the kind which generates RDF rather than an SQL-like resultset. The RDF essentially regenerates the enumerated list, assuming the query is run against an RDF dataset with the data graphs appropriately populated.

Now I sorta like this, and I sorta don’t. It may be incredibly powerful, or it may be a bit to clever for its own good.

Certainly there’s scope overlap with the W3C RIF rules work, and with the capabilities of OWL. FOAF has long contained an experimental method for using OWL to do something similar, but it hasn’t found traction. The motivation I have here for trying SPARQL here is that it has built-in machinery for talking about the provenance of data; so I could write a group description this way that says “members are anyone listed as a colleague in http://myworkplace.example.com/stafflist.rdf”. Or I could mix in arbitrary descriptive vocabularies; family tree stuff, XFN, language abilities (speaks-reads-writes) etc.

Where I think this could fall down is in the complexity of the workflow. The queries need executing against some SPARQL installation with a configured dataset, and the query lists URIs of data graphs. But I doubt database admins will want to randomly load any/every RDF file mentioned in these shared queries. Perhaps something like SparqlPress, attached to one’s weblog, and social filters to load only files in queries eg. from friends? Also, authoring these kinds of query isn’t something non-geek users are going to do often, and the sorts of queries that will work will depend of course on the data actually available. Sure I could write a query based on matching the openids of former colleagues, but the group will be empty unless the data listing people as former colleagues is actually out there and in the Web, and written in the terms anticipated by the query.

On the other hand, this mechanism does appeal, and could go way beyond FOAF group definitions. We could see a model where people post data in the Web but also post queries, eg. revisiting the old work Libby and I explored around RSS query. On the other other hand, who wants to make their Web queries public? All that said, the same goes for the data being queried. And since this technique embeds queries inside ordinary RDF data, however we deal with the data visibility issue for RDF/FOAF should also work for the query stuff. Perhaps. Can’t blame me for trying…
I realise this isn’t the clearest of explanations. Let’s try again:

RDF is normally for publishing collections of simple claims about the world. This is an experiment in embedding data-generating-queries amongst these claims, where the query is configured to output more RDF claims (aka statements, triples etc), but only when executed against some appropriate body of RDF data. Since the query is written in SPARQL, it allows the data-generation rules to mention interesting things, such as properties of the source of the data being queried.

This particular experiment is couched in terms of FOAF’s “Group” construct, but the technique is entirely general. The example above defines a group of agents called the “both” group, by saying that an Agent is in that group if it its OpenID URI is listed in each of two RDF documents specified, ie. both a commentator on my blog, and a contributor to the FOAF Wiki. Other examples could be “(fe)male employees” or “family members sharing a blood type” or in fact, any descriptive pattern that can match against the data to hand and be expressed in SPARQL.

All of us are dumber than some of us.

Various folk are concerned that OpenID has more provider apps than consumer apps, so here is my little website idea for an OpenID-facilitated collaborative thingy. I’ve loved the Dada Engine for years. The Dada Engine is the clever-clogs backend for grammar-driven nonsense generators such as the wonderful Postmodernism Generator.

Since there are many more people capable of writing funny prose to configure this machine, than there are who can be bothered to do the webhosting sysadmin, I reckon it’d be worthwhile to make a general-purpose hosted version whereby users could create new content through a Web interface. And since everyone would forget their passwords, this seems a cute little project to get my hands dirty with OpenID. To date, all I’ve done server-side with OpenID is install patches to MediaWiki and Wordpress. That went pretty smoothly. My new hacking expedition, however, hit a snag already: the PHP libraries on my EC2 sandbox server didn’t like authenticating against LiveJournal. I’m new to PHP so when something stops working I panic and whine in IRC instead of getting to the bottom of the problem.

Back to the app idea, the Dada engine basically eats little config files that define sentence structures, and spews nonsense. Here’s a NSFW subgenial rant:

FlipFlip:scripts danbri$ pb < brag.pb
Fuck ‘em if they can’t take a joke! I’m *immune*! *Yip, yip, YEEEEEEE!*
*Backbone Blowout*! I do it for *fun*! YAH-HOOOO! Now give me some more of…

And here’s another:

FlipFlip:scripts danbri$ pb < brag.pb
I’m a fission reactor, I fart plutonium, power plants are fueled by the breath
of my brow; when they plug *me* in, the lights go out in Hell County! Now give
me some more of…

A fragment of the grammar:

“I say, `” slogan “‘. By God, `” slogan “‘, I say! ” |
“I am ” entity “, I am ” entity “! ” |
“I’ll drive a mile so as not to walk a foot; I am ” entity “! ” |
“Yes, I’m ” entity “! ” |
“I drank *” being “* under ” number ” tables, I am too ” adjective ” to die, I’m insured for acts o’ God *and* Satan! ” |
“I was shanghaied by ” entities ” and ” entities ” from ” place “, and got away with their hubcaps! ” |
“I *cannot* be tracked on radar! ” |
“I wear nothing uniform, I wear *no* ” emphatic ” uniform! ” | ….

To be crystal clear, this hosted app is total vapourware currently. And I’m not sure if it would be a huge piece of work to make sure the dada engine didn’t introduce security holes. Certainly the version that uses the C preprocessor scares me, but the “pb” utility shown above doesn’t use that. Opinions on the safety of this welcomed. But never mind the details, smell the vision! It could always use Novewriting (a Python “knock-off of the Dada Engine”) instead.

The simplest version of a hosting service for this kind of user generated meta-content would basically be textarea editing and a per-user collection of files that are flagged public or private. But given the nature of the system, it would be great if the text generator could be run against grammars that multiple people could contribute to. Scope for infinite folly…

I’ve always found HTML imagemaps to be a curiously neglected technology. They seem somehow to evoke the Web of the mid-to-late 90s, to be terribly ‘1.0′. But there’s glue in the old horse yet…

A client-side HTML imagemap lets you associate links (and via Javascript, behaviour) with regions of an image. As such, they’re a form of image metadata that can have applications including image search, Web accessibility and social networking. They’re also a poor cousin to the Web’s new vector image format, SVG. This morning I dug out some old work on this (much of which from Max, Libby, Jim all of whom btw are currently working at Joost; as am I, albeit part-time).

The first hurdle you hit when you want to play with HTML imagemaps is finding an editor that produces them. The fact that my blog post asking for MacOSX HTML imagemap editors is now top Google hit for “MacOSX HTML imagemap” pretty much says it all. Eventually I found (and paid for) one called YokMak that seems OK.

So the first experiment here, was to take a picture (of me) and make a simple HTML imagemap.

danbri being imagemapped

As a step towards treating this as re-usable metadata, here’s imagemap2svg.xslt from Max back in 2002. The results of running it with xsltproc are online: _output.svg (you need an SVG-happy browser). Firefox, Safari and Opera seem more or less happy with it (ie. they show the selected area against a pink background). This shows that imagemap data can be freed from the clutches of HTML, and repurposed. You can do similar things server-side using Apache Batik, a Java SVG toolkit. There are still a few 2002 examples floating around, showing how bits of the image can be described in RDF that includes imagemap info, and then manipulated using SVG tools driven from metadata.

Once we have this ability to pick out a region of an image (eg. photo) and tag it, it opens up a few fun directions. In the FOAF scene a few years ago, we had fun using RDF to tag image region parts with information about the things they depicted. But we didn’t really get into questions of surface-syntax, ie. how to maker rich claims about the image area directly within the HTML markup. These days, some combination of RDFa or microformats would probably be the thing to use (or perhaps GRDDL). I’ve sent mail to the RDFa group looking for help with this (see that message for various further related-work links too).

Specifically, I’d love to have some clean HTML markup that said, not just “this area of the photo is associated with the URI http://danbri.org/”, but “this area is the Person whose openid is danbri.org, … and this area depicts the thing that is the primary topic of http://en.wikipedia.org/wiki/Eiffel_Tower”. If we had this, I think we’d have some nice tools for finding images, for explaining images to people who can’t see them, and for connecting people and social networks through codepiction.

Codepiction

I’ve started a Ruby conversion of Ivan Herman’s Python SPARQL client, itself inspired by Lee Feigenbaum’s Javascript library. These are tools which simply transmit a SPARQL query across the ‘net to a SPARQL-protocol database endpoint, and handle the unpacking of the results. These queries can result in yes/no responses, variable-to-value bindings (rather like in SQL), or in chunks of RDF data. The default resultset notation is a simple XML format; JSON results are also widely available.

All I’ve done so far, is to sit down with the core Python file from Ivan’s package, and slog through converting it brainlessly into rather similar Ruby. Take out “self” and “:” from method definitions, write “nil” instead of “None”, write “end” at the end of each method and class, use Ruby’s iteration idioms, and you’re most of the way there. Of course the fiddly detail is where we use external libraries: for URIs, network access, JSON and XML parsing. I’ve cobbled something together which could be the basis for reconstructing all the original functionality in Ivan’s code. Currently, it’s more proof of concept, but enough to be worth posting.

Files are in SVN and browsable online; there’s also a bundled up download for the curious, brave or helpful. The test script shows all we have at the moment: ability to choose XML or JSON results, and access result set binding rows. Other result formats (notably RDF; there’s no RDF/XML parser currently) aren’t handled, and various bits of the code conversion are incomplete. I’d be super happy if someone came along and helped finish this!

Update:  Ruby’s REXML parser is now clumsily wired in; you get get a REXML Document object (see xml.com writeup) as a way of navigating the resultset.

Next Page »