When your OpenID provider goes offline…

My main OpenID provider is currently LiveJournal, delegated from my own danbri.org domain. I suspect it’s much more likely that danbri.org would go offline or be hacked again (sorry DreamHost) than LJ; but either could happen!

In such circumstances, what should a ‘relying party’ (aka consumer) site do? Apparently myopenid has been down today; these are not theoretical scenarios. And my danbri.org site was hacked last year, due to a DreamHost vulnerability. The bad guys merely added viagra adverts; they could easily have messed with my OpenID delegation URL instead.

I don’t know the OpenID 2.0 spec inside-out (to put it mildly!), but one model strikes me as plausible: the relying party should hang onto FOAF and XFN ‘rel=me’ data that you’ve somehow confirmed (eg. those from http://danbri.org/foaf.rdf or my LJ FOAF) and simply offer to let you log in with another OpenID known to be associated with you. You might not even know in advance that these other accounts of yours offer OpenID; after all, there are new services being rolled out on a regular basis. For a confirmed list of ‘my’ URLs, you can poke around to see which are OpenIDs.

danbri$ curl -s http://danbri.livejournal.com/ | grep openid
<link rel="openid.server" href="http://www.livejournal.com/openid/server.bml" />

danbri$ curl -s http://flickr.com/photos/danbri/ | grep openid
<link rel="openid2.provider" href="https://open.login.yahooapis.com/openid/op/auth" />
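
Scripting that probe is straightforward. Here’s a minimal sketch in Python; the URL list is hypothetical, standing in for whatever confirmed ‘rel=me’ URLs the relying party has retained from FOAF/XFN data:

import re
import urllib.request

# Stand-in list of confirmed 'rel=me' URLs for one person; a real
# relying party would have harvested these from FOAF/XFN data.
MY_URLS = [
    "http://danbri.livejournal.com/",
    "http://flickr.com/photos/danbri/",
]

# Matches both OpenID 1.x and OpenID 2.0 provider link elements.
OPENID_LINK = re.compile(
    r'<link[^>]+rel=["\'](?:openid\.server|openid2\.provider)["\']',
    re.IGNORECASE)

def openid_capable(url):
    """True if the page at url advertises an OpenID provider."""
    html = urllib.request.urlopen(url, timeout=10).read()
    return bool(OPENID_LINK.search(html.decode("utf-8", "replace")))

for url in MY_URLS:
    print(url, openid_capable(url))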

Sites do go down. It would be good to have a slicker user experience when this happens. Given that we have formats – FOAF and XFN at least – that allow a user to be associated with multiple (possibly OpenID-capable) URLs, what would it take to have OpenID login make use of this?

IRC RDF logs and foaf:chatEvent

For many years, the 24×7 IRC chatrooms #swig and #foaf (and previously #rdfig) have been logged to HTML and RDF by Dave Beckett's IRC logging code. The RDF idiom used here dates from 2001 or so, and looks like this:

<foaf:ChatChannel rdf:about="irc://irc.freenode.net/swig">
  <foaf:chatEventList>
    <rdf:Seq>
      <rdf:li>
        <foaf:chatEvent rdf:ID="T00-05-11">
          <dc:date>2008-04-03T00:05:11Z</dc:date>
          <dc:description>
            danbri: do you know of good scutter data for playing with codepiction? would be fun to get back into that (esp. parallelization).
          </dc:description>
          <dc:creator>
            <wn:Person foaf:nick="kasei"/>
          </dc:creator>
        </foaf:chatEvent>
      </rdf:li>
    </rdf:Seq>
  </foaf:chatEventList>
</foaf:ChatChannel>

Dave has offered to make a one-time search-and-replace fix to the old logs, if we want to agree a new idiom. The main driver for this is that the old logs have a class for ‘chat event’ but use an initial lowercase letter for the term, ie. ‘chatEvent’ instead of ‘ChatEvent’. None of these properties are yet documented in the FOAF schema, and since the data for this term is highly concentrated, and its maintainer has offered to change it, I suggest we document these FOAF terms as used, with the fix of having a class foaf:ChatEvent instead of foaf:chatEvent.

Almost all RDF vocabularies stick to the rule of using initial lowercase letters for properties, and initial capitals for classes. This makes RDF easy to read for those who know this trick; consequently a lowercase class name can be very confusing, for experts and beginners alike. I’d therefore rather not introduce one into FOAF if it can be avoided. But I would like to document the IRC logging data format, and continue to use FOAF for it.

The markup also uses the wordnet:Person class from my old Wordnet 1.6 namespace (currently offline, but it will be repaired eventually, albeit with a later Wordnet installation). This follows early FOAF practice, where we minimised terms and used Wordnet a lot more. I suggest Dave updates this to use foaf:Person instead. The dc:creator property used here might also switch to the new DC Terms notion of ‘creator’, since that flavour of ‘creator’ has a range of DC Terms “Agent”, which is a more modern and FOAF-compatible idiom. This, btw, is a candidate for using instead of foaf:maker, which I introduced with some regret only because the old dc:creator property had such weak usage rules. But then if we change the DC namespace used for ‘creator’ here, should we change the other ones too? Hmm hmm hmm etc.
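
Taken together, the one-time fix could be as small as this. A sketch using Python and rdflib, with a hypothetical per-day log filename; the Wordnet 1.6 namespace URI is from memory, and the dc:creator change is left as a comment since it’s still undecided:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
WN = Namespace("http://xmlns.com/wordnet/1.6/")  # old Wordnet 1.6 namespace (URI from memory)

g = Graph()
g.parse("2008-04-03.rdf")  # hypothetical log file name

# foaf:chatEvent (lowercase class) becomes foaf:ChatEvent
for s in list(g.subjects(RDF.type, FOAF.chatEvent)):
    g.remove((s, RDF.type, FOAF.chatEvent))
    g.add((s, RDF.type, FOAF.ChatEvent))

# wn:Person becomes foaf:Person
for s in list(g.subjects(RDF.type, WN.Person)):
    g.remove((s, RDF.type, WN.Person))
    g.add((s, RDF.type, FOAF.Person))

# dc:creator -> dcterms:creator would follow the same pattern,
# if we decide to make that change too.

g.serialize("2008-04-03-fixed.rdf", format="xml")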

The main known consumer of this IRC log data is XSLT created and maintained by the ubiquitous Dave. If you know of other downstream consumer apps please let us know by commenting in the blog or by email.

While there are other ways in which such chatlogs might be represented in RDF (eg. SIOC, or using something linked to XMPP), let’s keep the scope of this change quite small. The goal is to make a minimal fix to the current archived instance data and associated tools, so that the FOAF vocabulary it uses can be documented.

Comments welcomed…

One Big Happy Family (FOAF/RDF and Microformats)

I just signed up to give a talk at the Microformats vEvent in London, May 27th; thanks to the organizers (Frances Berriman and Drew McLellan of microformats.org) for inviting me :)

I’ve called it “One Big Happy Family: Practical Collaboration on Meaningful Markup” and my goal really is to help make it easier for enthusiasts for both RDF and Microformats to say ‘we‘ rather than ‘they‘ a bit more often when discussing complementary efforts from this community. As I said on the foaf-dev list yesterday, “anything good for Microformats is good for FOAF”; vice-versa too, I hope. There’s only one Web and we’re all doing our bit, with the tools and techniques we know best.

Here’s the abstract:

This talk explores some ways in which the Microformat and RDF approaches can complement each other, and some ways in which we can share data, tools and experiences between these two technologies. It will outline the often-unarticulated history of the RDF design, the techniques used for parsing and querying RDF data, and the things made easy and hard through this approach. RDF techniques can be contrasted with the different choices made for Microformats. However, these differences obscure an underlying similarity that comes from shared ‘Webby’ values.

Edit: it seems I’m incapable of spelling “compl[ie]mentary”. Freudian slip? :)

BTW the London Web Week site has just gone live; check it out…

Language Expertise in FOAF: Speaks, Reads, Writes revisited

Speaks, reads, writes
Stephanie Booth asks:

 I vaguely remember somebody telling me about some emerging “standard” (too big a word) for encoding language skills. Or was it a dream?

That would’ve been me, showing markup from the FOAFX beta from Paola Di Maio and friends, which explores the extension of FOAF with expertise information. This is part of the ExpertFinder discussions alongside the FOAF project (see also wiki, mailing list). FOAFX and the ExpertFinder community are looking at ways of extending FOAF to better describe people’s expertise, both self-described and externally accredited. This is at once a fascinating, important and terrifyingly hard-to-scope problem area. It touches on longstanding “Web of trust” themes, on educational metadata standards, and on the various ways of characterising topics or domains of expertise. In other words, in any such problem space, there will always be multiple ways of “doing it”. For example, here is how the Advogato community site characterises my expertise regarding opensource software: foaf.rdf (I’m in the Journeyer group, apparently; some weighted average of people’s judgements about me).

One thing FOAFX attempts is to describe language skills. For this, they extend the idiom proposed by Inkel some years ago in his “Speaks, Reads, Writes” schema. In the original (which is in Spanish, but see also the English version), the classification was effectively binary: one could either speak, read, or write a language, or one couldn’t. You could also say you ‘mastered’ it, meaning that you could speak, read and write it. In FOAFX, this is handled differently: we get a 1-5 score. I like this direction, as it allows me to express that I have some basic capability in Spanish, without appearing to boast that I’m anything like “fluent”. But … am I a “1” or a “2”? Should I poll my long-suffering Spanish-speaking friends? Take an online quiz? Introducing numbers gives the impression of mathematical precision, but in skill characterisation this is notoriously hard (and not without controversy).

My take here is that there’s no right thing to do. So progress and experimentation are to be celebrated, even if the solution isn’t perfect. On language skills, I’d love some way also to allow people to say “I’m learning language X”, or “I’m happy to help you practice your English/Spanish/Japanese/etc.”. Who knows, with more such information available, online Social Network sites could even prove useful…

Here btw is the current RDF markup generated by FOAFX:

<foaf:Person rdf:ID="me">
  <foaf:mbox_sha1>6e80d02de4cb3376605a34976e31188bb16180d0</foaf:mbox_sha1>
  <foaf:givenname>Dan</foaf:givenname>
  <foaf:family_name>Brickley</foaf:family_name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:weblog rdf:resource="http://danbri.org/words/" />
  <foaf:depiction rdf:resource="http://danbri.org/images/me.jpg" />
  <foaf:jabberID>danbrickley@gmail.com</foaf:jabberID>
  <foafx:language>
    <foafx:Language>
      <foafx:name>English</foafx:name>
      <foafx:speaking>5</foafx:speaking>
      <foafx:reading>5</foafx:reading>
      <foafx:writing>5</foafx:writing>
    </foafx:Language>
  </foafx:language>
  <foafx:language>
    <foafx:Language>
      <foafx:name>Spanish</foafx:name>
      <foafx:speaking>1</foafx:speaking>
      <foafx:reading>1</foafx:reading>
      <foafx:writing>1</foafx:writing>
    </foafx:Language>
  </foafx:language>
  <foafx:expertise>
    <foafx:Expertise>
      <foafx:field>::</foafx:field>
      <foafx:fluency>
        <foafx:Language>
          <foafx:name>English</foafx:name>
        </foafx:Language>
      </foafx:fluency>
    </foafx:Expertise>
  </foafx:expertise>
</foaf:Person>

The apparent redundancy in the markup (expertise, Expertise) is due to RDF’s so-called “striped” syntax. I have an old introduction to this idea; in short, RDF lets you define properties of things, and categories of things. The FOAFX design effectively says: there is a property of a person called “expertise”, which relates that person to another thing, an “Expertise”, which itself has properties like “fluency”.
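
To see the striping at work, here’s a cut-down version of the markup run through an RDF parser. A sketch assuming Python with rdflib; the foafx namespace URI is a placeholder, not necessarily the one FOAFX really uses:

from rdflib import Graph

# Simplified version of the FOAFX markup above.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:foafx="http://example.org/foafx#">
  <foaf:Person rdf:ID="me">
    <foafx:expertise>      <!-- lowercase: a property (an edge) -->
      <foafx:Expertise>    <!-- uppercase: a class (a node's type) -->
        <foafx:fluency>basic</foafx:fluency>
      </foafx:Expertise>
    </foafx:expertise>
  </foaf:Person>
</rdf:RDF>"""

g = Graph()
g.parse(data=doc, format="xml", publicID="http://example.org/doc")
for s, p, o in g:
    print(s, p, o)

# Prints (roughly): the person is typed foaf:Person, has an expertise
# property pointing at a blank node, and that node is typed
# foafx:Expertise and carries the fluency value.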

The FOAFX design tries to navigate between generic and specific, by including language-oriented markup as well as more generic skill descriptions. I think this is probably the right way to go. There are many things that we can say about human languages that don’t apply to other areas of expertise (eg. opensource software development). And there are many things we can say about expertise in general (like expressions of willingness to learn, to teach, … indications of formal qualification) which are cross-domain. Similarly, there are many things we might say in markup about opensource projects (picking up on my Advogato mention earlier) which have nothing to do with human languages. Yet both human language expertise and opensource skills are things we might want to express via FOAF extensions. For example, the DOAP project already lets us describe opensource projects and our roles in them.

The Semantic Web design challenge here is to provide a melting pot for all these different kinds of data, one that allows each specific problem to be solved adequately in a reasonable time-frame, without precluding the possibility for richer integration at a later date. I have a hunch that the Advogato design, which expresses skills in terms of group membership, could be a way to go here.

This is related to the idea of expressing group-membership criteria by writing SPARQL queries. For example, we can talk about the Group of people who work for W3C. Or we can talk about the Group of people who work for W3C as listed authoritatively on the W3C site. Both rules are expressible as queries; the latter is a query that says things about the source of claims, as well as about what those claims assert. This notion of a group defined by a query allows for both flavours; the definition could include criteria relating to the provenance (ie. source) of the claims, but it needn’t. So we could express the idea of people who speak Spanish, or the idea of people who speak French according to having passed some particular test, or being certified by some agency. In either case, the unifying notion is “person X is in group Y”, where Y is a group identified by some URL.

What I like about this model is that it allows for a very loose division of labour: skill-related markup is necessarily going to be widely varied. Yet the idea that such scattered evidence boils down to people falling into definable groups gives some overall cohesion to this diversity. I could, for example, run a query asking for people with (foafx idiom) “Spanish skills of 2 or more”, as sketched below. I could add a constraint that the person be at least a “Journeyer” regarding their opensource skills, according to Advogato, or perhaps mix in data expressed in DOAP terms regarding their roles in opensource project work. These skills effectively define groups (loosely, sets) of people, and skill search can be pictured in Venn diagram terms.

Of course all this depends on getting enough data out there for any such queries to be worthwhile. Maybe a Facebook app that re-published data outside of Hotel Facebook would be a way of bootstrapping things here?
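
Here’s roughly what that “Spanish skills of 2 or more” query might look like. A minimal sketch using Python and rdflib; the foafx namespace URI is a placeholder (I haven’t checked what FOAFX really uses), and the data URL is hypothetical:

from rdflib import Graph

g = Graph()
g.parse("http://example.org/aggregated-foaf.rdf")  # hypothetical scutter dump

q = """
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX foafx: <http://example.org/foafx#>  # placeholder namespace

SELECT ?nick WHERE {
  ?person foaf:nick ?nick ;
          foafx:language ?lang .
  ?lang foafx:name "Spanish" ;
        foafx:speaking ?level .
  FILTER (xsd:integer(?level) >= 2)
}
"""
for row in g.query(q):
    print(row.nick)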

“Stuff I’ve been thinking about” (SocialNetworkPortability WebCamp) – my slides

I’m in Cork, mainly for the excellent Social Network Portability event on Sunday, but am also staying through Blogtalk’08, which has been great. I’ve uploaded my slides from my talk (slideshare in Flash, included inline here, or a pdf). I have some rough speaking notes too; maybe I’ll get those online. I have no idea how they relate to whatever actually came out of my mouth during the talk :) Apologies to those without PDF or Flash. I haven’t tried Keynote’s HTML output yet.

Basically much of what I was getting at in the talk (and my thoughts are only just congealing on this) is that the idea of a ‘claim’ is a useful bridge between Semantic Web and Social Networking concerns. It also helps us understand how technologies fit together. FOAF defines a dictionary of terms for making claims, as do XFN and hCard. RDF/XML, Microformats, RDFa and GRDDL define textual notations for publishing documents that encode claims, and SPARQL gives us a way of asking questions about the claims made in different documents.
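
That last point can be made concrete with SPARQL’s GRAPH keyword, which lets a query mention the document a claim came from. A tiny sketch, assuming a store that loads each source document into a named graph keyed by its URL:

# Ask not just "who has which nick?" but "which document claims it?".
claims_query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?doc ?person ?nick WHERE {
  GRAPH ?doc { ?person foaf:nick ?nick }
}
"""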

Blood money

It’s in the Daily Mail, so it must be true:

Motorists will be targeted by a new generation of road cameras which work out how many people are in a car by measuring the amount of bodily fluid it contains.

The latest snooping device on the nation’s roads aims to penalise lone drivers who abuse car-sharing lanes, and is part of a Government effort to combat congestion at busy times.

The cameras work by sending an infrared beam through the windscreen of vehicles which detects the unique make-up of blood and water content in human skin.

Coincidentally enough, today’s hackery:

danbri> add ds danbri http://danbri.org/words/sparqlpress/sparql

jamendo> store added

danbri> add template "FIND BLOOD DONORS" "select ?n ?bt where { ?x <http://xmlns.com/foaf/0.1/nick> ?n . ?x <http://kota.s12.xrea.com/vocab/uranaibloodtype> ?bt . }"

danbri> FIND BLOOD DONORS

jamendo> "danbri" "A+"

This makes use of the Uranai FOAF extension for recording one’s blood type.
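
If you want your own FOAF file to show up in such queries, the addition is a single triple. A sketch with rdflib, using the predicate URI exactly as it appeared in the template above; the personal URI here is made up:

from rdflib import Graph, Literal, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
# Predicate URI as used in the query template above.
BLOODTYPE = URIRef("http://kota.s12.xrea.com/vocab/uranaibloodtype")

g = Graph()
me = URIRef("http://example.org/foaf.rdf#me")  # made-up personal URI
g.add((me, FOAF.nick, Literal("danbri")))
g.add((me, BLOODTYPE, Literal("A+")))
print(g.serialize(format="xml"))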

Trying xOperator via Yves’s installation; the above dialog was conducted purely through Jabber IM. Nice to see XMPP+SPARQL explorations gaining traction. I really like the extra value-adding layers provided by the xOperator text chat UI, and the fact that you can wire in HTTP-based SPARQL endpoints too. Similar functionality in IRC comes from Bengee’s sparqlbot (see intro doc). I think the latter has a slightly fancier templating system, but I haven’t really explored all that xOperator can do. I did get the ‘blood’ example working with sparqlbot in IRC too; it seems most people don’t have their bloodtype in their FOAF file. So much for finding handy donors ;)

Blood and XMPP aside, there’s a lot going on here technically. The fact that I can ask Yves’ installation of xOperator to attach to my SparqlPress installation’s database is interesting. At the moment everything in there is public, so ACL is easy. But we could imagine that picture changing. Perhaps I decide that healthcare-related info goes into a special set of named graphs. How could permissioning for this be handled so that Yves’s bot can still provide me with UI? OAuth is probably part of the answer here, although that means bouncing out into http/html UI to some extent at least. And there’s more to say on SPARQL ACL too. Much more, but not right now.

The other technically rich area is these natural language templates. It is historically rare for public SQL endpoints to be made available, even read-only. Some of the reasons for this (local obscure schemas, closed world logic) have gone away with SPARQL, but the biggest reason – the ease of writing very expensive queries – is very much still present. Guha and McCool’s Alpiri/TAP work argued that we needed another, lighter data interface; they proposed ‘GetData’. The templates system we see in xOperator and sparqlbot (see also julie, wh4 and foafbot) suggests another take on this problem. Rather than expose all of my personal datastore to random ‘SELECT * WHERE { ?p ?s ?o }’ crawlers, these bots can provide a wrapping layer, allowing in only queries that are much more tightly constrained, as well as being more human-oriented. Hmm, I’m suddenly reminded of Metalog; wonder if Massimo is still working on that area.
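
The shape of that wrapping layer is simple enough to sketch. Here’s a hedged guess at the general pattern in Python; the names and the run_sparql hook are invented for illustration, not xOperator’s or sparqlbot’s actual APIs:

# Registered templates map a chat-friendly name to a fixed query, so
# users can only invoke queries by name, never submit raw (potentially
# very expensive) SPARQL.
TEMPLATES = {
    "FIND BLOOD DONORS": """
        SELECT ?n ?bt WHERE {
          ?x <http://xmlns.com/foaf/0.1/nick> ?n .
          ?x <http://kota.s12.xrea.com/vocab/uranaibloodtype> ?bt .
        }""",
}

def handle_chat_line(line, run_sparql):
    """Dispatch a chat line to a registered template, if any.

    run_sparql is whatever function actually talks to the store
    (HTTP endpoint, local database, ...)."""
    query = TEMPLATES.get(line.strip())
    if query is None:
        return "unknown template"
    return run_sparql(query)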

A nice step here might be if chat-oriented query templates could be shared between xOperator and sparqlbot. Wonder what would be needed to make that happen…

A friend by any other namespace

Homer goes to Moe’s. He buys Barney a beer, calling him “soulmate,” but Barney says that he’s “really more of a chum,” not his soulmate. Lenny describes himself as a “crony”; Carl, an “acquaintance”; Larry, a “colleague”; Sam, a “sympathizer”; Bumblebee Man, a “compadre”; Kearney, an “associate”; and Hibbert, a “contemporary”.

I’m a well-wisher, in that I don’t wish you any specific harm. — Moe, on his relationship to Homer, “El Viaje Misterioso de Nuestro Homer” [3F24]

See also: wordnet

Friend lists on Facebook – “all lists are private”

Friend Lists are here.

 Now you can easily organize your friends into convenient lists for messaging, invites, and more. You can create whatever kinds of lists you want; all lists are private. Click “Make a New List” to get started.

Just noticed this message on Facebook as I logged in. It was rumoured a little while ago, and in general fits with my goals for FOAF, use of Group markup etc. I just expect that “all lists are private” isn’t the end of the privacy story here. Will friend list data be exposed to 3rd party apps? There are lots of reasons it would be great to do so (and foaf:Group provides a notation for sharing this data with downstream standards-based apps, as sketched below). On the other hand, we don’t want to see any more “Find out what your friends really think of you” ugliness. Wonder which way they’ll jump.
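
For the record, here’s the sort of thing foaf:Group makes expressible, should Facebook (or anyone else) ever choose to expose such lists. A sketch with rdflib; the list name and member URIs are made up:

from rdflib import Graph, Literal, Namespace, URIRef, BNode
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
group = BNode()  # the friend list itself
g.add((group, RDF.type, FOAF.Group))
g.add((group, FOAF.name, Literal("close friends")))  # made-up list name
for uri in ("http://example.org/alice#me", "http://example.org/bob#me"):
    g.add((group, FOAF.member, URIRef(uri)))
print(g.serialize(format="xml"))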

JQbus: social graph query with XMPP/SPARQL

Righto, it’s about time I wrote this one up. One of my last deeds at W3C before leaving at the end of 2005 was to begin the specification of an XMPP binding of the SPARQL querying protocol. For the acronym-averse, a quick recap: XMPP is the name the IETF give to the Jabber messaging technology, and SPARQL is W3C’s RDF-based approach to querying mixed-up Web data. SPARQL defines a textual query language, an XML result-set format, and a JSON version for good measure. There is also a protocol for interacting with SPARQL databases; this defines an abstract interface, and a binding to HTTP. There is as yet no official binding to XMPP/Jabber, and existing explorations are flawed. But, I’ll argue here, the work is well worth completing.

jqbus diagram

So what do we have so far? Back in 2005, I was working in Java, Chris Schmidt in Python, and Steve Harris in Perl. Chris has a nice writeup of one of the original variants I’d proposed, which came out of my discussions with Peter Saint-Andre. Chris also beat me in the race to have a working implementation, though I’ll attribute this to Python’s advantages over Java ;)

I won’t get bogged down in the protocol details here, except to note that Peter advised us to use IQ stanzas. The existing work has a few slight variants on the idea of sending a SPARQL query in one IQ packet and returning all the results within another; this isn’t quite deployable as-is. When the result set is too big, we can run into practical (rather than spec-mandated) limits at the server-to-server layers. For example, Peter mentioned that jabber.org had a 65k packet limit. At SGFoo last week, someone suggested sending the results as an attachment instead; apparently this is one of the uncountably many extension specs produced by the energetic Jabber community. The 2005 work was also somewhat partial, and didn’t work out the detail of having a full binding (eg. dealing with default graphs, named graphs etc).

That said, I think we’re onto something good. I’ll talk through the Java stuff I worked on, since I know it best. The code uses Ignite Realtime’s Smack API. I have published rough Java code that can communicate with instances of itself across Jabber. This was last updated in July 2007, when I fixed it up to use more recent versions of Smack and Jena. I forget if the code to parse out query results from the responses was completed, but it does at least send SPARQL XML results back through the XMPP network.
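
To give a flavour of the IQ-based approach: since there’s no official binding yet, the stanza shapes below are only a guess at the general pattern described above (the query payload namespace is a placeholder), with the SPARQL query travelling in an iq/get and the standard SPARQL XML result set coming back in the iq/result:

# Guessed request stanza; the payload namespace is a placeholder.
QUERY_IQ = """\
<iq type='get' from='danbri@example.org/home' to='store@example.org/jqbus' id='q1'>
  <query xmlns='http://example.org/jqbus-placeholder'>
    SELECT ?name WHERE { ?x &lt;http://xmlns.com/foaf/0.1/name&gt; ?name }
  </query>
</iq>"""

# Guessed response stanza, carrying the standard SPARQL XML result format.
RESULT_IQ = """\
<iq type='result' from='store@example.org/jqbus' to='danbri@example.org/home' id='q1'>
  <sparql xmlns='http://www.w3.org/2005/sparql-results#'>
    <head><variable name='name'/></head>
    <results>
      <result><binding name='name'><literal>Dan Brickley</literal></binding></result>
    </results>
  </sparql>
</iq>"""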

sparql jabber interaction

So why is this interesting?

  • SPARQLing over XMPP can cut through firewalls/NAT and talk to data on the desktop
  • SPARQLing over XMPP happens in a social environment; queries are sent from and to Jabber addresses, while roster information is available which could be used for access control at various levels of granularity
  • XMPP is well suited for async interaction; stored queries could return results days or weeks later (eg. job search)
  • The DISO project is integrating some PHP XMPP code with WordPress; SparqlPress is doing the same with SPARQL

Both SPARQL and XMPP have mechanisms for batched results; it isn’t clear which, if either, to use here.

XMPP also has some service discovery mechanisms; I hope we’ll wire up a way to inspect each party on a buddylist roster, to see who has SPARQL data available. I made a diagram of this last summer, but there’s no code to go with it yet. There is also much work yet to do on access control systems for SPARQL data, eg. using OAuth. It is far from clear how to integrate SPARQL-wide ideas on that with the specific possibilities offered within an XMPP binding. One idea is for SPARQL-defined FOAF groups to be used to manage buddylist rosters, “friend groups”.
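
The discovery part might look like an ordinary disco#info exchange, with a SPARQL-capable party advertising a feature. Again a guess, not a spec: the disco#info namespace is the standard XEP-0030 one, but the feature var is invented:

# Hypothetical disco#info response from a JID offering SPARQL access.
DISCO_RESULT = """\
<iq type='result' from='store@example.org/jqbus' to='danbri@example.org/home' id='d1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <identity category='component' type='generic' name='SPARQL store'/>
    <feature var='http://example.org/jqbus-placeholder'/>
  </query>
</iq>"""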

Where are we with code? I have a longer page in FOAF SVN for the Jqbus Java stuff, and a variant on this writeup (includes better alt text for the images and more detail). The Java code is available for download. Chris’s Python code is still up on his site. I doubt any of these can currently talk to each other properly, but they show how to deal with XMPP in different languages, which is useful in itself. For Perl people, I’ve uploaded a copy of Steve’s code.

The Java stuff has a nice GUI for debugging, thanks to Smack. I just tried a copy. Basically I can run a client and a server instance from the same filetree, passing them my LiveJournal and Google Talk jabber account details. The screenshot here shows the client on the left displaying the XML-encoded SPARQL results, and the server on the right displaying the query that arrived. That’s about it really. Nothing else ought to be that different from normal SPARQL querying, except that it is being done in an infrastructure that is more socially-grounded and decentralised than the classic HTTP Web service model.

JQbus debug