Loosely joined

find . -name 'danbri-*.rdf' -exec rapper --count {} \;


rapper: Parsing file ./facebook/danbri-fb.rdf
rapper: Parsing returned 2155 statements
rapper: Parsing file ./orkut/danbri-orkut.rdf
rapper: Parsing returned 848 statements
rapper: Parsing file ./dopplr/danbri-dopplr.rdf
rapper: Parsing returned 346 statements
rapper: Parsing file ./tribe.net/danbri-tribe.rdf
rapper: Parsing returned 71 statements
rapper: Parsing file ./my.opera.com/danbri-opera.rdf
rapper: Parsing returned 123 statements
rapper: Parsing file ./advogato/danbri-advogato.rdf
rapper: Parsing returned 18 statements
rapper: Parsing file ./livejournal/danbri-livejournal.rdf
rapper: Parsing returned 139 statements

I can run little queries against various descriptions of me and my friends, extracted from places in the Web where we hang out.

Since we’re not yet in the shiny OpenID future, I’m matching people only on name (and setting up the myfb: etc prefixes to point to the relevant RDF files). I should probably take more care around xml:lang, to make sure things match. But this was just a rough test…


SELECT DISTINCT ?n
FROM NAMED myfb:
FROM NAMED myorkut:
FROM NAMED dopplr:
WHERE {
  GRAPH myfb:    { [ a :Person; :name ?n; :depiction ?img ] }
  GRAPH myorkut: { [ a :Person; :name ?n; :mbox_sha1sum ?hash ] }
  GRAPH dopplr:  { [ a :Person; :name ?n; :img ?i2 ] }
}
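The same cross-site matching can be sketched in plain Python, once the names have been extracted from each RDF file into simple sets. The sets below are illustrative stand-ins for real query results (the names are a mix of people mentioned later in this post and made-up placeholders):

```python
# Rough sketch: intersecting name sets extracted from per-site FOAF dumps.
# Matching is naive exact-string equality, just like the SPARQL query above.
facebook = {"Libby Miller", "Dave Beckett", "Edd Dumbill", "Jo Example"}
orkut = {"Libby Miller", "Dave Beckett", "Ann Other"}
dopplr = {"Libby Miller", "Edd Dumbill", "Dave Beckett"}

# Names common to all three sites.
common = facebook & orkut & dopplr
print(sorted(common))  # → ['Dave Beckett', 'Libby Miller']
```

Obviously real matching would want to normalise case, whitespace and xml:lang before comparing.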

…finds 12 names in common across the Facebook, Orkut and Dopplr networks. Between Facebook and Orkut, 46 names; Facebook and Dopplr, 34; Dopplr and Orkut, 17 in common. I haven’t tried the others yet, nor generated RDF for IM and Flickr, which I have probably used more than any of these sites. The Facebook data was exported using the app I described recently; the Orkut data came via the CSV format dumps they expose (non-mechanisable, since they use a CAPTCHA); the Dopplr list was generated with a few lines of Ruby and their draft API: I list as foaf:knows pairs of people who reciprocally share their travel plans. Tribe.net, LiveJournal, my.opera.com and Advogato expose RDF/FOAF directly.

Re Orkut, I noticed that they now offer the option to flood your GTalk Jabber/XMPP account roster with everyone you know on Orkut. I’m not sure of the wisdom of actually doing so (though I’ll try it), but it is worth noting that this quietly bridges a large ‘social networking’ site with an open standards-based toolset.

For the record, the names common to my Dopplr, Facebook and Orkut accounts were: Liz Turner, Tom Heath, Rohit Khare, Edd Dumbill, Robin Berjon, Libby Miller, Brian Kelly, Matt Biddulph, Danny Ayers, Jeff Barr, Dave Beckett, Mark Baker. If I keep adding to the query for each other site, presumably the only person in common across all accounts will be… me.

Who, what, where, when?

A “Who? what? where? when?” of the Semantic Web is taking shape nicely.

Danny Ayers shows some work with FOAF and the hCard microformat, picking up a theme first explored by Dan Connolly back in 2000: inter-conversion between RDF and HTML person descriptions. Danny generates hCards from SPARQL queries of FOAF, an approach which would pair nicely with GRDDL for going in the other direction.
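As a rough illustration of that FOAF-to-hCard direction, here is a minimal Python sketch that takes the kind of name/homepage/depiction values a SPARQL SELECT over FOAF might return and renders an hCard fragment. This is an assumption-laden sketch of the general idea, not Danny's actual code:

```python
from html import escape

def foaf_to_hcard(name, homepage, depiction=None):
    """Render one person's FOAF-ish fields as an hCard HTML fragment.

    Maps foaf:name -> class="fn", foaf:homepage -> class="url",
    foaf:depiction -> class="photo" (core hCard property names).
    """
    parts = [f'<a class="url fn" href="{escape(homepage, quote=True)}">{escape(name)}</a>']
    if depiction:
        parts.append(f'<img class="photo" src="{escape(depiction, quote=True)}" alt="" />')
    return '<div class="vcard">' + "".join(parts) + "</div>"

card = foaf_to_hcard("Dan Brickley", "http://danbri.org/")
print(card)
```

GRDDL would then cover the reverse trip: a transform that lifts such markup back out into RDF.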

Meanwhile at W3C, the closing days of the SW Best Practices Group have recently produced a draft of an RDF/OWL Representation of Wordnet. Wordnet is a fantastic resource, containing descriptions of pretty much every word in the English language. Anyone who has spent time in committees, deciding which terms to include in some schema/vocabulary, must surely feel the appeal of a schema which simply contains all possible words. There are essentially two approaches to putting Wordnet into the Semantic Web. A lexically-oriented approach, such as the one published at W3C for Wordnet 2.0, presents a description of words. It mirrors the structure of Wordnet itself (verbs, nouns, synsets, etc.). Consequently it can be a complete and unjudgemental reflection into RDF of all the work done by the Wordnet team.

The alternate, and complementary, approach is to explore ways of projecting the contents of Wordnet into an ontology, so that category terms (from the noun hierarchy) in Wordnet become classes in RDF. I made a simplistic attempt at this some time back (see overview). It has appeal (alongside the linguistic version) because it allows RDF to be used to describe instances of classes for each concept in Wordnet, with other properties of those instances. See WhyWordnetIsCool in the FOAF wiki for an example of Wordnet’s coverage of everyday objects.
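With class-style Wordnet URIs, everyday things can be typed directly in RDF. A tiny Turtle sketch of the idea (the wn: namespace URI below is an illustrative placeholder, not the published mapping's namespace):

```turtle
@prefix wn: <http://example.org/wordnet/> .        # placeholder namespace
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://example.org/stuff#mydog> a wn:Dog ;
    dc:description "An instance of the Wordnet noun concept 'dog'." .
```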

So, getting Wordnet moving into the SW is a great step. It gives us URIs to identify a huge number of everyday concepts. Its coverage isn’t complete, and its ontology is kinda quirky. Aldo Gangemi and others have worked on tidying up the hierarchy; I believe only for version 1.6 of Wordnet so far. I hope that work will eventually get published at W3C or elsewhere as stable URIs we can all use.

In addition to Wordnet there are various other efforts that give types that can be used for the “what” of “who/what/where/when”. I’ve been talking with Rob McCool about re-publishing a version of the old TAP knowledge base. The TAP project is now closed, with Rob working for Yahoo and Guha at Google. Stanford maintain the site but aren’t working on it. So I’ve been working on a quick cleanup (well-formed RDF/XML etc.) of TAP that could get it into more mainstream use. TAP, unlike Wordnet, has more modern everyday commercial concepts (have a look), as well as a lot of specific named instances of these classes.

Which brings me to (Semantic) Wikipedia; another approach to identifying things and their types on the Semantic Web. A while back we added isPrimaryTopicOf to FOAF, to make it easier to piggyback on Wikipedia for RDF-identifying things that have Wiki (and other) pages about them. The Semantic Mediawiki project goes much much further in this direction, providing a rich mapping (classes etc.) into RDF for much of Wikipedia’s more data-oriented content. Very exciting, especially if it gets into the main codebase.

So I think the combination of things like Wordnet, TAP, Wikipedia, and instance-identifying strategies such as “isPrimaryTopicOf”, will give us a solid base for identifying what the things are that we’re describing in the Semantic Web.

And regarding “where?” and “when?”… on the UI front, we saw a couple of announcements recently: OpenLayers v1.0, which provides Google-Maps-like UI functionality, but open source and standards friendly. And for ‘when’, a similar offering: the timeline widget. This should allow fancy UIs to be wired up with RDF calendar or RDF-geo tagged data.

Talking of which… good news of the week: W3C has just announced a Geo incubator group (see detailed charter), whose mission includes updates for the basic Geo (ie. lat/long etc) vocabulary we created in the SW Interest Group.

Ok, I’ve gone on at enough length already, so I’ll talk about SKOS another time. In brief – it fits in here in a few places. When extended with more lexical stuff (for describing terms, eg. multi-lingual thesauri) it could be used as a base for representing the lexically-oriented version of Wordnet. And it also fits in nicely with Wikipedia, I believe.

Last thing, don’t get me wrong — I’m not claiming these vocabs and datasets and bits of UI are “the” way to do ‘who/what/where/when’ on the Semantic Web. They’re each one of several options. But it is great to have a whole pile of viable options at last :)

Four notions of “wishlist” for FOAF

After the recent SWAD-Europe meeting on ImageDescription techniques, I made some scribbled notes in a bar on different notions of “wishlist” that might make sense to use with FOAF.

At the ImageDescription meeting we talked a lot about the difficulty of scoping such technical activities, since problem spaces overlap in ways that create opportunities and frustration in equal measure. The idea of having a “wishlist” associated with a person’s homepage or FOAF profile(s) illustrates just this. A related idea is that of a foaf:tipjar, ie. a relationship between a person and a document (or part of a document) which explains how to pay or otherwise reward that person. The idea of public wishlists is associated with the practice of companies such as Amazon, who allow people to expose lists of things they’d like bought for them (eg. as birthday presents). Danny Ayers has done some work on transforming Amazon wishlists into a FOAF/RDF format. This can also be seen as one half of the concept of bartering explored by Ian Davis, and in earlier FOAFShop scribbles. It is far from clear how all this stuff fits together in a way that best suits incremental deployment.

Anyway, here are four notions of Wishlist as discussed in Spain.

Sense 1: Wishlists as true descriptions

This notion of a wishlist can be characterised as a relationship between a Person (or Agent) and a description of the world. The idea is that the “wish” is for the description to be true. Any RDF description of the world can be used. For example, I might wish for a job at Microsoft (ie. for the foaf:Person whose foaf:homepage is http://danbri.org/ to have a foaf:workplaceHomepage of http://www.microsoft.com/). Or I might wish to get to know someone (expressible with foaf:knows), or for anything else that RDF can describe. A variation on this design is to express that I wish some RDF-expressible state of affairs not to be a true description of the world.
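The Microsoft example could be sketched in N3, with the wished-for state of affairs as a quoted graph. The wish: vocabulary here is entirely hypothetical:

```n3
@prefix wish: <http://example.org/wish#> .   # hypothetical vocabulary
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# "I wish for the person whose homepage is danbri.org to work at Microsoft."
<http://danbri.org/foaf.rdf#danbri> wish:wishesTrue {
    [ a foaf:Person ;
      foaf:homepage <http://danbri.org/> ;
      foaf:workplaceHomepage <http://www.microsoft.com/> ] .
} .
```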

Sense 2: to own particular things

This is a common sense of wishlist. A wishlist in this sense is just a list of things, associated with a Person or Agent that wishes to own them.

Sense 3: an information-oriented expression of interest

Somewhat different, this one. The idea is that we might want to express informational wishes: for example, we might want to leave “questions” in the Web, and have services answer them, notifying us of the answers (via email, RSS/Atom, etc.). This kind of wishlist could be implemented through users publishing queries (expressed in terms of RDF/XML descriptions) alongside their FOAF files.
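A standing “question” left in the Web could be as simple as a published SPARQL query. The foaf: terms below are real; the scenario is made up:

```sparql
# "Tell me when someone I know publishes a weblog."
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?who ?blog
WHERE {
  <http://danbri.org/foaf.rdf#danbri> foaf:knows ?who .
  ?who foaf:weblog ?blog .
}
```

A notification service would re-run such queries as new data arrives and mail or syndicate any fresh bindings.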

Sense 4: Wanting to own things that match some particular template description.

This is a hybrid of 2 and 3, and can be seen as a common generalisation of 2. In practice, the notion of wishlist as a “list of items one wants to own” is likely to be implemented through a description of the things on that list. The implicit assumption is that for each thing on the list, there is at most one desired object that matches the description (eg. some particular book with ISBN and title). However in practice there are often multiple objects that meet a description, and it is actually quite difficult to constrain item descriptions so that there is only one possible match.

Instead of saying that one wants to own the car whose numberplate is ABC-FOAF-1, you could say “I’d like to own a wn:Car that is red and whose foaf:maker has a foaf:homepage of http://www.audi.co.uk/ and which is the foaf:primaryTopic of some specified page”.
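As a query template, that red-Audi wish might look roughly like this (the wn: and ex: namespaces are illustrative placeholders; foaf: is real):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX wn:   <http://example.org/wordnet/>   # placeholder namespace
PREFIX ex:   <http://example.org/terms#>     # placeholder namespace

SELECT ?car
WHERE {
  ?car a wn:Car ;
       ex:colour "red" ;
       foaf:maker ?maker .
  ?maker foaf:homepage <http://www.audi.co.uk/> .
  ?page foaf:primaryTopic ?car .
}
```

Any car matching the pattern satisfies the wish, which is exactly the multiple-match problem noted above.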

Synthesis?

Senses 2 and 4 could be re-characterised using sense 1, using notions such as foaf:owns (which doesn’t exist at the time of writing). In other words, ownership-oriented wishlist mechanisms can be seen as being based on expressions of a desire that “it should be true that … owns …”, for some particular person and entity.
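Recast that way, a sense-2 entry becomes a sense-1 wish whose quoted description uses the not-yet-existing ownership property. Again in N3, with wish: and wn: as hypothetical/placeholder vocabularies:

```n3
@prefix wish: <http://example.org/wish#> .    # hypothetical vocabulary
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wn:   <http://example.org/wordnet/> . # placeholder namespace

<http://danbri.org/foaf.rdf#danbri> wish:wishesTrue {
    # foaf:owns doesn't exist yet; shown here as a proposal
    <http://danbri.org/foaf.rdf#danbri> foaf:owns [ a wn:Car ] .
} .
```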

The big question is… what to do next? What would it make sense to add to, or use with, FOAF?