Flock browser RDF: describing accounts

Flock is a mozilla-based browser that emphasises social and “web2″ themes. From a social-network-mobility thread, I’m reminded to take another look at Flock by Ian McKellar’s recent comments…

I wrote a bunch of that code when I was at Flock.

It’s all in RDF, I think it’s currently in a SQLite triple store in the user’s profile directory

I took a look. Seems not to be in SQLite files, at least in my fresh 0.9.1.0 installation. Instead there is a file flock-data.rdf which looks to be the product of Mozilla’s ageing RDF engine. I had to clean things up slightly before I could process it with modern (Redland in this case) tools, since it uses Netscape’s pre-RDFCore datatyping notation:

cat flock-data.rdf | sed -e s/NC:parseType/RDF:datatype/

With that tweak out of the way, I can nose around the data using SPARQL. I’m interested in the “social graph” mobility discussions, and in mapping FOAF usage to Brad Fitzpatrick’s model (see detail in his slides).

The model in the writeup from Brad and David Recordon has nodes (standing roughly for accounts) and “is” relations amongst them where two accounts are known to share an owner, or “claims” relations to record a claim associated with one such account of shared ownership with another.

For example in my Flickr account (username “danbri”) I might claim to own the del.icio.us account (also username “danbri”). However you’d be wise not to believe flickr-me without more proof; this could come from many sources and techniques. Their graph model is focussed on such data.

FOAF by contrast emphasises the human social network, with the node graph being driven by “knows” relationships amongst people. We do have the OnlineAccount construct, which is closer to the kind of nodes we see in the “Thoughts on the Social Graph” paper, although they also include nodes for email, IM and hashed mailbox, I believe. The SIOC spec elaborates on this level, by sub-classing its notion of User from OnlineAccount rather than from Person.

So anyway, I’m looking at transformations between such representations, and FLock seems a nice source of data, since it watches over my browsing and when I use a site it knows to be “social”, it keeps a record of the account in RDF. For now, here’s a quick query to give you an idea of the shape of the data:

PREFIX fl: <http://flock.com/rdf#>
PREFIX nc: <http://home.netscape.com/NC-rdf#>
SELECT DISTINCT *
FROM <flock-data-fixed.rdf>
WHERE {
?x fl:flockType “Account” .
?x nc:Name ?name .
?x nc:URL ?url .
?x fl:serviceId ?serviceId .
?x fl:accountId ?accountId .
}

Running this with Redland’s “roqet” utility in JSON mode gives:

{
“head”: {
“vars”: [ "x", "name", "url", "serviceId", "accountId" ]
},
“results”: {
“ordered” : false,
“distinct” : true,
“bindings” : [
{
"x" : { "type": "uri", "value": "urn:flock:ljdanbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.livejournal.com/portal" },
"serviceId" : { "type": "literal", "value": "@flock.com/people/livejournal;1" },
"accountId" : { "type": "literal", "value": "danbri" }
},
{
"x" : { "type": "uri", "value": "urn:typepad:service:danbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.typepad.com" },
"serviceId" : { "type": "literal", "value": "@flock.com/blog/typepad;1" },
"accountId" : { "type": "literal", "value": "danbri" }
},
{
"x" : { "type": "uri", "value": "urn:flock:flickr:account:35468151816@N01" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.flickr.com/people/35468151816@N01/" },
"serviceId" : { "type": "literal", "value": "@flock.com/?photo-api-flickr;1" },
"accountId" : { "type": "literal", "value": "35468151816@N01" }
},
{
"x" : { "type": "uri", "value": "urn:wordpress:service:danbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.wordpress.com" },
"serviceId" : { "type": "literal", "value": "@flock.com/people/wordpress;1" },
"accountId" : { "type": "literal", "value": "danbri" }
},
{
"x" : { "type": "uri", "value": "urn:flock:youtube:modanbri" },
"name" : { "type": "literal", "value": "modanbri" },
"url" : { "type": "literal", "value": "http://www.youtube.com/profile?user=modanbri" },
"serviceId" : { "type": "literal", "value": "@flock.com/?photo-api-youtube;1" },
"accountId" : { "type": "literal", "value": "modanbri" }
},
{
"x" : { "type": "uri", "value": "urn:delicious:service:danbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://del.icio.us/danbri" },
"serviceId" : { "type": "literal", "value": "@flock.com/delicious-service;1" },
"accountId" : { "type": "literal", "value": "danbri" }
}
]
}
}

You can see there are several bits of information to squeeze in here. Which reminds me to chase up the “accountHomepage” issue in FOAF. They sometimes use a generic URL, eg. http://www.livejournal.com/portal, while other times an account-specific one, eg. http://del.icio.us/danbri. They also distinguish an nc:Name property of the account from a fl:accountId, allowing Flickr’s human readable account names to be distinguished from the generated ID you’re originally assigned. The fl:serviceId is an internal software service identifier it seems, following Mozilla conventions.

Last little experiment: a variant of the above query, but using CONSTRUCT instead of SELECT, to transform into FOAF’s idiom for representing accounts:

CONSTRUCT {
?x a foaf:OnlineAccount .
?x foaf:name ?name .
?x foaf:accountServiceHomepage ?url .
?x foaf:accountName ?accountId .
}

Seems to work… There’s load of other stuff in flock-data.rdf too, but I’ve not looked around much. For eg. you can search tagged URLs –

WHERE {[fl:tag "funny"; nc:URL ?url]}

Begin again

facebook grabThere was an old man named Michael Finnegan
He went fishing with a pinnegan
Caught a fish and dropped it in again
Poor old Michael Finnegan
Begin again.

Let me clear something up. Danny mentions a discussion with Tim O’Reilly about SemWeb themes.

Much as I generally agree with Danny, I’m reaching for a ten-foot bargepole on this one point:

While Facebook may have achieved pretty major adoption for their approach, it’s only very marginally useful because of their overly simplistic treatment of relationships.

Facebook, despite the trivia, the endless wars between the ninja zombies and the pirate vampires; despite being centralised, despite [insert grumble] is massively useful. Proof of that pudding: it is massively used. “Marginal” doesn’t come into it. The real question is: what happens next?

Imagine 35 million people. Imagine them marching thru your front room. Jumping off a table at the same time. Sending you an email. Or turning the tap off when they brush their teeth. 35 million is a fair-sized nation. Taking that 35 million figure I’ve heard waved around, and placing it in the ever scientific Wikipedia listing … that puts the land of Facebook somewhere between Kenya and Algeria in the population charts. Perhaps the figures are exagerrated. Perhaps a few million have wandered off, or forgotten their passwords. Doubtless some only use it every month or few.

Even a million is a lot of use; and a lot of usefulness.

Don’t let anything I ever say here in this blog be taken as claiming such sites and services are only marginally useful. To be used is to be useful; and that’s something SemWeb people should keep in the forefront of their minds. And usually they do, I think, although the community tends towards the forward-looking.

But let’s be backwards-looking for a minute. My concern with these sites is not that they’re marginally useful, but that they could be even more useful. Slight difference of emphasis. SixDegrees.com was great, back in 2000 when we started FOAF. But it was a walled garden. It had cool graph traversal stuff that evocatively showed your connection path to anyone else in the network. Their network. Then followed Friendster, which got slow as it proved useful to too many people. Ditto Orkut, which everyone signed up to, then wandered off from when it proved there was rather little to do there except add people. MySpace and Facebook cracked that one, … but guess what, there’ll be more.

I got a signup to Yahoo’s Mash yesterday. Anyone wanna be my friend? It has fun stuff (“Mecca Ibrahim smacked The Mash Pet (your Mash pet)!”), … wiki-like profile editing, extension modules … and I’d hope given that this is 2007, eventually some form of API. People won’t live in Facebook-land forever. Nor in Mash, however fun it is. I still lean towards Jabber/XMPP as the long-term infrastructure for this sort of system, but that’s for another time. The appeal of SixDegrees, of Friendster, of Orkut … wasn’t ever the technology. It was the people. I was there ‘cos others were there. Nothing more. And I don’t see this changing, no matter how much the underlying technology evolves. And people move around, drift along to the next shiny thing, … go wherever their friends are. Which is our only real problem here.

Begin again.

I’ve been messing with RDF a bit. I made a sample SPARQL query that asks (exported RDF from) a few networks about my IM addresses; here are the results from Redland/Rasqal JSON.

Querying Facebook in SPARQL

A fair few people have been asking about FOAF exporters from Facebook. I’m not entirely sure what else is out there, but Matthew Rowe has just announced a Facebook FOAF generator. It doesn’t dump all 35 million records into your Web browser, thankfully. But it will export a minimal description of you and your Facebook associates. At the moment, you get name, a photo URL, and (in this revision of the tool) a Facebook account name using FOAF’s OnlineAccount construct.

As an aside, this part of the FOAF design provides a way for identifiers from arbitrary services to be described in FOAF without special-purpose support. Some services have shortcut property names, eg. msnChatID and we may add more, but it is also important to allow this kind of freeform, decentralised identification. People shouldn’t have to petition the FOAF spec editors before any given Social Network site’s IDs can be supported; they can always use their own vocabulary alongside FOAF, or use the OnlineAccount construct as shown here.

I’ve saved my Facebook export on my Web site, working on the assumption that Facebook IDs are not private data. If people think otherwise, let me know and I’ll change the setup. We might also discuss whether even sharing the names and connectivity graph will upset people’s privacy expectations, but that’s for another day. Let me know if you’re annoyed!

Here is a quick SPARQL query, which simply asks for details of each person mentioned in the file who has an account on Facebook.


PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name, ?pic, ?id
WHERE {
[ a :Person;
:name ?name;
:depiction ?pic;
:holdsAccount [ :accountServiceHomepage <http://www.facebook.com/> ; :accountName ?id ]
]
}
ORDER BY ?name

I tested this online using Dave Beckett’s Rasqal-based Web service. It should return a big list of the first 200 people matched by the query, ordered alphabetically by name.

For “Web 2.0″ fans, SPARQL‘s result sets are essentially tabular (just like SQL), and have encodings in both simple XML and JSON. So whatever you might have heard about RDF’s syntactic complexity, you can forget it when dealing with a SPARQL engine.

Here’s a fragment of the JSON results from the above query:


{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "http://danbri.org/yasns/facebook/danbri-fb.rdf" },
"id" : { "type": "literal", "value": "624168" }
},
{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "http://profile.ak.facebook.com/profile5/575/66/s501730978_7421.jpg" },
"id" : { "type": "literal", "value": "501730978" }
}, ...

What’s going on here? (a) Why are there two of me? (b) And why does it think that one of us has my Facebook FOAF file’s URL as a mugshot picture?

There’s no big mystery here. Firstly, there’s another guy who has the cheek to be called Dan Brickley. We’re friends on Facebook, even though we should probably be mortal enemies or something. Secondly, why does it give him the wrong URL for his photo? This is also straightforward, if a little technical. Basically, it’s an easily-fixed bug in this version of the FOAF exporter I used. When an image URL is not available, the convertor is still generating markup like “<foaf:depiction rdf:resource=””/>”. This empty URL is treated in RDF as the extreme case of a relative link, ie. the same kind of thing as writing “../../images/me.jpg” in a normal Web page. And since RDF is all about de-contextualising information, your RDF parser will try to resolve the relative link before passing the data on to storage or query systems (fiddly details are available to those that care). If the foaf:depiction property were simply ommitted when no photo was present, this problem wouldn’t arise. We’d then have to make the query a little more flexible, so that it still matched people even if there was no depiction, but that’s easy. I’ll show it next time.

I mentioned a couple of days ago that SPARQL is a query language with built-in support for asking questions about data provenance, ie. we can mix in “according to Facebook”, “according to Jabber” right into the WHERE clause of queries such as the one I show here. I’m not going to get into that today, but I will close with a visual observation about why that is important.

yasn map, borrowed from data junk, valleywag blog
To state the obvious, there’ll always be multiple Web sites where people hang out and socialise. A friend sent me this link the other day; a world map of social networks (thumbnail version copied here). I can’t vouch for the science behind it, but it makes the point that we risk fragmenting Web communities on geographic boundaries if we don’t bridge the various IM and YASN networks. There are lots of ways this can be done, each with different implications for user experience, business model, cost and practicality. But it has to happen. And when it does, we’ll be wanting ways of asking questions against aggregations from across these sites…