Flock browser RDF: describing accounts

Flock is a Mozilla-based browser that emphasises social and “web2” themes. From a social-network-mobility thread, I’m reminded to take another look at Flock by Ian McKellar’s recent comments…

I wrote a bunch of that code when I was at Flock.

It’s all in RDF, I think it’s currently in a SQLite triple store in the user’s profile directory

I took a look. Seems not to be in SQLite files, at least in my fresh 0.9.1.0 installation. Instead there is a file flock-data.rdf which looks to be the product of Mozilla’s ageing RDF engine. I had to clean things up slightly before I could process it with modern (Redland in this case) tools, since it uses Netscape’s pre-RDFCore datatyping notation:

sed -e 's/NC:parseType/RDF:datatype/' flock-data.rdf > flock-data-fixed.rdf

With that tweak out of the way, I can nose around the data using SPARQL. I’m interested in the “social graph” mobility discussions, and in mapping FOAF usage to Brad Fitzpatrick’s model (see detail in his slides).

The model in the writeup from Brad and David Recordon has nodes (standing roughly for accounts) and “is” relations amongst them where two accounts are known to share an owner, or “claims” relations to record a claim associated with one such account of shared ownership with another.

For example in my Flickr account (username “danbri”) I might claim to own the del.icio.us account (also username “danbri”). However you’d be wise not to believe flickr-me without more proof; this could come from many sources and techniques. Their graph model is focussed on such data.

FOAF by contrast emphasises the human social network, with the node graph being driven by “knows” relationships amongst people. We do have the OnlineAccount construct, which is closer to the kind of nodes we see in the “Thoughts on the Social Graph” paper, although they also include nodes for email, IM and hashed mailbox, I believe. The SIOC spec elaborates on this level, by sub-classing its notion of User from OnlineAccount rather than from Person.

So anyway, I’m looking at transformations between such representations, and Flock seems a nice source of data, since it watches over my browsing and when I use a site it knows to be “social”, it keeps a record of the account in RDF. For now, here’s a quick query to give you an idea of the shape of the data:

PREFIX fl: <http://flock.com/rdf#>
PREFIX nc: <http://home.netscape.com/NC-rdf#>
SELECT DISTINCT *
FROM <flock-data-fixed.rdf>
WHERE {
?x fl:flockType "Account" .
?x nc:Name ?name .
?x nc:URL ?url .
?x fl:serviceId ?serviceId .
?x fl:accountId ?accountId .
}

Running this with Redland’s “roqet” utility in JSON mode gives:

{
"head": {
"vars": [ "x", "name", "url", "serviceId", "accountId" ]
},
"results": {
"ordered" : false,
"distinct" : true,
"bindings" : [
{
"x" : { "type": "uri", "value": "urn:flock:ljdanbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.livejournal.com/portal" },
"serviceId" : { "type": "literal", "value": "@flock.com/people/livejournal;1" },
"accountId" : { "type": "literal", "value": "danbri" }
},
{
"x" : { "type": "uri", "value": "urn:typepad:service:danbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.typepad.com" },
"serviceId" : { "type": "literal", "value": "@flock.com/blog/typepad;1" },
"accountId" : { "type": "literal", "value": "danbri" }
},
{
"x" : { "type": "uri", "value": "urn:flock:flickr:account:35468151816@N01" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.flickr.com/people/35468151816@N01/" },
"serviceId" : { "type": "literal", "value": "@flock.com/?photo-api-flickr;1" },
"accountId" : { "type": "literal", "value": "35468151816@N01" }
},
{
"x" : { "type": "uri", "value": "urn:wordpress:service:danbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://www.wordpress.com" },
"serviceId" : { "type": "literal", "value": "@flock.com/people/wordpress;1" },
"accountId" : { "type": "literal", "value": "danbri" }
},
{
"x" : { "type": "uri", "value": "urn:flock:youtube:modanbri" },
"name" : { "type": "literal", "value": "modanbri" },
"url" : { "type": "literal", "value": "http://www.youtube.com/profile?user=modanbri" },
"serviceId" : { "type": "literal", "value": "@flock.com/?photo-api-youtube;1" },
"accountId" : { "type": "literal", "value": "modanbri" }
},
{
"x" : { "type": "uri", "value": "urn:delicious:service:danbri" },
"name" : { "type": "literal", "value": "danbri" },
"url" : { "type": "literal", "value": "http://del.icio.us/danbri" },
"serviceId" : { "type": "literal", "value": "@flock.com/delicious-service;1" },
"accountId" : { "type": "literal", "value": "danbri" }
}
]
}
}

You can see there are several bits of information to squeeze in here. Which reminds me to chase up the “accountHomepage” issue in FOAF. They sometimes use a generic URL, eg. http://www.livejournal.com/portal, while other times an account-specific one, eg. http://del.icio.us/danbri. They also distinguish an nc:Name property of the account from a fl:accountId, allowing Flickr’s human readable account names to be distinguished from the generated ID you’re originally assigned. The fl:serviceId is an internal software service identifier it seems, following Mozilla conventions.
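To make the shape of those results a little more concrete, here is a minimal Python sketch that flattens the SPARQL JSON bindings into plain dictionaries and separates generic service URLs from account-specific ones. The sample data is trimmed from the output above; the helper names and the "URL contains the account ID" heuristic are my own assumptions, not anything Flock or Redland provides.

```python
import json

# Two of the six bindings from the roqet output above, trimmed for brevity.
RESULTS_JSON = """
{
  "head": { "vars": [ "x", "name", "url", "serviceId", "accountId" ] },
  "results": {
    "ordered": false,
    "distinct": true,
    "bindings": [
      {
        "x": { "type": "uri", "value": "urn:flock:ljdanbri" },
        "name": { "type": "literal", "value": "danbri" },
        "url": { "type": "literal", "value": "http://www.livejournal.com/portal" },
        "serviceId": { "type": "literal", "value": "@flock.com/people/livejournal;1" },
        "accountId": { "type": "literal", "value": "danbri" }
      },
      {
        "x": { "type": "uri", "value": "urn:delicious:service:danbri" },
        "name": { "type": "literal", "value": "danbri" },
        "url": { "type": "literal", "value": "http://del.icio.us/danbri" },
        "serviceId": { "type": "literal", "value": "@flock.com/delicious-service;1" },
        "accountId": { "type": "literal", "value": "danbri" }
      }
    ]
  }
}
"""


def flatten(results):
    """Turn SPARQL JSON bindings into plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def url_is_account_specific(row):
    """Crude heuristic (an assumption): the URL mentions the account ID."""
    return row["accountId"] in row["url"]


rows = flatten(json.loads(RESULTS_JSON))
for row in rows:
    kind = "account-specific" if url_is_account_specific(row) else "generic"
    print(row["x"], kind)
```

A real mapping to FOAF would need something better than substring matching, but it is enough to show that the nc:URL values are a mixed bag.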

Last little experiment: a variant of the above query, but using CONSTRUCT instead of SELECT, to transform into FOAF’s idiom for representing accounts:

CONSTRUCT {
?x a foaf:OnlineAccount .
?x foaf:name ?name .
?x foaf:accountServiceHomepage ?url .
?x foaf:accountName ?accountId .
}
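For readers without a SPARQL engine to hand, the same mapping can be sketched in a few lines of Python that write N-Triples directly. This is only an illustration of the CONSTRUCT template above, not how Redland does it; the function names are hypothetical. One deliberate difference: Flock stores the URL as a literal, whereas this sketch emits it as a URI reference, on the assumption that foaf:accountServiceHomepage should point at a document.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as a plain N-Triples literal (minimal escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def account_to_ntriples(x, name, url, account_id):
    """Mirror the CONSTRUCT template: one foaf:OnlineAccount per Flock account."""
    subject = f"<{x}>"
    return [
        f"{subject} <{RDF_TYPE}> <{FOAF}OnlineAccount> .",
        f"{subject} <{FOAF}name> {literal(name)} .",
        # Assumption: emit the homepage as a URI rather than Flock's literal.
        f"{subject} <{FOAF}accountServiceHomepage> <{url}> .",
        f"{subject} <{FOAF}accountName> {literal(account_id)} .",
    ]


lines = account_to_ntriples(
    "urn:delicious:service:danbri", "danbri", "http://del.icio.us/danbri", "danbri"
)
print("\n".join(lines))
```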

Seems to work… There’s loads of other stuff in flock-data.rdf too, but I’ve not looked around much. For example, you can search tagged URLs:

WHERE {[fl:tag "funny"; nc:URL ?url]}

Begin again

There was an old man named Michael Finnegan
He went fishing with a pinnegan
Caught a fish and dropped it in again
Poor old Michael Finnegan
Begin again.

Let me clear something up. Danny mentions a discussion with Tim O’Reilly about SemWeb themes.

Much as I generally agree with Danny, I’m reaching for a ten-foot bargepole on this one point:

While Facebook may have achieved pretty major adoption for their approach, it’s only very marginally useful because of their overly simplistic treatment of relationships.

Facebook, despite the trivia, the endless wars between the ninja zombies and the pirate vampires; despite being centralised, despite [insert grumble] is massively useful. Proof of that pudding: it is massively used. “Marginal” doesn’t come into it. The real question is: what happens next?

Imagine 35 million people. Imagine them marching thru your front room. Jumping off a table at the same time. Sending you an email. Or turning the tap off when they brush their teeth. 35 million is a fair-sized nation. Taking that 35 million figure I’ve heard waved around, and placing it in the ever scientific Wikipedia listing … that puts the land of Facebook somewhere between Kenya and Algeria in the population charts. Perhaps the figures are exaggerated. Perhaps a few million have wandered off, or forgotten their passwords. Doubtless some only use it every month or few.

Even a million is a lot of use; and a lot of usefulness.

Don’t let anything I ever say here in this blog be taken as claiming such sites and services are only marginally useful. To be used is to be useful; and that’s something SemWeb people should keep in the forefront of their minds. And usually they do, I think, although the community tends towards the forward-looking.

But let’s be backwards-looking for a minute. My concern with these sites is not that they’re marginally useful, but that they could be even more useful. Slight difference of emphasis. SixDegrees.com was great, back in 2000 when we started FOAF. But it was a walled garden. It had cool graph traversal stuff that evocatively showed your connection path to anyone else in the network. Their network. Then followed Friendster, which got slow as it proved useful to too many people. Ditto Orkut, which everyone signed up to, then wandered off from when it proved there was rather little to do there except add people. MySpace and Facebook cracked that one, … but guess what, there’ll be more.

I got a signup to Yahoo’s Mash yesterday. Anyone wanna be my friend? It has fun stuff (“Mecca Ibrahim smacked The Mash Pet (your Mash pet)!”), … wiki-like profile editing, extension modules … and I’d hope given that this is 2007, eventually some form of API. People won’t live in Facebook-land forever. Nor in Mash, however fun it is. I still lean towards Jabber/XMPP as the long-term infrastructure for this sort of system, but that’s for another time. The appeal of SixDegrees, of Friendster, of Orkut … wasn’t ever the technology. It was the people. I was there ‘cos others were there. Nothing more. And I don’t see this changing, no matter how much the underlying technology evolves. And people move around, drift along to the next shiny thing, … go wherever their friends are. Which is our only real problem here.

Begin again.

I’ve been messing with RDF a bit. I made a sample SPARQL query that asks (exported RDF from) a few networks about my IM addresses; here are the results from Redland/Rasqal JSON.

Loosely joined

find . -name danbri-\*.rdf -exec rapper --count {} \;


rapper: Parsing file ./facebook/danbri-fb.rdf
rapper: Parsing returned 2155 statements
rapper: Parsing file ./orkut/danbri-orkut.rdf
rapper: Parsing returned 848 statements
rapper: Parsing file ./dopplr/danbri-dopplr.rdf
rapper: Parsing returned 346 statements
rapper: Parsing file ./tribe.net/danbri-tribe.rdf
rapper: Parsing returned 71 statements
rapper: Parsing file ./my.opera.com/danbri-opera.rdf
rapper: Parsing returned 123 statements
rapper: Parsing file ./advogato/danbri-advogato.rdf
rapper: Parsing returned 18 statements
rapper: Parsing file ./livejournal/danbri-livejournal.rdf
rapper: Parsing returned 139 statements

I can run little queries against various descriptions of me and my friends, extracted from places in the Web where we hang out.

Since we’re not yet in the shiny OpenID future, I’m matching people only on name (and setting up the myfb: etc prefixes to point to the relevant RDF files). I should probably take more care around xml:lang, to make sure things match. But this was just a rough test…


SELECT DISTINCT ?n
FROM myfb:
FROM myorkut:
FROM dopplr:
WHERE {
GRAPH myfb: {[ a :Person; :name ?n; :depiction ?img ]}
GRAPH myorkut: {[ a :Person; :name ?n; :mbox_sha1sum ?hash ]}
GRAPH dopplr: {[ a :Person; :name ?n; :img ?i2]}
}

…finds 12 names in common across Facebook, Orkut and Dopplr networks. Between Facebook and Orkut, 46 names. Facebook and Dopplr: 34. Dopplr and Orkut: 17 in common. Haven’t tried the others yet, nor generated RDF for IM and Flickr, which I probably have used more than any of these sites. The Facebook data was exported using the app I described recently; the Orkut data was done via the CSV format dumps they expose (non-mechanisable since they use a CAPTCHA), while the Dopplr list was generated with a few lines of Ruby and their draft API: I list as foaf:knows pairs of people who reciprocally share their travel plans. Tribe.net, LiveJournal, my.opera.com and Advogato expose RDF/FOAF directly. Re Orkut, I noticed that they now have the option to flood your GTalk Jabber/XMPP account roster with everyone you know on Orkut. Not sure of the wisdom of actually doing so (but I’ll try it), but it is worth noting that this quietly bridges a large ‘social networking’ site with an open standards-based toolset.

For the record, the names common to my Dopplr, Facebook and Orkut accounts were: Liz Turner, Tom Heath, Rohit Khare, Edd Dumbill, Robin Berjon, Libby Miller, Brian Kelly, Matt Biddulph, Danny Ayers, Jeff Barr, Dave Beckett, Mark Baker. If I keep adding to the query for each other site, presumably the only person in common across all accounts will be …. me.
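Stripped of the SPARQL, the name-matching above is just set intersection. Here is a rough Python sketch of the same idea, using made-up names rather than my real contact lists; the normalise step is a stand-in (my assumption) for the extra care around xml:lang and spelling that proper matching would need.

```python
from itertools import combinations


def normalise(name):
    """Collapse whitespace and case so trivially different spellings match."""
    return " ".join(name.split()).casefold()


# Hypothetical sample data: one set of friend names per site.
raw_names_by_site = {
    "facebook": ["Alice Example", "Bob Example", "Carol Example", "Dave Example"],
    "orkut": ["alice example", "Bob  Example", "Erin Example"],
    "dopplr": ["Alice Example", "Carol Example", "Erin Example"],
}

names_by_site = {
    site: {normalise(n) for n in names} for site, names in raw_names_by_site.items()
}


def common_names(sites):
    """Names that appear on every one of the given sites."""
    return set.intersection(*(names_by_site[s] for s in sites))


pairwise = {
    pair: common_names(pair) for pair in combinations(sorted(names_by_site), 2)
}
in_all = common_names(names_by_site)

for pair, names in pairwise.items():
    print(pair, len(names))
print("all sites:", sorted(in_all))
```

Matching on names alone will of course produce false positives as soon as two different people share a name, which is exactly why something like OpenID or a hashed mailbox is the better join key.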

Querying Facebook in SPARQL

A fair few people have been asking about FOAF exporters from Facebook. I’m not entirely sure what else is out there, but Matthew Rowe has just announced a Facebook FOAF generator. It doesn’t dump all 35 million records into your Web browser, thankfully. But it will export a minimal description of you and your Facebook associates. At the moment, you get name, a photo URL, and (in this revision of the tool) a Facebook account name using FOAF’s OnlineAccount construct.

As an aside, this part of the FOAF design provides a way for identifiers from arbitrary services to be described in FOAF without special-purpose support. Some services have shortcut property names, eg. msnChatID and we may add more, but it is also important to allow this kind of freeform, decentralised identification. People shouldn’t have to petition the FOAF spec editors before any given Social Network site’s IDs can be supported; they can always use their own vocabulary alongside FOAF, or use the OnlineAccount construct as shown here.

I’ve saved my Facebook export on my Web site, working on the assumption that Facebook IDs are not private data. If people think otherwise, let me know and I’ll change the setup. We might also discuss whether even sharing the names and connectivity graph will upset people’s privacy expectations, but that’s for another day. Let me know if you’re annoyed!

Here is a quick SPARQL query, which simply asks for details of each person mentioned in the file who has an account on Facebook.


PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name ?pic ?id
WHERE {
[ a :Person;
:name ?name;
:depiction ?pic;
:holdsAccount [ :accountServiceHomepage <http://www.facebook.com/> ; :accountName ?id ]
]
}
ORDER BY ?name

I tested this online using Dave Beckett’s Rasqal-based Web service. It should return a big list of the first 200 people matched by the query, ordered alphabetically by name.

For “Web 2.0” fans, SPARQL’s result sets are essentially tabular (just like SQL), and have encodings in both simple XML and JSON. So whatever you might have heard about RDF’s syntactic complexity, you can forget it when dealing with a SPARQL engine.

Here’s a fragment of the JSON results from the above query:


{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "http://danbri.org/yasns/facebook/danbri-fb.rdf" },
"id" : { "type": "literal", "value": "624168" }
},
{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "http://profile.ak.facebook.com/profile5/575/66/s501730978_7421.jpg" },
"id" : { "type": "literal", "value": "501730978" }
}, ...

What’s going on here? (a) Why are there two of me? (b) And why does it think that one of us has my Facebook FOAF file’s URL as a mugshot picture?

There’s no big mystery here. Firstly, there’s another guy who has the cheek to be called Dan Brickley. We’re friends on Facebook, even though we should probably be mortal enemies or something. Secondly, why does it give him the wrong URL for his photo? This is also straightforward, if a little technical. Basically, it’s an easily-fixed bug in this version of the FOAF exporter I used. When an image URL is not available, the converter is still generating markup like “<foaf:depiction rdf:resource=""/>”. This empty URL is treated in RDF as the extreme case of a relative link, ie. the same kind of thing as writing “../../images/me.jpg” in a normal Web page. And since RDF is all about de-contextualising information, your RDF parser will try to resolve the relative link before passing the data on to storage or query systems (fiddly details are available to those that care). If the foaf:depiction property were simply omitted when no photo was present, this problem wouldn’t arise. We’d then have to make the query a little more flexible, so that it still matched people even if there was no depiction, but that’s easy. I’ll show it next time.
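The relative-link behaviour is easy to reproduce outside of any RDF toolkit. This small Python sketch uses the standard library's urljoin, which follows the usual URI reference resolution rules; I am assuming here that the parser took the document's own URL as its base, which is what the query result suggests.

```python
from urllib.parse import urljoin

# Base URI: the location the FOAF file was fetched from.
base = "http://danbri.org/yasns/facebook/danbri-fb.rdf"

# An empty reference resolves to the base document itself ...
empty = urljoin(base, "")

# ... just as an ordinary relative link resolves against it.
relative = urljoin(base, "../../images/me.jpg")

print(empty)
print(relative)
```

So an empty rdf:resource quietly becomes a claim that the FOAF file itself is a picture of the person.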

I mentioned a couple of days ago that SPARQL is a query language with built-in support for asking questions about data provenance, ie. we can mix in “according to Facebook”, “according to Jabber” right into the WHERE clause of queries such as the one I show here. I’m not going to get into that today, but I will close with a visual observation about why that is important.

yasn map, borrowed from data junk, valleywag blog
To state the obvious, there’ll always be multiple Web sites where people hang out and socialise. A friend sent me this link the other day; a world map of social networks (thumbnail version copied here). I can’t vouch for the science behind it, but it makes the point that we risk fragmenting Web communities on geographic boundaries if we don’t bridge the various IM and YASN networks. There are lots of ways this can be done, each with different implications for user experience, business model, cost and practicality. But it has to happen. And when it does, we’ll be wanting ways of asking questions against aggregations from across these sites…

Symbol languages and the Semantic Web

OK so I just stumbled upon this…

Bomb in Baghdad

…via Jonathan Chetwynd’s ever-inventive and SVG-happy Peepo.com.

The “Car Bomb in Baghdad” story is from a site created by Widgit Software, and explains itself as follows:

Symbolworld has been set up to provide a web site with material suitable for symbol readers of all ages. The internet is an important medium which many people really like to use. Sadly there is very little material that is appropriate or accessible by people with learning difficulties.

The copyright statement for Symbolworld says “symbols on this page are copyright of the commercial owners. They may not be copied or used in any other format without the written permission from the owner.”, which initially struck me as a potential challenge to any use of this particular symbol-set for online communication. But I don’t really know this scene, and I guess this copyright could be just the same as the way eg. fonts are copyrighted.

So I don’t know much about this particular project/company/product, but it reminded me of some similar work I heard about a few years ago. Back when the EU, in their occasionally infinite wisdom, funded SWAD-Europe to run around talking to interesting people about standards and the Semantic Web (and giving them t-shirts), Chaals organised a great developer workshop in Madrid on Image annotation. We had the usual fun with SVG and RDF (which btw I’m still betting on) and I got to learn a bit about CCF. Seeing the Baghdad example this morning reminded me of all this. I’ve been clicking around and trying to gather my sprawling thoughts.

CCF, the Concept Coding Framework, is kinda image annotation in reverse. Instead of focussing on the description of the content, concepts etc associated with images, the emphasis is on the use of images to illustrate some enumerated set of concepts. From Chaals’ workshop report re outcomes, I’m reminded that we discussed…

How to use Creative Commons and similar vocabularies to determine whether a particular symbol can be freely used (typically in commercial systems the symbols themselves are proprietary, which can be a major barrier to communication between people who have different systems).

CCF was using some variant of SKOS (another SWAD-Europe activity). This found its way into SKOS itself, where we now have prefSymbol and altSymbol relationships that associate a skos:Concept with a dcmitype:Image. Borrowing an illustration from the SKOS Core guide here:

skos symbol diagram

The guide also notes a distinction between symbolic labelling and “depiction” in the FOAF sense; some symbols are purely symbolic, and have no representational content.

So, catching up with this area of work a little, I find the Bliss Symbolics, the WWAAC project final report, and various other accessibility-related efforts. But I’ve not really figured out where I’d start if I wanted to build something simple using a freely available symbol-set, nor what the main options/projects currently are. But there’s plenty of reading, including pages from a recent Bliss “think tank” meeting.

The latest I can find on CCF is that it has moved sites and that there are some Web interfaces available, but “sorry – there are no downloads for this project yet”. Ah, apparently the SYMBERED project is continuing development of CCF (aside: Bliss with Swedish translation; doubly incomprehensible to me!). There is a nice example on their site showing the multilingual aspect to this work, as well as contrasting Bliss with a more representationally-oriented symbol set; see their site for details.

Here’s a simple example just in English:

I want coffee and milk and cookies.

In case anyone thinks this whole exploration is a bit niche and obscure, take a look at how people use MSN and other IM systems sometime. And of course the Jabber/XMPP guys have been exploring specs for standard emoticons. Chaals also points out some connections to VoiceXML where “there are a handful of options available in an interaction designed to be through voice, and developers will define assorted ways of recognising from a user’s speech which of the relevant concepts is being matched.”

We’re also only a few clicks away from the survey conducted by the dreamily named W3C Emotion incubator group into use cases and technologies for the description of emotion using markup languages.

If an RDF-ized Wordnet is also thrown into the mix, assigning URIs to Synsets, WordSenses, Words, I think we might actually be getting somewhere. The version of Wordnet in RDF published at W3C doesn’t currently use SKOS, although this was discussed, and of course there are other representations that make more use of RDF class hierarchies for nouns (at the expense of linguistic lossiness and losing non-noun content). Princeton’s original English-language Wordnet has spawned many related projects and translations, but as far as I know, there is little integration amongst them, and not all of the data is public or freely re-usable.

I once had a silly dream of taking a photo to go with each and every noun term in Wordnet. A kind of SemWeb I-Spy, a cousin to Immuexa’s FOAF bingo. Or better, of doing that with friends. The rise of Flickr and tagging means that we’d probably do this now by aggregation, using Flickr and similar sites. But it seems conceivable to me that such an “illustrated wordnet” could be made, using either photo-oriented or symbol-oriented illustrations.

OK what might that buy us? Let’s try these two samples.

Not perfect, … but imho Wordnet gives a nice set of common and identifiable concepts that can be used as a hub for all kinds of different projects. And all we’d need is a huge pile of shared data shaped like this:

wordnet:word-cookie skos:prefSymbol wikicommons:Explosion.svg .

OK it’s not going to bring about world peace and the return of Esperanto, and of course there’s much more to language and communication than nouns and verbs (the easiest part of Wordnet to turn into visual symbols), but it does strike me as a fun little (big) project…. Wordnet is too huge to be useful in every context where we’d want a modest-sized symbol set (eg. IM emoticons), … but it is nicely searchable, and would provide a framework for such subsets to evolve and be interconnected.

History time

From the O’Reilly Factor, Sept 12.

Ron Paul: … so I see the Iranians as acting logically and defensively. We’ve been fighting the Iranians since 1953. We overthrew their government through the CIA in 1953. We were allies with Saddam Hussein in the 1980s and we encouraged him to invade Iran…

Bill O’Reilly: All right, so I just want to get … we don’t need the history lesson … but I do want to get this on the record. I do understand the region … but we don’t have time to do the history lesson tonight. … You don’t fear Iran, even though Iran has demonstrated, it can start a war, which it did last summer with its Hezbollah surrogates and it stated, it stated, that it wants to do damage to Israel, to wipe it off the face of the earth. And is developing a nuclear weapon. And you don’t fear them? …

“The World is now closed”

Facebook in many ways is pretty open for a ‘social networking’ site. It gives extension apps a good amount of access to both data and UI. But the closed world language employed in their UI betrays the immodest assumption “Facebook knows all”.

  • Eric Childress and Stuart Weibel are now friends with Charles McCathieNevile.
  • John Doe is now in a relationship.
  • You have 210 friends.

To state the obvious: maybe Eric, Stu and Chaals were already friends. Maybe Facebook was the last to know about John’s relationship; maybe friendship isn’t countable. As the walls between social networking sites slowly melt (I put Jabber/XMPP first here, with OpenID, FOAF, SPARQL and XFN as helper apps), me and my 210 closest friends will share fragments of our lives with a wide variety of sites. If we choose to make those descriptions linkable, the linked sites will increasingly need to refine their UI text to be a little more modest: even the biggest site doesn’t get the full story.

Closed World Assumption (Abort/Retry/Fail)
Facebook are far from alone in this (see this Xbox screenshot too, “You do not have any friends!”); but even with 35M users, the mistake is jarring, and not just to Semantic Web geeks of the “missing isn’t broken” school. It’s simply a mistake to fail to distinguish the world from its description, or the territory from the map.

A description of me and my friends hosted by a big Web site isn’t “my social network”. Those sites are just a database containing claims made by different people, some verified, some not. And with, inevitably, lots missing. My “social network” is an abstractification of a set of interlinked real-world histories. You could make the case that there has only ever been one “social network” since the distant beginnings of human society; certainly those who try to do genealogy with Web data formats run into this in a weaker form, including the need to balance competing and partial information. We can do better than categorised “buddylists” when describing people, their inter-connections and relationships. And in many ways Facebook is doing just great here. Aside from the Pirates-vs-Ninjas noise, many extension applications on Facebook allow arbitrary events from elsewhere in the Web to bubble up through their service and be seen (or filtered) by others who are linked to me in their database. For example:

Facebook is good at reporting events, generally. Especially those sourced outside the system. Where it isn’t so great is when reporting internal-events, eg. someone telling it about a relationship. Event descriptions are nice things to syndicate btw since they never go out of date. Syndicating descriptions of the changeable properties of the world, on the other hand, is more slippery since you need to have all other relevant facts to be able to say how the world is right now (or implicitly, how it used to be, before). “Dan has painted his car red” versus “Dan’s car is now red”. “Dan has bookmarked the Jabber user profile spec” versus “Dan now has 1621 bookmarks”. “Dan has added Charles to his Facebook profile” versus “Dan is now friends with Charles”.

We need better UI that reflects what’s really going on. There will be users who choose to live much of their lives in public view, spread across sites, sharing enough information for these accounts to be linked. Hopefully they’ll be as privacy-smart and selective as Pew suggests. Personas and ‘characters’ can be spread across sites without either site necessarily revealing a real-world identity; secrets are keepable, at least in theory. But we will see people’s behaviour and claims from one site leak into another, and with approval. I don’t think this will be just through some giant “social graph” of strictly enumerated relationships, but through a haze of vaguer data.

What we’re most missing is a style of end-user UI here that educates users about this world that spans websites, couching things in terms of claims hosted in sites, rather than in absolutist terms. I suppose I probably don’t have 210 “friends” (whatever that means) in real life, although I know a lot of great people and am happy to be linked to them online. But I have 210 entries in a Facebook-hosted database. My email whitelist file has 8785 email addresses in it currently; email accounts that I’m prepared to assume aren’t sending me spam. I’m sure I can’t have 8785 friends. My Google Mail (and hence GTalk Jabber) account claims 682 contacts, and has some mysterious relationship to my Orkut account where I have 200+ (more randomly selected) friends. And now the OpenID roster on my blog gives another list (as of today, 19 OpenIDs that made it past the WordPress spam filter). Modern social websites shouldn’t try to tell me how many friends I have; that’s just silly. And they shouldn’t assume their database knows it all. What they can do is try to tell me things that are interesting to me, with some emphasis on things that touch my immediate world and the extended world of those I’m variously connected to.

So what am I getting at here? I guess it’s just that we need these big social sites to move away from making teen-talk claims about how the world is – “Sally (now) loves John” – and instead become reflectors for the things people are saying, “Sally announces that she’s in love with John”; “John says that he used to work for Microsoft” versus “John worked for Microsoft 2004-2006”; “Stanford University says Sally was awarded a PhD in 2008”. Today’s young internet users are growing up fast, and the Web around them needs also to mature.

One of the most puzzling criticisms you’ll often hear about the Semantic Web initiative is that it requires a single universal truth, a monolithic ontology to model all of human knowledge. Those of us in the SW community know that this isn’t so; we’ve been saying for a long time that our (meta)data architecture is designed to allow people to publish claims “in which statements can draw upon multiple vocabularies that are managed in a decentralised fashion by various communities of expertise.”

As the SemWeb technology stack now has a much better approach to representing data provenance (SPARQL named graphs replacing RDF’99 statement reification) I believe we should now be putting more emphasis on a related theme: Semantic Web data can represent disputes, competing claims, and contradictions. And we can query it in an SQL-like language (SPARQL) that allows us to ask questions not just of some all-knowing database, but about what different databases are telling us.

The closed world approach to data gives us a lot, don’t get me wrong. I’m not the only one with a love-hate relationship with SQL. There are many optimisations we can do in a traditional SQL or XML Schema environment which become hard in an RDF context. In particular, going “open world” makes for a harder job when hosting and managing data rather than merely aggregating and integrating it. Nevertheless, if you’re looking for a modern Web data environment for aggregating claims of the “Stanford University says Sally was awarded a PhD in 1995” form, SPARQL has a lot to offer.

When we’re querying a single, all-knowing, all-trusted database, SQL will do the job (see Facebook’s FQL, for example). When we need to take a bit more care with “who said what” and “according to whom?” aspects, coupled with schema extensibility and frequently missing data, SQL starts to hurt. If we’re aggregating (and building UI for) ‘social web’ claims about the world rather than simple buddylists (which XMPP/Jabber gives us out of the box), I suspect aggregators will get burned unless they take care to keep careful track of who said what, whether using SPARQL or some home-grown database system in the same spirit. And I think they’ll find that doing so will be peculiarly rewarding, giving us a foundation for applications that do substantially more than merely listing your buddies…
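To give a flavour of what “keeping track of who said what” means in practice, here is a toy Python sketch of the home-grown end of that spectrum: each claim carries its source, in the spirit of SPARQL named graphs, and a query reports which sources back which answer instead of picking a winner. The data, source names and property names are all invented for illustration.

```python
from collections import defaultdict

# Each claim is (source, subject, predicate, object); all values are made up.
claims = [
    ("facebook", "Dan", "friendOf", "Charles"),
    ("orkut", "Dan", "friendOf", "Charles"),
    ("stanford", "Sally", "awardedPhD", "2008"),
    ("sally-blog", "Sally", "awardedPhD", "2007"),
]


def who_says(subject, predicate):
    """Group the claimed values by the sources asserting them."""
    by_object = defaultdict(set)
    for source, s, p, o in claims:
        if s == subject and p == predicate:
            by_object[o].add(source)
    return dict(by_object)


for value, sources in who_says("Sally", "awardedPhD").items():
    print(f"{', '.join(sorted(sources))} says Sally was awarded a PhD in {value}")
```

The disagreement about the year stays in the data, attributed to whoever said it, and the UI can decide how to present it.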

Google Earth touring via KML

While I’m writing up old hacks, here’s one that I really enjoyed, even if it was a bit clunky. A couple of years ago Mikel Maron implemented (on my urging in irc.oftc.net #geo IRC :) a PHP-based Google Earth touring service, which interconnects a “tour guide” user with “tourists”.

This site facilitates collaborative, realtime exploration of Google Earth. As the “tour guide” navigates, “tourists” will automatically follow along.

When the tour guide’s Google Earth installation is at rest, a specially installed KML network link sends the server an HTTP request, showing the coordinates of the visible area of the globe. This same service is periodically polled (every 10 seconds, according to Mikel’s writeup below) by “tourists” whose Google Earth will dutifully fly to the appropriate spot.

The system seems to be offline currently, but was quite evocative to use, even if tricky. You never quite knew what the other party could actually see, since the picture can load quite slowly when moving around a lot. And the implementation didn’t do anything about angle of view (although this became possible in later versions of KML). I had experimental tours of Dublin guided by Ina (Skyping at same time), and of various places in Iran by Hamed Saber.

I expect in due course (if not already, I don’t track these things) Google Earth and similar products (Worldwind, or the Microsoft thingy) will offer social map browsing, it’s such a nice feature, though it really needs an audio channel open at the same time. Last week I tried to do the same without such a link, my mum talking me thru finding a small village in France. Much harder! “Take the road north out of Chabanais … past a small farm, past the swimming pool…”.

Here is Mikel’s “how it works” writeup:

The web interface generates KML files, which are loaded into Google Earth and create Network Links. The tour guide has a “View Based Refresh” Network Link, which sends the bounding box of the current view to the specified URL whenever the camera stops. That position is stored. Tourists receive a “Time Based Refresh” Network Link, which requests every 10 seconds and receives the last stored position of the guide.

Right now only location and altitude are transmitted. A future release of Google Earth may enable tilt and rotation. Integrated chat would be nice as well.
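The server side of that arrangement can be sketched very compactly. The Python below is not Mikel's PHP, just a guess at the shape of it under some loud assumptions: that the guide's view-based link sends the bounding box as "west,south,east,north", that the tourist is flown to the centre of that box at a fixed range, and that a LookAt inside NetworkLinkControl is how the view gets passed on. The function names, the range_m parameter and the KML namespace are my choices, and none of this has been tested against Google Earth.

```python
from xml.sax.saxutils import escape

# In-memory stand-in for "that position is stored": tour name -> (lon, lat).
last_view = {}


def record_guide_view(tour, bbox):
    """Store the centre of the guide's view from a 'west,south,east,north' string."""
    west, south, east, north = (float(v) for v in bbox.split(","))
    # Naive midpoint; ignores views that straddle the antimeridian.
    last_view[tour] = ((west + east) / 2.0, (south + north) / 2.0)


def kml_for_tourist(tour, range_m=5000):
    """Build the KML a polling tourist would receive for the last stored view."""
    lon, lat = last_view[tour]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <NetworkLinkControl>\n"
        f"    <linkName>{escape(tour)}</linkName>\n"
        "    <LookAt>\n"
        f"      <longitude>{lon}</longitude>\n"
        f"      <latitude>{lat}</latitude>\n"
        f"      <range>{range_m}</range>\n"
        "    </LookAt>\n"
        "  </NetworkLinkControl>\n"
        "</kml>\n"
    )


# The guide stops over Dublin; a tourist polls a moment later.
record_guide_view("dublin", "-6.5,53.25,-6.0,53.5")
print(kml_for_tourist("dublin"))
```

Because only the last position is kept, a tourist who polls slowly simply skips intermediate stops, which matches the slightly disorienting feel of the real thing.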

The fact that they’ve hidden a full flight simulator within Google Earth might make this worth revisiting. And of course there is infinite fun to be had from playing with photos etc on the globe, although my last attempts in that direction (preparing for 3 months in Buenos Aires by studying geo-tagged photos instead of Spanish) tailed off. Everyone was putting pics on maps, I got a bit bored, even though it’s still a worthwhile area with much still to be done.

Some ideas are not meant to be combined though: who really needs a collaborative realtime photo-navigator implemented with Google Earth flight simulator? :)