Schema.org and One Hundred Years of Search

A talk from London SemWeb meetup hosted by the BBC Academy in London, Mar 30 2012….

Slides and video are already in the Web, but I wanted to post this as an excuse to plug the new Web History Community Group that Max and I have just started at W3C. The talk was part of the Libraries, Media and the Semantic Web meetup hosted by the BBC in March. It gave an opportunity to run through some forgotten history, linking Paul Otlet, the Universal Decimal Classification, schema.org and some 100 year old search logs from Otlet’s Mundaneum. Having worked with the BBC Lonclass system (a descendant of Otlet’s UDC), and collaborated with the Aida Slavic of the UDC on their publication of Linked Data, I was happy to be given the chance to try to spell out these hidden connections. It also turned out that Google colleagues have been working to support the Mundaneum and the memory of this early work, and I’m happy that the talk led to discussions with both the Mundaneum and Computer History Museum about the new Web History group at W3C.

So, everything’s connected. Many thanks to W. Boyd Rayward (Otlet’s biographer) for sharing the ancient logs that inspired the talk (see slides/video for a few more details). I hope we can find more such things to share in the Web History group, because the history of the Web didn’t begin with the Web…

WordPress, TinyMCE and RDFa editors

I’m writing this in WordPress’s ‘Visual’ mode WYSIWYG HTML editor, and thinking “how could it be improved to support RDFa?”

Well let’s think. Humm. In RDFa, every section of text is always ‘about’ something, and then has typed links or properties associated with that thing. So there are icons ‘B’ for bold, ‘I’ for italics, etc. And thingies for bulleted lists, paragraphs, fonts. I’m not sure how easily it would handle the nesting of full RDFa, but some common case could be a start: a paragraph that was about some specific thing, and within which the topical focus didn’t shift.

So, here is a paragraph about me. How to say it’s really about me? There could be a ‘typeof’ attribute on the paragraph element, with value foaf:Person, and an ‘about’ attribute with a URI for me, eg. http://danbri.org/foaf.rdf#danbri

In this UI, I just type textual paragraphs and then double newlines are interpreted by the publishing engine as separate HTML paragraphs. I don’t see any paragraph-level or even post-level properties in the user interface where an ‘about’ URI could be stored. But I’m sure something could be hacked. Now what about properties and links? Let’s see…

I’d like to link to TinyMCE now, since that is the HTML editor that is embedded in WordPress. So here is a normal link to TinyMCE. To make it, I searched for the TinyMCE homepage, copied its URL into the copy/paste buffer, then pressed the ‘add link’ icon here after highlighting the text I wanted to turn into a link.

It gave me a little HTML/CSS popup window with some options – link URL, target window, link title and (CSS) class; I’ve posted a screenshot below. This bit of UI seems like it could be easily extended for typed links. For example, if I were adding a link to a paragraph that was about a Film, and I was linking to (a page about) its director or an actor, we might want to express that using an annotation on the link. Due to the indirection (we’re trying to say that the link is to a page whose primary topic is the director, etc.), this might complicate our markup. I’ll come back to that later.

tinymce link edit screenshop

tinymce link edit screenshot

I wonder if there’s any mention of RDFa on the TinyMCE site? Nope. Nor even any mention of Microformats.

Perhaps the simplest thing to try to build into TinyMCE would be for a situation where a type on the link expressed a relationship between the topic of the blog post (or paragraph), and some kind of document.

For example, a paragraph about me, might want to annotate a link to my old school’s homepage (ie. http://www.westergate.w-sussex.sch.uk/ ) with a rel=”foaf:schoolHomepage” property.

So our target RDFa would be something like (assuming this markup is escaped and displayed nicely):

<p about=”http://danbri.org/foaf.rdf#danbri” typeof=”foaf:Person” xmlns:foaf=”http://xmlns.com/foaf/0.1/”>

I’m Dan, and I went to <a rel=”foaf:schoolHomepage” href=”http://www.westergate.w-sussex.sch.uk/”>Westergate School</a>

</p>.

See recent discussion on the foaf-dev list where we contrast this idiom with one that uses a relationship (eg. ‘school’) that directly links a person to a school, rather than to the school’s homepage. Toby made some example RDFa output of the latter style, in response to a debate about whether the foaf:schoolHomepage idiom is an ‘anti-pattern’. Excerpt:

<p about=”http://danbri.org/foaf.rdf#danbri” typeof=”foaf:Person” xmlns:foaf=”http://xmlns.com/foaf/0.1/”>

My name is <span property=”foaf:name”>Dan Brickley</span> and I spent much of the ’80s at <span rel=”foaf:school”><a typeof=”foaf:Organization” property=”foaf:name” rel=”foaf:homepage” href=”http://www.westergate.w-sussex.sch.uk/”>Westergate School</a>.

</p>

So there are some differences between these two idioms. From the RDF side they tell you similar things: the school I went to. The latter idiom is more expressive, and allows additional information to be expressed about the school, without mixing it up with the school’s homepage. The former simply says “this link is to the homepage of Dan’s school”; and presumably defers to the school site or elsewhere for more details.

I’m interested to learn more about how these expressivity tradeoffs relate to the possibilities for visual / graphical editors. Could TinyMCE be easily extended to support either idiom? How would the GUI recognise a markup pattern suitable for editing in either style? Can we avoid having to express relationships indirectly, ie. via pages? Would it be a big job to get some basic RDFa facilities into WordPress via TinyMCE?

For related earlier work, see the 2003 SWAD-Europe report on Semantic blogging from the team then at HP Labs Bristol.

The Time of Day

(a clock showing no time)

From my Skype logs [2008-06-19 Dan Brickley: 18:24:23]

So I had a drunken dream about online microcurrencies last night. Also about cats and water-slides but that’s another story. Idea was of a karma donation system based on one-off assignments from person to person of specified chunks of their lifetime; ‘giving the time of day’. you’re allowed to give any time of day taken from those days you’ve been alive so far. they’re not directly redistributable, nor necessarily related to what happened during the specified time. there’s no central banker, beyond the notion of ‘the public record’. The system naturally favours the old/experienced, but if someone gets drunk and gives all their time/karma to a porn site, at least in the morning they’ll have another 24h ‘in the bank’. Or they could retract/deny the gift, although doing so a lot would also be visible in the public record and doing so excessively would make one look a bit sketchy, one’s time gifts seem less valuable etc. Anchoring to a real world ‘good’ (time) is supposed to provide some control against runaway inflation, as is non-redistributability, but also the time thing is nice for visualizations and explanation. I’m not really sure if it makes sense but thought i’d write it down before i forget the idea…

One idea would be for the time gifts to be redeemable, but that i think pushes the metaphor too far into being a real currency for a fictional world where hourly rates are flattened. Some Lets schemes probably work that way I guess…

So I’ve been meaning to write this up, but in the absense of having done so, here’s the idea as it first struck me. I had been thinking a bit about online reputation services, and the kinds of information they might aggregate. Garlik’s QDOS and FOAF experiments being a good example of this kind of evidence aggregation. As OpenID, FOAF, microformats etc. take hold, I really think we’ll see a massive parting of waves, red sea style, with the “public record” on one side, and “private stuff” on the other.

And in the public record, we’ll be attaching information about the things we make and do to well-known identifiers for people (and their semi-detached aliases). Various websites have rating and karma mechanisms, but it is far from clear how they’ll look when shared in the public Web. Nor whether something robust and not-too-gameable will come out of it. There are certainly various modelling idioms (eg. advogato do their internal calculations, and then put everyone in one of several broad-brush groups; here’s my advogato FOAF). See also my previous notes on representing expertise.

Now in some IRC channels, there are bots where you can dish out credit by typing things like:

edd++ # xtechy

…and have a bot add up the credits, as well as the comments. In small IRC communities these aren’t gamed except for fun. So I’ve been thinking: how can these kinds of habits ever work in the wider Web, where people are spread across Web sites (but nevetherless identifiable with OpenID and FOAF). How could it not turn hideous? What limited resource do we each have a supply of? No, not kidneys. And in a hungover stupor I came to think that “the time of day” could be such a resource. It’s really just a metaphor, and I’m not sure at all that the quantifiable nature is a benefit. But I also quite like that we each have a neverending supply of the stuff, and that even a fleeting moment can count.

Update: here’s a post from Simon Lucy which has a very similar direction (it was Simon I was drinking with the night before writing this). Excerpt:

And what do you do with your positive balance? Need you do anything? I imagine those that care will publish their balance or compare it with others in similar way to company cars or hi fi tvs. There will always be envy and jealousy.

But no one can steal your balance, misuse it.

So who wants to host the Ego Bank.

The main difference compared to my suggested scheme, is just the ‘the Web’ and the public record it carries, are the “ego bank”, creating a playground for aggregators of karma, credibility and reputation information. “The time of day” would just be one such category of information…

Querying Facebook in SPARQL

A fair few people have been asking about FOAF exporters from Facebook. I’m not entirely sure what else is out there, but Matthew Rowe has just announced a Facebook FOAF generator. It doesn’t dump all 35 million records into your Web browser, thankfully. But it will export a minimal description of you and your Facebook associates. At the moment, you get name, a photo URL, and (in this revision of the tool) a Facebook account name using FOAF’s OnlineAccount construct.

As an aside, this part of the FOAF design provides a way for identifiers from arbitrary services to be described in FOAF without special-purpose support. Some services have shortcut property names, eg. msnChatID and we may add more, but it is also important to allow this kind of freeform, decentralised identification. People shouldn’t have to petition the FOAF spec editors before any given Social Network site’s IDs can be supported; they can always use their own vocabulary alongside FOAF, or use the OnlineAccount construct as shown here.

I’ve saved my Facebook export on my Web site, working on the assumption that Facebook IDs are not private data. If people think otherwise, let me know and I’ll change the setup. We might also discuss whether even sharing the names and connectivity graph will upset people’s privacy expectations, but that’s for another day. Let me know if you’re annoyed!

Here is a quick SPARQL query, which simply asks for details of each person mentioned in the file who has an account on Facebook.


PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name, ?pic, ?id
WHERE {
[ a :Person;
:name ?name;
:depiction ?pic;
:holdsAccount [ :accountServiceHomepage <http://www.facebook.com/> ; :accountName ?id ]
]
}
ORDER BY ?name

I tested this online using Dave Beckett’s Rasqal-based Web service. It should return a big list of the first 200 people matched by the query, ordered alphabetically by name.

For “Web 2.0″ fans, SPARQL‘s result sets are essentially tabular (just like SQL), and have encodings in both simple XML and JSON. So whatever you might have heard about RDF’s syntactic complexity, you can forget it when dealing with a SPARQL engine.

Here’s a fragment of the JSON results from the above query:


{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "http://danbri.org/yasns/facebook/danbri-fb.rdf" },
"id" : { "type": "literal", "value": "624168" }
},
{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "http://profile.ak.facebook.com/profile5/575/66/s501730978_7421.jpg" },
"id" : { "type": "literal", "value": "501730978" }
}, ...

What’s going on here? (a) Why are there two of me? (b) And why does it think that one of us has my Facebook FOAF file’s URL as a mugshot picture?

There’s no big mystery here. Firstly, there’s another guy who has the cheek to be called Dan Brickley. We’re friends on Facebook, even though we should probably be mortal enemies or something. Secondly, why does it give him the wrong URL for his photo? This is also straightforward, if a little technical. Basically, it’s an easily-fixed bug in this version of the FOAF exporter I used. When an image URL is not available, the convertor is still generating markup like “<foaf:depiction rdf:resource=””/>”. This empty URL is treated in RDF as the extreme case of a relative link, ie. the same kind of thing as writing “../../images/me.jpg” in a normal Web page. And since RDF is all about de-contextualising information, your RDF parser will try to resolve the relative link before passing the data on to storage or query systems (fiddly details are available to those that care). If the foaf:depiction property were simply ommitted when no photo was present, this problem wouldn’t arise. We’d then have to make the query a little more flexible, so that it still matched people even if there was no depiction, but that’s easy. I’ll show it next time.

I mentioned a couple of days ago that SPARQL is a query language with built-in support for asking questions about data provenance, ie. we can mix in “according to Facebook”, “according to Jabber” right into the WHERE clause of queries such as the one I show here. I’m not going to get into that today, but I will close with a visual observation about why that is important.

yasn map, borrowed from data junk, valleywag blog
To state the obvious, there’ll always be multiple Web sites where people hang out and socialise. A friend sent me this link the other day; a world map of social networks (thumbnail version copied here). I can’t vouch for the science behind it, but it makes the point that we risk fragmenting Web communities on geographic boundaries if we don’t bridge the various IM and YASN networks. There are lots of ways this can be done, each with different implications for user experience, business model, cost and practicality. But it has to happen. And when it does, we’ll be wanting ways of asking questions against aggregations from across these sites…