Shadows on the Web

This is a quick note, inspired by the recent burst of posts passing through Planet RDF about RDF, WebArch and a second “shadow” Web. Actually it’s not about that thread at all, except to note that Ian Davis asks just the right kind of questions when thinking about the WebArch claim that the Web ships with a hardcoded, timeless and built-in ontology, carving up the Universe between “information resources” and “non-information resources”. Various Talis folk are heading towards Bristol this week, so I expect we’ll pick up this theme offline again shortly! (Various other Talis folk – I’m happy to be counted as Talis person, even if they choose the worst picture of me for their blog :).

Anyhow, I made my peace with the TAG’s http-range-14 resolution long ago, and prefer the status quo to a situation where “/”-terminated namespaces are treated as risky and broken (as was the case pre-2005). But RDFa brings the “#” WebArch mess back to the forefront, since RDF and HTML can be blended within the same environment. Perhaps – reluctantly – we do need to revisit this perma-thread one last time. But not today! All I wanted to write about right now is the “shadow” metaphor. It crops up in Ian’s posts, and he cites Rob McCool’s writings. Since I’m unequiped with an IEEE login, I’m unsure where the metaphor originated. Ian’s usage is in terms of RDF creating a redundant and secondary structure that ordinary Web users don’t engage with, ie. a “shadow of the real thing”. I’m not going to pursue that point here, except to say I have sympathies, but am not too worried. GRDDL, RDFa etc help.

Instead, I’m going to suggest that we recycle the metaphor, since (when turned on its head) it gives an interesting metaphorical account of what the SW is all about. Like all 1-line metaphorical explanations of complex systems, the real value comes in picking it apart, and seeing where it doesn’t quote hold:

‘Web documents are the shadows cast on the Web by things in the world.

What do I mean here? Perhaps this is just pretentious, I’m not sure :) Let’s go back to the beginnings of the SW effort to picture this. In 1994 TimBL gave a Plenary talk at the first International WWW Conference. Amongst other things, he announced the creation of W3C, and described the task ahead of us in the Semantic Web community. This was two years before we had PICS, and three years before the first RDF drafts. People were all excited about Mosaic, to help date this. But even then, the description was clear, and well illustrated in a series of cartoon diagrams, eg:

TimBL 1994 Web semantics diagram

I’ve always liked these diagrams, and the words that went with them.

So much so that when I had the luxury of my own EU project to play with, they got reworked for us by Liz Turner: we made postcards and tshirts, which Libby delighted in sending to countless semwebbers to say “thanks!”. Here’s the postcard version:

SWAD-Europe postcard

The basic idea is just that Web documents are not intrinsically interesting things; what’s interesting, generally, is what they’re about. Web users don’t generally care much about sequences of unicode characters or bytes; we care about what they mean in our real lives. The objects they’re about, the agreements they describe, the real-world relationships and claims they capture. There is a Web of relationships in the world, describable in countless ways by countless people, and the information we put into the Web is just a pale shadow of that.

The Web according to TimBL, back in ’94:

To a computer, then, the web is a flat, boring world devoid of meaning. This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them. For example, a document might describe a person. The title document to a house describes a house and also the ownership relation with a person. [...]

On this thinking, Web documents are the secondary thing; the shadow. What matters is the world and it’s mapping into digital documents, rather than the digital stuff alone. The shadow metaphor breaks down a little, if you think of the light source as something like the Sun, ie. with each real-world entity shadowed by a single (authoritative?) document in the Web. Life’s not like that, and the Web’s not like that either. Objects and relationships in the real world show up in numerous ways on the Web; or sometimes (thankfully) not at all. If Web documents can be thought of as shadows, they’re shadows cast in many lights, many colours, and by multiple independent light sources. Some indistinct, soft and flattering; occasionally frustratingly vague. Others bright, harshly precise and rigorously scientific (and correspondingly expensive and experty to use). But the core story is that it’s the same shared world that we’re seeing in all these different lights, and that the Web and the world are both richer because life is illustrated from multiple perspectives, and because the results can be visible to all.

The Semantic Web is, on this story, not a shadow of the real Web, but a story about how the Web is a shadow of the world. The Semantic Web is, in fact, much more like a 1970s disco than a shadow…

Symbol languages and the Semantic Web

OK so I just stumbled upon this…

Bomb in Baghdad

…via Jonathan Chetwynd’s ever-inventive and SVG-happy Peepo.com.

The “Car Bomb in Baghdad” story is from a site created by Widgit Software, and explains itself as follows:

Symbolworld has been set up to provide a web site with material suitable for symbol readers of all ages. The internet is an important medium which many people really like to use. Sadly there is very little material that is appropriate or accessible by people with learning difficulties.

The copyright statement for Symbolworld says “symbols on this page are copyright of the commercial owners. They may not be copied or used in any other format without the written permission from the owner.“, which initially struck me as a potential challenge to any use of this particular symbol-set for online communication. But I don’t really know this scene, and I guess this copyright could be just the same as the way eg. fonts are copyrighted.

So I don’t know much about this particular project/company/product, but it reminded me of some similar work I heard about a few years ago. Back when the EU, in their occasionally infinite wisdom, funded SWAD-Europe to run around talking to interesting people about standards and the Semantic Web (and giving them t-shirts) , Chaals organised a great developer workshop in Madrid on Image annotation. We had the usual fun with SVG and RDF (which btw I’m still betting on) and I got to learn a bit about CCF. Seeing the Baghdad example this morning reminded me of all this. I’ve been clicking around and trying to gather my sprawling thoughts.

CCF, the Concept Coding Framework, is kinda image annotation in reverse. Instead of focussing on the description of the content, concepts etc associated with images, the emphasis is on the use of images to illustrate some enumerated set of concepts. From Chaals’ workshop report re outcomes, I’m reminded that we discussed…

How to use Creative Commons and similar vocabularies to determine whether a particular symbol can be freely used (typically in commercial systems the symbols themselves are proprietary, which can be a major barrier to communication between people who have different systems).

CCF was using some variant of SKOS (another SWAD-Europe activity). This found its way into SKOS itself, where we now have prefSymbol and altSymbol relationships that associate a skos:Concept with a dcmitype:Image. Borrowing an illustration from the SKOS Core guide here:

skos symbol diagram

The guide also notes a distinction between symbolic labelling and “depiction” in the FOAF sense; some symbols are purely symbolic, and have no representational content.

So, catching up with this area of work a little, I find the Bliss Symbolics, the WWAAC project final report, and various other accessibility-related efforts. But I’ve not really figured out where I’d start if I wanted to build something simple using a freely available symbol-set, nor what the main options/projects currently are. But there’s plenty of reading, including pages from a recent Bliss “think tank” meeting.

The latest I can find on CCF is that it has moved sites and that there are some Web interfaces available, but “sorry – there are no downloads for this project yet.”. Ah, apparently the SYMBERED project is continuing development of CCF (aside: Bliss with swedish translation; doubly incomprehensible to me!). There is a nice example on their site showing the multilingual aspect to this work, as well as contrasting Bliss with a more representationally-oriented symbol set; see their site for details.

Here’s a simple example just in English:

I want coffee and milk and cookies.

In case anyone thinks this whole exploration is a bit niche and obscure, take a look at how people use MSN and other IM systems sometime. And of course the Jabber/XMPP guys have been exploring specs for standard emoticons. Chaals also points out some connections to VoiceXML where “there are a handful of options available in an interaction designed to be through voice, and developers will define assorted ways of recognising from a user’s speech which of the relevant concepts is being matched.”

We’re also only a few clicks away from the survey conducted by the dreamily named W3C Emotion incubator group into use cases and technologies for the description of emotion using markup languages.

If an RDF-ized Wordnet is also thrown into the mix, assigning URIs to Synsets, WordSenses, Words, I think we might actually be getting somewhere. The version of Wordnet in RDF published at W3C doesn’t currently use SKOS, although this was discussed, and of course there are other representations that make more use of RDF class hierarchies for nouns (at the expense of linguistic lossyness and losing non-noun content). Princeton’s original English-language Wordnet has spawned many related projects and translations, but as far as I know, there is little integration amongst them, and not all of the data is public or freely re-usable.

I once had a silly dream of taking a photo to go with each and every noun term in Wordnet. A kind of SemWeb I-Spy, a cousin to Immuexa’s FOAF bingo. Or better, of doing that with friends. The rise of Flickr and tagging means that we’d probably do this now by aggregate using Flickr and similar sites. But it seems conceivable to me that such an “illustrated wordnet” could be made, using either photo-oriented or symbol-oriented illustrations.

OK what might that buy us? Let’s try these two samples.

Not perfect, … but imho Wordnet gives a nice set of common and identifiable concepts that can be used as a hub for all kinds of different projects. And all we’d need is a huge pile of shared data shaped like this:

wordnet:word-cookie skos.prefLabel wikicommons:Explosion.svg .

OK it’s not going to bring about world peace and the return of esperanto, and of course there’s much more to language and communication than nouns and verbs (the easiest part of wordnet to turn into visual symbols), but it does strike me as a fun little (big) project…. Wordnet is too huge to be useful in every context where we’d want a modest-sized symbol set (eg. IM emoticons), … but it is nicely searchable, and would provide a framework for such subsets to evolve and be interconnected.