Symbol languages and the Semantic Web

OK so I just stumbled upon this…

Bomb in Baghdad

…via Jonathan Chetwynd’s ever-inventive and SVG-happy Peepo.com.

The “Car Bomb in Baghdad” story is from a site created by Widgit Software, and explains itself as follows:

Symbolworld has been set up to provide a web site with material suitable for symbol readers of all ages. The internet is an important medium which many people really like to use. Sadly there is very little material that is appropriate or accessible by people with learning difficulties.

The copyright statement for Symbolworld says “symbols on this page are copyright of the commercial owners. They may not be copied or used in any other format without the written permission from the owner.“, which initially struck me as a potential challenge to any use of this particular symbol-set for online communication. But I don’t really know this scene, and I guess this copyright could be just the same as the way eg. fonts are copyrighted.

So I don’t know much about this particular project/company/product, but it reminded me of some similar work I heard about a few years ago. Back when the EU, in their occasionally infinite wisdom, funded SWAD-Europe to run around talking to interesting people about standards and the Semantic Web (and giving them t-shirts) , Chaals organised a great developer workshop in Madrid on Image annotation. We had the usual fun with SVG and RDF (which btw I’m still betting on) and I got to learn a bit about CCF. Seeing the Baghdad example this morning reminded me of all this. I’ve been clicking around and trying to gather my sprawling thoughts.

CCF, the Concept Coding Framework, is kinda image annotation in reverse. Instead of focussing on the description of the content, concepts etc associated with images, the emphasis is on the use of images to illustrate some enumerated set of concepts. From Chaals’ workshop report re outcomes, I’m reminded that we discussed…

How to use Creative Commons and similar vocabularies to determine whether a particular symbol can be freely used (typically in commercial systems the symbols themselves are proprietary, which can be a major barrier to communication between people who have different systems).

CCF was using some variant of SKOS (another SWAD-Europe activity). This found its way into SKOS itself, where we now have prefSymbol and altSymbol relationships that associate a skos:Concept with a dcmitype:Image. Borrowing an illustration from the SKOS Core guide here:

skos symbol diagram

The guide also notes a distinction between symbolic labelling and “depiction” in the FOAF sense; some symbols are purely symbolic, and have no representational content.

So, catching up with this area of work a little, I find the Bliss Symbolics, the WWAAC project final report, and various other accessibility-related efforts. But I’ve not really figured out where I’d start if I wanted to build something simple using a freely available symbol-set, nor what the main options/projects currently are. But there’s plenty of reading, including pages from a recent Bliss “think tank” meeting.

The latest I can find on CCF is that it has moved sites and that there are some Web interfaces available, but “sorry – there are no downloads for this project yet.”. Ah, apparently the SYMBERED project is continuing development of CCF (aside: Bliss with swedish translation; doubly incomprehensible to me!). There is a nice example on their site showing the multilingual aspect to this work, as well as contrasting Bliss with a more representationally-oriented symbol set; see their site for details.

Here’s a simple example just in English:

I want coffee and milk and cookies.

In case anyone thinks this whole exploration is a bit niche and obscure, take a look at how people use MSN and other IM systems sometime. And of course the Jabber/XMPP guys have been exploring specs for standard emoticons. Chaals also points out some connections to VoiceXML where “there are a handful of options available in an interaction designed to be through voice, and developers will define assorted ways of recognising from a user’s speech which of the relevant concepts is being matched.”

We’re also only a few clicks away from the survey conducted by the dreamily named W3C Emotion incubator group into use cases and technologies for the description of emotion using markup languages.

If an RDF-ized Wordnet is also thrown into the mix, assigning URIs to Synsets, WordSenses, Words, I think we might actually be getting somewhere. The version of Wordnet in RDF published at W3C doesn’t currently use SKOS, although this was discussed, and of course there are other representations that make more use of RDF class hierarchies for nouns (at the expense of linguistic lossyness and losing non-noun content). Princeton’s original English-language Wordnet has spawned many related projects and translations, but as far as I know, there is little integration amongst them, and not all of the data is public or freely re-usable.

I once had a silly dream of taking a photo to go with each and every noun term in Wordnet. A kind of SemWeb I-Spy, a cousin to Immuexa’s FOAF bingo. Or better, of doing that with friends. The rise of Flickr and tagging means that we’d probably do this now by aggregate using Flickr and similar sites. But it seems conceivable to me that such an “illustrated wordnet” could be made, using either photo-oriented or symbol-oriented illustrations.

OK what might that buy us? Let’s try these two samples.

Not perfect, … but imho Wordnet gives a nice set of common and identifiable concepts that can be used as a hub for all kinds of different projects. And all we’d need is a huge pile of shared data shaped like this:

wordnet:word-cookie skos.prefLabel wikicommons:Explosion.svg .

OK it’s not going to bring about world peace and the return of esperanto, and of course there’s much more to language and communication than nouns and verbs (the easiest part of wordnet to turn into visual symbols), but it does strike me as a fun little (big) project…. Wordnet is too huge to be useful in every context where we’d want a modest-sized symbol set (eg. IM emoticons), … but it is nicely searchable, and would provide a framework for such subsets to evolve and be interconnected.

Who, what, where, when?

A “Who? what? where? when?” of the Semantic Web is taking shape nicely.

Danny Ayers shows some work with FOAF and the hCard microformat, picking up a theme first explored by Dan Connolly back in 2000: inter-conversion between RDF and HTML person descriptions. Danny generates hCards from SPARQL queries of FOAF, an approach which would pair nicely with GRDDL for going in the other direction.

Meanwhile at W3C, the closing days of the SW Best Practices Group have recently produced a draft of an RDF/OWL Representation of Wordnet. Wordnet is a fantastic resource, containing descriptions of pretty much every word in the English language. Anyone who has spent time in committees, deciding which terms to include in some schema/vocabulary, must surely feel the appeal of a schema which simply contains all possible words. There are essentially two approaches to putting Wordnet into the Semantic Web. A lexically-oriented approach, such as the one published at W3C for Wordnet 2.0, presents a description of words. It mirrors the structure of wordnet itself (verbs, nouns, synsets etc.). Consequently it can be a complete and unjudgemental reflection into RDF of all the work done by the Wordnet team.

The alternate, and complementary, approach is to explore ways of projecting the contents of Wordnet into an ontology, so that category terms (from the noun hierarchy) in Wordnet become classes in RDF. I made a simplistic approach at this some time back (see overview). It has appeal (alonside the linguistic version) because it allows RDF to be used to describe instances of classes for each concept in wordnet, with other properties of those instances. See WhyWordnetIsCool in the FOAF wiki for an example of Wordnet’s coverage of everyday objects.

So, getting Wordnet moving into the SW is great step. It gives us URIs to identify a huge number of everyday concepts. It’s coverage isn’t complete, and it’s ontology is kinda quirky. Aldo Gangemi and others have worked on tidying up the hierarchy; I believe only for version 1.6 of Wordnet so far. I hope that work will eventually get published at W3C or elsewhere as stable URIs we can all use.

In addition to Wordnet there are various other efforts that give types that can be used for the “what” of “who/what/where/when”. I’ve been talking with Rob McCool about re-publishing a version of the old TAP knowledge base. The TAP project is now closed, with Rob working for Yahoo and Guha at Google. Stanford maintain the site but aren’t working on it. So I’ve been working on a quick cleanup (wellformed RDF/XML etc.) of TAP that could get it into more mainstream use. TAP, unlike Wordnet, has more modern everyday commercial concepts (have a look), as well as a lot of specific named instances of these classes.

Which brings me to (Semantic) Wikipedia; another approach to identifying things and their types on the Semantic Web. A while back we added isPrimaryTopicOf to FOAF, to make it easier to piggyback on Wikipedia for RDF-identifying things that have Wiki (and other) pages about them. The Semantic Mediawiki project goes much much further in this direction, providing a rich mapping (classes etc.) into RDF for much of Wikipedia’s more data-oriented content. Very exciting, especially if it gets into the main codebase.

So I think the combination of things like Wordnet, TAP, Wikipedia, and instance-identifying strategies such as “isPrimaryTopicOf”, will give us a solid base for identifying what the things are that we’re describing in the Semantic Web.

And regarding. “Where?” and “when?” … on the UI front, we saw a couple of announcements recently: OpenLayers v1.0, which provides Google-maps-like UI functionality, but opensource and standards friendly. And for ‘when’, a similar offering: the timeline widget. This should allow for fancy UIs to be wired in with RDF calendar or RDF-geo tagged data.

Talking of which… good news of the week: W3C has just announced a Geo incubator group (see detailed charter), whose mission includes updates for the basic Geo (ie. lat/long etc) vocabulary we created in the SW Interest Group.

Ok, I’ve gone on at enough length already, so I’ll talk about SKOS another time. In brief – it fits in here in a few places. When extended with more lexical stuff (for describing terms, eg. multi-lingual thesauri) it could be used as a base for representing the lexically-oriented version of Wordnet. And it also fits in nicely with Wikipedia, I believe.

Last thing, don’t get me wrong — I’m not claiming these vocabs and datasets and bits of UI are “the” way to do ‘who/what/where/when’ on the Semantic Web. They’re each one of several options. But it is great to have a whole pile of viable options at last :)