Public Skype RDF presence service

OK I don’t know how this works, or how it happens (other Asemantics people might know more), but for those who didn’t know:

At there is a public RDF/XML document reflecting my status in Skype. There seems to be one for every active account name in the system.

Example markup:

<Status rdf:about="">
<statusCode rdf:datatype="">5</statusCode>
<presence xml:lang="NUM">5</presence>
<presence xml:lang="en">Do Not Disturb</presence>
<presence xml:lang="fr">Ne pas déranger</presence>
<presence xml:lang="de">Beschäftigt</presence>
<presence xml:lang="ja">取り込み中</presence>
<presence xml:lang="zh-cn">請勿打擾</presence>
<presence xml:lang="zh-tw">请勿打扰</presence>
<presence xml:lang="pt">Ocupado</presence>
<presence xml:lang="pt-br">Ocupado</presence>
<presence xml:lang="it">Occupato</presence>
<presence xml:lang="es">Ocupado</presence>
<presence xml:lang="pl">Nie przeszkadzać</presence>
<presence xml:lang="se">Stör ej</presence>
</Status>
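Picking a single-language label out of a document like this takes only a few lines of Python's standard ElementTree. This is a sketch that assumes a simplified, namespace-free copy of the markup wrapped in rdf:RDF (the real document's namespace declarations and URIs aren't shown above), and presence_labels is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# xml:lang always lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified sample modelled on the markup above; the real document's
# namespace declarations and URIs are not reproduced here.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Status>
    <statusCode>5</statusCode>
    <presence xml:lang="NUM">5</presence>
    <presence xml:lang="en">Do Not Disturb</presence>
    <presence xml:lang="de">Beschäftigt</presence>
  </Status>
</rdf:RDF>"""


def presence_labels(xml_text):
    """Return a dict mapping each xml:lang tag to its presence label."""
    root = ET.fromstring(xml_text)
    return {p.get(XML_LANG): p.text for p in root.iter("presence")}


labels = presence_labels(SAMPLE)
print(labels["en"])   # Do Not Disturb
print(labels["NUM"])  # 5
```

The odd "NUM" pseudo-language comes through like any other tag, so a client could use it as the machine-readable status code.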

In general (expressed in FOAF terms), for any :OnlineAccount that has an :accountServiceHomepage of you can take the :accountName (let’s call it ?a) and plug it into the URI Template{a}.xml to get presence information in curiously cross-cultural RDF. In other words, one’s Skype status is part of the public record on the Web, well beyond the closed P2P network of Skype IM clients.
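The lookup rule is essentially a level-1 URI Template expansion (the simple {var} form of RFC 6570). A minimal sketch follows. The base URL is a placeholder, since the real service address isn't reproduced above, and expand/presence_url are hypothetical names.

```python
from urllib.parse import quote

# Placeholder base URL: the real Skype presence address is not given here.
TEMPLATE = "http://presence.example.org/{a}.xml"


def expand(template, **values):
    """RFC 6570 level-1 expansion: substitute each {name} with its value,
    percent-encoding everything except unreserved characters."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", quote(value, safe=""))
    return template


def presence_url(account_name):
    # account_name plays the role of ?a, the account's foaf:accountName.
    return expand(TEMPLATE, a=account_name)


print(presence_url("some.user"))  # http://presence.example.org/some.user.xml
print(presence_url("a b"))        # http://presence.example.org/a%20b.xml
```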

Thinking about RDF vocabulary design and document formats, the Skype representation is roughly akin to FOAF documents (such as those on LiveJournal currently) that don’t indicate explicitly that they’re a :PersonalProfileDocument, nor say who is the :primaryTopic or :maker of the document. Passed the RDF/XML on its own, you don’t have enough context to know what it is telling you. Whereas, if you know the URI, and the URI template rule shown above, you have a better idea of the meaning of the markup. Still, it’s useful. I suspect it might be time to add foaf:skypeID as an inverse-functional (ie. uniquely identifying) property to the FOAF spec, to avoid longwinded markup and make it easier to bridge profile data and up-to-the-minute status data. Thoughts?
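Why inverse-functional matters: any two descriptions that share a value for such a property can be merged ("smushed") as the same individual. Here is a toy sketch over plain Python tuples. It assumes the proposed foaf:skypeID were declared inverse-functional; the node IDs and data are made up.

```python
from collections import defaultdict

# Hypothetical data from two documents, using local node IDs and the
# proposed (not yet specified) foaf:skypeID property.
triples = [
    ("_:p1", "foaf:name", "Alice Example"),
    ("_:p1", "foaf:skypeID", "alice.example"),
    ("_:p2", "foaf:skypeID", "alice.example"),
    ("_:p2", "status", "Do Not Disturb"),
    ("_:p3", "foaf:skypeID", "bob.example"),
]

# Assumed inverse-functional: one value identifies one individual.
IFPS = {"foaf:skypeID"}


def smush(triples, ifps):
    """Merge nodes that share a value for any inverse-functional property."""
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_holder = {}
    for s, p, o in triples:
        if p in ifps:
            if (p, o) in first_holder:
                parent[find(s)] = find(first_holder[(p, o)])
            else:
                first_holder[(p, o)] = s

    merged = defaultdict(set)
    for s, p, o in triples:
        merged[find(s)].add((p, o))
    return dict(merged)


for node, props in smush(triples, IFPS).items():
    print(node, sorted(props))
```

With such a property, the profile data (name) and the status data end up attached to one node, which is the bridging the paragraph above is after.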

Google Social Graph API, privacy and the public record

I’m digesting some of the reactions to Google’s recently announced Social Graph API. ReadWriteWeb ask whether this is a creeping privacy violation, and danah boyd has a thoughtful post raising concerns about whether the privileged tech elite have any right to experiment in this way with the online lives of those who lack status or knowledge of these obscure technologies, and who may be amongst the more vulnerable users of the social Web.

While I tend to agree with Tim O’Reilly that privacy by obscurity is dead, I’m not of the “privacy is dead, get over it” school of thought. Tim argues,

The counter-argument is that all this data is available anyway, and that by making it more visible, we raise people’s awareness and ultimately their behavior. I’m in the latter camp. It’s a lot like the evolutionary value of pain. Search creates feedback loops that allow us to learn from and modify our behavior. A false sense of security helps bad actors more than tools that make information more visible.

There’s a danger here of technologists seeming to blame those we’re causing pain for. As danah says, “Think about whistle blowers, women or queer folk in repressive societies, journalists, etc.”. Not everyone knows their DTD from their TCP, or understands anything of how search engines, HTML or hyperlinks work. And many folk have more urgent things to focus on than learning such obscurities, let alone understanding the practical privacy, safety and reputation-related implications of their technology-mediated deeds.

Web technologists have responsibilities to the users of the Web, and while media education and literacy are important, those who are shaping and re-shaping the Web ought to be spending serious time on a daily basis struggling to come up with better ways of allowing humans to act and interact online without other parties snooping. The end of privacy by obscurity should not mean the death of privacy.

Privacy is not dead, and we will not get over it.

But it does need to be understood in the context of the public record. The reason I am enthusiastic about the Google work is that it shines a big bright light on the things people are currently putting into the public record. And it does so in a way that should allow people to build better online environments for those who do want their public actions visible, while providing immediate – and sometimes painful – feedback to those who have over-exposed themselves in the Web, and wish to backpedal.

I hope Google can put a user support mechanism on this. I know from our experience in the FOAF community that, even with small-scale and obscure aggregators, people will find themselves and demand to be “taken down”. While any particular aggregator can remove or hide such data, unless the data is tracked back to its source, it’ll crop up elsewhere in the Web.

I think the argument that FOAF and XFN are particularly special here is a big mistake. Web technologies used correctly (posh – “plain old semantic html” in microformats-speak) already facilitate such techniques. And Google is far from the only search engine in existence. Short of obfuscating all text inside images, personal data from these sites is readily harvestable.

ReadWriteWeb comment:

None the less, apparently the absence of XFN/FOAF data in your social network is no assurance that it won’t be pulled into the new Google API, either. The Google API page says “we currently index the public Web for XHTML Friends Network (XFN), Friend of a Friend (FOAF) markup and other publicly declared connections.” In other words, it’s not opt-in by even publishers – they aren’t required to make their information available in marked-up code.

The Web itself is built from marked-up code, and this is a thing of huge benefit to humanity. Both microformats and the Semantic Web community share the perspective that the Web’s core technologies (HTML, XHTML, XML, URIs) are properly consumed both by machines and by humans, and that any efforts to create documents that are usable only by (certain fortunate) humans are anti-social and discriminatory.

The Web Accessibility movement have worked incredibly hard over many years to encourage Web designers to create well marked up pages, where the meaning of the content is as mechanically evident as possible. The more evident the meaning of a document, the easier it is to repurpose it or present it through alternate means. This goal of device-independent, well marked up Web content is one that unites the accessibility, Mobile Web, Web 2.0, microformat and Semantic Web efforts. Perhaps the most obvious case is for blind and partially sighted users, but good markup can also benefit those who are unable to use a mouse or keyboard. Beyond accessibility, many millions of Web users (many poor, and in poor countries) will have access to the Web only via mobile phones. My former employer W3C has just published a draft document, “Experiences Shared by People with Disabilities and by People Using Mobile Devices”. Last month in Bangalore, W3C held a Workshop on the Mobile Web in Developing Countries (see executive summary).

I read both Tim’s post, and danah’s post, and I agree with large parts of what they’re both saying. But not quite with either of them, so all I can think to do is spell out some of my perhaps previously unarticulated assumptions.

  • There is no huge difference in principle between “normal” HTML Web pages and XFN or FOAF. Textual markup is what the Web is built from.
  • FOAF and XFN take some of the guesswork out of interpreting markup. But other technologies (javascript, perl, XSLT/GRDDL) can also transform vague markup into more machine-friendly markup. FOAF/XFN simply make this process easier, less heuristic and less error-prone.
  • Google was not the first search engine, it is not the only search engine, and it will not be the last search engine. To obsess on Google’s behaviour here is to mistake Google for the Web.
  • Deeds that are on the public record in the Web may come to light months or years later; Google’s opening up of the (already public, but fragmented) Usenet historical record is a good example here.
  • Arguing against good markup practice on the Web (accessible, device independent markup) is something that may hurt underprivileged users (with disabilities, or limited access via mobile, high bandwidth costs etc).
  • Good markup allows content to be automatically summarised and re-presented to suit a variety of means of interaction and navigation (eg. voice browsers, screen readers, small screens, non-mouse navigation etc).
  • Good markup also makes it possible for search engines, crawlers and aggregators to offer richer services.
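To make the "less guesswork" point concrete: once relationships are marked up with XFN rel values, pulling them out is a few lines with Python's standard html.parser and needs no heuristics. The sample page is invented, and only a subset of XFN values is checked.

```python
from html.parser import HTMLParser

# A subset of XFN relationship values.
XFN_VALUES = {"friend", "acquaintance", "contact", "met", "co-worker",
              "colleague", "me"}


class XFNExtractor(HTMLParser):
    """Collect (href, sorted XFN values) pairs from <a rel="..."> links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = set((attrs.get("rel") or "").split())
        found = rels & XFN_VALUES
        if found and attrs.get("href"):
            self.links.append((attrs["href"], sorted(found)))


page = """<ul>
  <li><a href="http://alice.example/" rel="friend met">Alice</a></li>
  <li><a href="http://bob.example/">Bob</a></li>
  <li><a href="http://me.example/profile" rel="me">My other profile</a></li>
</ul>"""

parser = XFNExtractor()
parser.feed(page)
for href, rels in parser.links:
    print(href, rels)
```

The plain link to Bob is skipped. A crawler would need name-matching guesswork to learn anything from it, which is exactly the gap XFN and FOAF close.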

The difference between Google crawling FOAF/XFN from LiveJournal, versus extracting similar information via custom scripts from MySpace, is interesting and important solely to geeks. Mainstream users have no idea of such distinctions. When LiveJournal originally launched their FOAF files in 2004, the rule they followed was a pretty sensible one: if the information was there in the HTML pages, they’d also expose it in FOAF.

We need to be careful of taking a ruthless “you can’t make an omelette without breaking eggs” line here. Whatever we do, people will suffer. If the Web is made inaccessible, with information hidden inside image files or otherwise obfuscated, we exclude a huge constituency of users. If we shine a light on the public record, as Google have done, we’ll embarrass, expose and even potentially risk harm to the people described by these interlinked documents. And if we stick our heads in the sand and pretend that these folk aren’t exposed, I predict this will come back to bite us in the butt in a few months or years, since all that data is out there, being crawled, indexed and analysed by parties other than Google. Parties with less to lose, and more to gain.

So what to do? I think several activities need to happen in parallel:

  • Best practice codes for those who expose, and those who aggregate, social Web data
  • Improved media literacy education for those who are unwittingly exposing too much of themselves online
  • Technology development around decentralised, non-public record communication and community tools (eg. via Jabber/XMPP)

Any search engine at all, today, is capable of supporting the following bit of mischief:

Take as a starting point a collection of user profiles on a public site. Extract all the usernames. Find the ones that appear in the Web fewer than, say, 10,000 times, and on other sites. Assume these are unique userIDs, crawl the pages they appear in, do some heuristic name matching, … and you’ll have a pile of smushed identities, perhaps linking professional and dating sites, or drunken college photos to respectable-new-life. No FOAF needed.

The answer I think isn’t to beat up on the aggregators, it’s to improve the Web experience such that people can have real privacy when they need it, rather than the misleading illusion of privacy. This isn’t going to be easy, but I don’t see a credible alternative.

Open social networks: bring back Iran

Three years ago, we lost Iran from the Internet community. I simplify somewhat, but forgivably. Many Iranian ISPs cut off access to blogs and social networking sites, on government order. At the time, Iran was one of the most active nations on Orkut; and Orkut was the network of choice, faster than the then-fading Friendster, but not yet fully eclipsed by MySpace. It provided a historically unprecedented chance for young people from Iran, USA, Europe and the world to hang out together in an online community. But when Orkut was blocked at the ISP level in Iran, pretty much nobody in the English-speaking blog-tech-pundit scene seemed to even notice. This continues to bug me. Web technologists apparently care collectively more about freeing Robert Scoble’s addressbook from Facebook than about the real potential for unmediated, uncensored, global online community.

Most folk in the US will never visit Iran, and vice-versa. And the press and government in both states are engaged in scary levels of sabre-rattling and demonisation. For me, one of the big motivations for working (through FOAF, SPARQL, XMPP and other technologies) on social networking interop, is so young people in the future can grow up naturally having friends in distant nations, regardless of whether their government thinks that’s a priority. If hundreds of blog posts can be written about the good Mr Scoble’s addressbook portability situation, why are thousands of posts not being written about the need for social networking tools to connect people regardless of nationality and national firewalls?

Some things are too important to leave to governments…

Update: a few hours after writing this, things got hairy in Hormuz. Oof…

FOAF diagram (day 2)

Originally uploaded by danbri

Another revision, after feedback from Ivan.

The original had “Thing” in italics (a convention I tried before adding in doap:, dc: and sioc: references), to indicate it was from another namespace. I’ve now made that heritage explicit (although I suspect it might confuse, the idea is pretty central so worth explaining).

Originally I had both “tipjar” and “mbox” drawn as if they were literal properties, when both are relational. The later layout I’m using allows tipjar to be drawn without crossovers, so now only “mbox” is an oddity. I’ve used italics as a hint of that, but without explicit explanation.

A lot of the redundancy in the diagram comes from the property inverses, but I want to leave them in since they’re important to explain. The expanded key in the bottom-left now has a nerdy little explanation of “inverse property”. I also use the word “subClass” explicitly on the fat arrow, to tap into an OO heritage that might be floating around in people’s minds. In other words, I want people to realise that “a Person is an Agent is a Spatial thing is a thing” is what we’re getting at with those fat little arrows.
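The two ideas the key tries to explain, subClass chains and inverse properties, each boil down to a very simple inference rule. Below is a toy Python sketch of both. The hierarchy is a simplified slice (FOAF declares Person a subclass of both Agent and geo:SpatialThing), owl:Thing stands in for the diagram's "Thing", and the :danbri/:diagram nodes are made up.

```python
# Simplified slice of the FOAF class hierarchy and a few FOAF inverse pairs.
SUBCLASS_OF = {
    "foaf:Person": {"foaf:Agent", "geo:SpatialThing"},
    "foaf:Agent": {"owl:Thing"},
    "geo:SpatialThing": {"owl:Thing"},
}

INVERSES = {
    "foaf:made": "foaf:maker",
    "foaf:depiction": "foaf:depicts",
    "foaf:page": "foaf:topic",
}


def superclasses(cls):
    """All classes reachable by following subClassOf (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def with_inverses(triples):
    """Add the inverse triple for every property with a declared inverse."""
    both_ways = {**INVERSES, **{v: k for k, v in INVERSES.items()}}
    out = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            out.add((o, both_ways[p], s))
    return out


print(sorted(superclasses("foaf:Person")))
# ['foaf:Agent', 'geo:SpatialThing', 'owl:Thing']
print(sorted(with_inverses({(":danbri", "foaf:made", ":diagram")})))
# [(':danbri', 'foaf:made', ':diagram'), (':diagram', 'foaf:maker', ':danbri')]
```

Because the inverse can always be derived, drawing both directions is pure redundancy for a machine. For a human reader, though, it's the clearest way to show that "made" and "maker" are one relationship seen from either end.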

On Ivan’s suggestion I trimmed the property lists a little, removing the non-Jabber IM properties and first/last name. The space freed by the latter was immediately spent on adding bio:olb, since it’s useful and I’ve the impression it’s widely used.

I think that’s about it for changes. Ivan suggested removing the DOAP and SIOC parts, but to my mind they are important because those vocabularies (and projects) elaborate on parts of FOAF which are important but otherwise neglected: the description of Projects, and of Users.

OK, one more version. This has coloured blobs for the core FOAF classes. I think I made a mistake choosing light blue, since it is very nearly the colour code I’ve assigned to mean “inverse functional property”. Perhaps a light yellow?

Anyway here it is for now. Guess I should stop using Flickr for this really!

with colours

Update: I’ve put the files in svn (the original OmniGraffle XML is the .graffle file; there’s also some SVG output, although I’m unsure of its quality), and made another slight revision, this time focussing on the layout of the arrow ends for better readability; previously they were cluttered, and the property directions were therefore hard to read.

Begin again

There was an old man named Michael Finnegan
He went fishing with a pinnegan
Caught a fish and dropped it in again
Poor old Michael Finnegan
Begin again.

Let me clear something up. Danny mentions a discussion with Tim O’Reilly about SemWeb themes.

Much as I generally agree with Danny, I’m reaching for a ten-foot bargepole on this one point:

While Facebook may have achieved pretty major adoption for their approach, it’s only very marginally useful because of their overly simplistic treatment of relationships.

Facebook, despite the trivia, the endless wars between the ninja zombies and the pirate vampires; despite being centralised, despite [insert grumble] is massively useful. Proof of that pudding: it is massively used. “Marginal” doesn’t come into it. The real question is: what happens next?

Imagine 35 million people. Imagine them marching thru your front room. Jumping off a table at the same time. Sending you an email. Or turning the tap off when they brush their teeth. 35 million is a fair-sized nation. Taking that 35 million figure I’ve heard waved around, and placing it in the ever scientific Wikipedia listing … that puts the land of Facebook somewhere between Kenya and Algeria in the population charts. Perhaps the figures are exaggerated. Perhaps a few million have wandered off, or forgotten their passwords. Doubtless some only use it every month or few.

Even a million is a lot of use; and a lot of usefulness.

Don’t let anything I ever say here in this blog be taken as claiming such sites and services are only marginally useful. To be used is to be useful; and that’s something SemWeb people should keep in the forefront of their minds. And usually they do, I think, although the community tends towards the forward-looking.

But let’s be backwards-looking for a minute. My concern with these sites is not that they’re marginally useful, but that they could be even more useful. Slight difference of emphasis. was great, back in 2000 when we started FOAF. But it was a walled garden. It had cool graph traversal stuff that evocatively showed your connection path to anyone else in the network. Their network. Then followed Friendster, which got slow as it proved useful to too many people. Ditto Orkut, which everyone signed up to, then wandered off from when it proved there was rather little to do there except add people. MySpace and Facebook cracked that one, … but guess what, there’ll be more.

I got a signup to Yahoo’s Mash yesterday. Anyone wanna be my friend? It has fun stuff (“Mecca Ibrahim smacked The Mash Pet (your Mash pet)!”), … wiki-like profile editing, extension modules … and, given that this is 2007, I’d hope for some form of API eventually. People won’t live in Facebook-land forever. Nor in Mash, however fun it is. I still lean towards Jabber/XMPP as the long-term infrastructure for this sort of system, but that’s for another time. The appeal of SixDegrees, of Friendster, of Orkut … wasn’t ever the technology. It was the people. I was there ’cos others were there. Nothing more. And I don’t see this changing, no matter how much the underlying technology evolves. And people move around, drift along to the next shiny thing, … go wherever their friends are. Which is our only real problem here.

Begin again.

I’ve been messing with RDF a bit. I made a sample SPARQL query that asks (exported RDF from) a few networks about my IM addresses; here are the results from Redland/Rasqal JSON.

Loosely joined

find . -name danbri-\*.rdf -exec rapper --count {} \;

rapper: Parsing file ./facebook/danbri-fb.rdf
rapper: Parsing returned 2155 statements
rapper: Parsing file ./orkut/danbri-orkut.rdf
rapper: Parsing returned 848 statements
rapper: Parsing file ./dopplr/danbri-dopplr.rdf
rapper: Parsing returned 346 statements
rapper: Parsing file ./
rapper: Parsing returned 71 statements
rapper: Parsing file ./
rapper: Parsing returned 123 statements
rapper: Parsing file ./advogato/danbri-advogato.rdf
rapper: Parsing returned 18 statements
rapper: Parsing file ./livejournal/danbri-livejournal.rdf
rapper: Parsing returned 139 statements

I can run little queries against various descriptions of me and my friends, extracted from places in the Web where we hang out.

Since we’re not yet in the shiny OpenID future, I’m matching people only on name (and setting up the myfb: etc prefixes to point to the relevant RDF files). I should probably take more care around xml:lang, to make sure things match. But this was just a rough test…

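# PREFIX declarations for myfb:, myorkut: and dopplr: (one per exported RDF file) omitted
PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?n
FROM NAMED myfb:
FROM NAMED myorkut:
FROM NAMED dopplr:
WHERE {
 GRAPH myfb: {[ a :Person; :name ?n; :depiction ?img ]}
 GRAPH myorkut: {[ a :Person; :name ?n; :mbox_sha1sum ?hash ]}
 GRAPH dopplr: {[ a :Person; :name ?n; :img ?i2 ]}
}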

…finds 12 names in common across Facebook, Orkut and Dopplr networks. Between Facebook and Orkut, 46 names. Facebook and Dopplr: 34. Dopplr and Orkut: 17 in common. Haven’t tried the others yet, nor generated RDF for IM and Flickr, which I probably have used more than any of these sites. The Facebook data was exported using the app I described recently; the Orkut data was done via the CSV format dumps they expose (non-mechanisable since they use a CAPTCHA), while the Dopplr list was generated with a few lines of Ruby and their draft API: I list as foaf:knows pairs of people who reciprocally share their travel plans., LiveJournal, and Advogato expose RDF/FOAF directly. Re Orkut, I noticed that they now have the option to flood your GTalk Jabber/XMPP account roster with everyone you know on Orkut. I’m not sure of the wisdom of actually doing so (but I’ll try it), but it is worth noting that this quietly bridges a large ‘social networking’ site with an open standards-based toolset.
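The Dopplr rule (list as foaf:knows only those pairs who share travel plans with each other) is just a symmetric filter. The original was a few lines of Ruby that aren't shown here; this is an equivalent-in-spirit Python sketch over made-up pairs, not that script.

```python
# Hypothetical "X shares travel plans with Y" pairs, e.g. as read from an API.
shares_with = {
    ("me", "alice"), ("alice", "me"),
    ("me", "bob"),                      # one-way only: not reciprocal
    ("carol", "me"), ("me", "carol"),
}


def reciprocal_knows(pairs):
    """Emit one foaf:knows triple per direction for each mutual pair."""
    return sorted((a, "foaf:knows", b) for a, b in pairs if (b, a) in pairs)


for s, p, o in reciprocal_knows(shares_with):
    print(s, p, o)
```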

For the record, the names common to my Dopplr, Facebook and Orkut accounts were: Liz Turner, Tom Heath, Rohit Khare, Edd Dumbill, Robin Berjon, Libby Miller, Brian Kelly, Matt Biddulph, Danny Ayers, Jeff Barr, Dave Beckett, Mark Baker. If I keep adding to the query for each other site, presumably the only person in common across all accounts will be …. me.
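Once names are reduced to a comparable key, these overlap counts are plain set intersections. This sketch uses invented names. The normalisation (Unicode NFC, collapsed whitespace, case-folding) is a crude stand-in for the more careful matching, xml:lang included, mentioned earlier, and matching on names is an assumption, not a guarantee of identity.

```python
import unicodedata
from itertools import combinations


def normalize(name):
    """Crude name key: Unicode NFC, collapsed whitespace, case-folded."""
    return " ".join(unicodedata.normalize("NFC", name).split()).casefold()


# Hypothetical name lists, one per exported network.
networks = {
    "facebook": ["Alice Example", "Bob  Example", "Carol Example"],
    "orkut": ["alice example", "Bob Example"],
    "dopplr": ["Alice Example", "Dave Example"],
}

keys = {site: {normalize(n) for n in names} for site, names in networks.items()}

for a, b in combinations(sorted(keys), 2):
    print(a, b, len(keys[a] & keys[b]))

common_to_all = set.intersection(*keys.values())
print(sorted(common_to_all))  # ['alice example']
```

Each extra network simply adds one more set to the final intersection, which is why the common core shrinks as sites are added.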

Querying Facebook in SPARQL

A fair few people have been asking about FOAF exporters from Facebook. I’m not entirely sure what else is out there, but Matthew Rowe has just announced a Facebook FOAF generator. It doesn’t dump all 35 million records into your Web browser, thankfully. But it will export a minimal description of you and your Facebook associates. At the moment, you get name, a photo URL, and (in this revision of the tool) a Facebook account name using FOAF’s OnlineAccount construct.

As an aside, this part of the FOAF design provides a way for identifiers from arbitrary services to be described in FOAF without special-purpose support. Some services have shortcut property names, eg. msnChatID and we may add more, but it is also important to allow this kind of freeform, decentralised identification. People shouldn’t have to petition the FOAF spec editors before any given Social Network site’s IDs can be supported; they can always use their own vocabulary alongside FOAF, or use the OnlineAccount construct as shown here.

I’ve saved my Facebook export on my Web site, working on the assumption that Facebook IDs are not private data. If people think otherwise, let me know and I’ll change the setup. We might also discuss whether even sharing the names and connectivity graph will upset people’s privacy expectations, but that’s for another day. Let me know if you’re annoyed!

Here is a quick SPARQL query, which simply asks for details of each person mentioned in the file who has an account on Facebook.

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name ?pic ?id
WHERE {
 [ a :Person;
 :name ?name;
 :depiction ?pic;
 :holdsAccount [ :accountServiceHomepage <> ; :accountName ?id ] ]
}
ORDER BY ?name

I tested this online using Dave Beckett’s Rasqal-based Web service. It should return a big list of the first 200 people matched by the query, ordered alphabetically by name.

For “Web 2.0” fans, SPARQL’s result sets are essentially tabular (just like SQL), and have encodings in both simple XML and JSON. So whatever you might have heard about RDF’s syntactic complexity, you can forget it when dealing with a SPARQL engine.
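To show how tabular these results really are, here is a minimal Python reader for the SPARQL JSON results format (head.vars plus a list of results.bindings). The sample document is invented. Unbound variables, which can turn up with OPTIONAL patterns, come back as None.

```python
import json

# A tiny result document in the W3C SPARQL JSON results format
# (values are illustrative, not real query output).
RESULTS = """{
  "head": {"vars": ["name", "pic", "id"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Alice Example"},
     "pic": {"type": "uri", "value": "http://photos.example/alice.jpg"},
     "id": {"type": "literal", "value": "12345"}},
    {"name": {"type": "literal", "value": "Bob Example"},
     "id": {"type": "literal", "value": "67890"}}
  ]}
}"""


def rows(doc):
    """Yield one tuple per solution, in head.vars order; None if unbound."""
    data = json.loads(doc)
    variables = data["head"]["vars"]
    for binding in data["results"]["bindings"]:
        yield tuple(binding[v]["value"] if v in binding else None
                    for v in variables)


for row in rows(RESULTS):
    print(row)
```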

Here’s a fragment of the JSON results from the above query:

{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "" },
"id" : { "type": "literal", "value": "624168" }
},
{
"name" : { "type": "literal", "value": "Dan Brickley" },
"pic" : { "type": "uri", "value": "" },
"id" : { "type": "literal", "value": "501730978" }
}, ...

What’s going on here? (a) Why are there two of me? (b) And why does it think that one of us has my Facebook FOAF file’s URL as a mugshot picture?

There’s no big mystery here. Firstly, there’s another guy who has the cheek to be called Dan Brickley. We’re friends on Facebook, even though we should probably be mortal enemies or something. Secondly, why does it give him the wrong URL for his photo? This is also straightforward, if a little technical. Basically, it’s an easily-fixed bug in this version of the FOAF exporter I used. When an image URL is not available, the converter still generates markup like “<foaf:depiction rdf:resource=""/>”. This empty URL is treated in RDF as the extreme case of a relative link, ie. the same kind of thing as writing “../../images/me.jpg” in a normal Web page. And since RDF is all about de-contextualising information, your RDF parser will try to resolve the relative link before passing the data on to storage or query systems (fiddly details are available to those that care); an empty link resolves to the URL of the document it appears in, ie. the FOAF file itself. If the foaf:depiction property were simply omitted when no photo was present, this problem wouldn’t arise. We’d then have to make the query a little more flexible, so that it still matched people even if there was no depiction, but that’s easy. I’ll show it next time.
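Both halves of that explanation can be checked with Python's standard library. urljoin applies the RFC 3986 rule under which an empty reference resolves to the base document, and the exporter-side fix is simply to skip the property when there's no image. The base URL and the depiction_markup helper are hypothetical, not code from the exporter.

```python
from urllib.parse import urljoin
from xml.sax.saxutils import quoteattr

# Hypothetical location of an exported FOAF file.
BASE = "http://people.example/someone/foaf.rdf"

# An empty reference resolves to the base document itself...
print(urljoin(BASE, ""))                  # http://people.example/someone/foaf.rdf
# ...just as an ordinary relative path resolves against the base.
print(urljoin(BASE, "../images/me.jpg"))  # http://people.example/images/me.jpg


def depiction_markup(image_url):
    """Emit foaf:depiction only when there is an image, never rdf:resource=""."""
    if not image_url:
        return ""
    return f"<foaf:depiction rdf:resource={quoteattr(image_url)}/>"


print(repr(depiction_markup("")))         # ''
print(depiction_markup("http://photos.example/me.jpg"))
```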

I mentioned a couple of days ago that SPARQL is a query language with built-in support for asking questions about data provenance, ie. we can mix in “according to Facebook”, “according to Jabber” right into the WHERE clause of queries such as the one I show here. I’m not going to get into that today, but I will close with a visual observation about why that is important.
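The idea behind those "according to" questions is that each statement carries the name of the graph it came from, and that name is what SPARQL's GRAPH keyword matches against. Here is a toy Python model of it, with invented quads and graph names.

```python
# Hypothetical quads: (subject, predicate, object, graph), where the graph
# name records which site the statement came from.
quads = [
    (":alice", "foaf:nick", "ali", "site:facebook"),
    (":alice", "foaf:nick", "alice_e", "site:jabber"),
    (":alice", "foaf:name", "Alice Example", "site:facebook"),
]


def according_to(quads, source, predicate):
    """Answer 'what is <predicate>, according to <source>?'."""
    return [(s, o) for s, p, o, g in quads if g == source and p == predicate]


print(according_to(quads, "site:facebook", "foaf:nick"))  # [(':alice', 'ali')]
print(according_to(quads, "site:jabber", "foaf:nick"))    # [(':alice', 'alice_e')]
```

Disagreements between sources (two different nicknames here) stay visible, instead of being flattened into one merged answer.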

yasn map, borrowed from data junk, valleywag blog
To state the obvious, there’ll always be multiple Web sites where people hang out and socialise. A friend sent me this link the other day; a world map of social networks (thumbnail version copied here). I can’t vouch for the science behind it, but it makes the point that we risk fragmenting Web communities on geographic boundaries if we don’t bridge the various IM and YASN networks. There are lots of ways this can be done, each with different implications for user experience, business model, cost and practicality. But it has to happen. And when it does, we’ll be wanting ways of asking questions against aggregations from across these sites…

Symbol languages and the Semantic Web

OK so I just stumbled upon this…

Bomb in Baghdad

…via Jonathan Chetwynd’s ever-inventive and SVG-happy

The “Car Bomb in Baghdad” story is from a site created by Widgit Software, and explains itself as follows:

Symbolworld has been set up to provide a web site with material suitable for symbol readers of all ages. The internet is an important medium which many people really like to use. Sadly there is very little material that is appropriate or accessible by people with learning difficulties.

The copyright statement for Symbolworld says “symbols on this page are copyright of the commercial owners. They may not be copied or used in any other format without the written permission from the owner.”, which initially struck me as a potential challenge to any use of this particular symbol-set for online communication. But I don’t really know this scene, and I guess this copyright could work in much the same way as, eg., fonts are copyrighted.

So I don’t know much about this particular project/company/product, but it reminded me of some similar work I heard about a few years ago. Back when the EU, in their occasionally infinite wisdom, funded SWAD-Europe to run around talking to interesting people about standards and the Semantic Web (and giving them t-shirts), Chaals organised a great developer workshop in Madrid on image annotation. We had the usual fun with SVG and RDF (which btw I’m still betting on) and I got to learn a bit about CCF. Seeing the Baghdad example this morning reminded me of all this. I’ve been clicking around and trying to gather my sprawling thoughts.

CCF, the Concept Coding Framework, is kinda image annotation in reverse. Instead of focussing on the description of the content, concepts etc associated with images, the emphasis is on the use of images to illustrate some enumerated set of concepts. From Chaals’ workshop report re outcomes, I’m reminded that we discussed…

How to use Creative Commons and similar vocabularies to determine whether a particular symbol can be freely used (typically in commercial systems the symbols themselves are proprietary, which can be a major barrier to communication between people who have different systems).

CCF was using some variant of SKOS (another SWAD-Europe activity). This found its way into SKOS itself, where we now have prefSymbol and altSymbol relationships that associate a skos:Concept with a dcmitype:Image. Borrowing an illustration from the SKOS Core guide here:

skos symbol diagram

The guide also notes a distinction between symbolic labelling and “depiction” in the FOAF sense; some symbols are purely symbolic, and have no representational content.
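In data terms, a concept carries at most one preferred symbol and any number of alternatives. Here is a Python sketch of choosing between them. The file names are hypothetical, and the "allowed" set stands in for whatever licence check (say, via Creative Commons metadata) decides that a symbol may be reused.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Concept:
    """Minimal stand-in for a skos:Concept and its symbol links."""
    pref_label: str
    pref_symbol: Optional[str] = None                       # skos:prefSymbol
    alt_symbols: List[str] = field(default_factory=list)   # skos:altSymbol


def symbol_for(concept: Concept, allowed: Set[str]) -> Optional[str]:
    """Prefer the prefSymbol if it may be used, else the first usable altSymbol."""
    for sym in [concept.pref_symbol, *concept.alt_symbols]:
        if sym is not None and sym in allowed:
            return sym
    return None


# Hypothetical symbol files; only the second is assumed freely licensed.
coffee = Concept("coffee",
                 pref_symbol="proprietary/coffee.png",
                 alt_symbols=["free/coffee.svg"])
freely_usable = {"free/coffee.svg"}

print(symbol_for(coffee, freely_usable))  # free/coffee.svg
```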

So, catching up with this area of work a little, I find the Bliss Symbolics, the WWAAC project final report, and various other accessibility-related efforts. But I’ve not really figured out where I’d start if I wanted to build something simple using a freely available symbol-set, nor what the main options/projects currently are. But there’s plenty of reading, including pages from a recent Bliss “think tank” meeting.

The latest I can find on CCF is that it has moved sites and that there are some Web interfaces available, but “sorry – there are no downloads for this project yet.”. Ah, apparently the SYMBERED project is continuing development of CCF (aside: Bliss with Swedish translation; doubly incomprehensible to me!). There is a nice example on their site showing the multilingual aspect to this work, as well as contrasting Bliss with a more representationally-oriented symbol set; see their site for details.

Here’s a simple example just in English:

I want coffee and milk and cookies.
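The concept-coding idea behind an example like this is that words map to concept codes first, and only then to symbols. A Python sketch follows. The concept codes and symbol paths are hypothetical, not CCF's real identifiers, and unknown words are simply left as text.

```python
import re

# Hypothetical concept codes and symbol files (not CCF's real identifiers).
LEXICON = {
    "i": ("concept:self", "symbols/i.svg"),
    "want": ("concept:want", "symbols/want.svg"),
    "coffee": ("concept:coffee", "symbols/coffee.svg"),
    "milk": ("concept:milk", "symbols/milk.svg"),
    "cookies": ("concept:cookie", "symbols/cookie.svg"),
    "and": ("concept:and", "symbols/and.svg"),
}


def to_symbols(sentence):
    """Map each word to its concept's symbol; unknown words stay as text."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [LEXICON[w][1] if w in LEXICON else w for w in words]


print(to_symbols("I want coffee and milk and cookies."))
```

Since the symbols hang off concept codes rather than English words, a lexicon for another language could point its own words at the same codes and reuse the same symbols.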

In case anyone thinks this whole exploration is a bit niche and obscure, take a look at how people use MSN and other IM systems sometime. And of course the Jabber/XMPP guys have been exploring specs for standard emoticons. Chaals also points out some connections to VoiceXML where “there are a handful of options available in an interaction designed to be through voice, and developers will define assorted ways of recognising from a user’s speech which of the relevant concepts is being matched.”

We’re also only a few clicks away from the survey conducted by the dreamily named W3C Emotion incubator group into use cases and technologies for the description of emotion using markup languages.

If an RDF-ized Wordnet is also thrown into the mix, assigning URIs to Synsets, WordSenses, Words, I think we might actually be getting somewhere. The version of Wordnet in RDF published at W3C doesn’t currently use SKOS, although this was discussed, and of course there are other representations that make more use of RDF class hierarchies for nouns (at the expense of linguistic lossiness and losing non-noun content). Princeton’s original English-language Wordnet has spawned many related projects and translations, but as far as I know, there is little integration amongst them, and not all of the data is public or freely re-usable.

I once had a silly dream of taking a photo to go with each and every noun term in Wordnet. A kind of SemWeb I-Spy, a cousin to Immuexa’s FOAF bingo. Or better, of doing that with friends. The rise of Flickr and tagging means that we’d probably do this now by aggregating photos from Flickr and similar sites. But it seems conceivable to me that such an “illustrated wordnet” could be made, using either photo-oriented or symbol-oriented illustrations.

OK what might that buy us? Let’s try these two samples.

Not perfect, … but imho Wordnet gives a nice set of common and identifiable concepts that can be used as a hub for all kinds of different projects. And all we’d need is a huge pile of shared data shaped like this:

wordnet:word-cookie skos:prefSymbol wikicommons:Explosion.svg .
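With a large pile of lines shaped like this, a first step would be indexing symbols by word. Here is a Python sketch over a few hypothetical lines (the wikicommons file names are invented). It only handles prefixed-name objects, not literals containing spaces.

```python
import re
from collections import defaultdict

# Hypothetical lines in the shape shown above (one statement per line).
DATA = """
wordnet:word-cookie skos:prefSymbol wikicommons:Cookie.svg .
wordnet:word-coffee skos:prefSymbol wikicommons:Coffee.svg .
wordnet:word-coffee skos:altSymbol wikicommons:Coffee_cup.svg .
"""

# subject predicate object "." with no spaces inside terms (no literals).
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s*\.\s*$")


def symbol_index(text):
    """Group symbol links by word: {word: {property: [symbols]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        m = TRIPLE.match(line.strip())
        if m:
            word, prop, symbol = m.groups()
            index[word][prop].append(symbol)
    return index


idx = symbol_index(DATA)
print(idx["wordnet:word-coffee"]["skos:altSymbol"])  # ['wikicommons:Coffee_cup.svg']
```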

OK, it’s not going to bring about world peace and the return of Esperanto, and of course there’s much more to language and communication than nouns and verbs (the easiest part of wordnet to turn into visual symbols), but it does strike me as a fun little (big) project…. Wordnet is too huge to be useful in every context where we’d want a modest-sized symbol set (eg. IM emoticons), … but it is nicely searchable, and would provide a framework for such subsets to evolve and be interconnected.