In email hell

I’m tagging this ‘rdf’ so it finds its way to my various friends and collaborators in the RDF and SemWeb community via

Just a brief tale of this week’s woe, and a note on how to contact me while I fix things up.

Since leaving W3C at the end of last year, my email hosting has been moving around a bit. I have various addresses all of which have been forwarded to a single account, which I typically access via IMAP and sometimes SSH/mutt. For years, this was at ILRT, Uni Bristol. And then it was at W3C. Post-W3C I had my mail live on the Porklips/Postdiluvian community box, but decided to move over to Dreamhost since we’d never gotten IMAP working successfully for me on that machine, I was missing the mail-tagging capabilities of Thunderbird, .. and because I’d already bought Dreamhost hosting for some Web sites I run. So a few weeks back I flipped the switch and moved again, to Dreamhost.

What’s the big problem? In short, spam. I get 1000s of junk mails daily, … plus a generally high amount of email from mailing lists of various kinds. Spamassassin has been my main tool for dealing with this, along with (at W3C) a rather large whitelist, which I’ve not managed to set up again since.

On Dreamhost, I’ve been finding that if I turn my back for a couple of days, my mailbox is drowning in junk. And that the Spamassassin config options I chose, even though supposedly quite picky, … seem to be letting loads of rubbish through. I’m not sure what I’m doing wrong, but it got to the stage that Thunderbird was so slow and unresponsive (due to the size of the junk-filled inbox) that it was barely usable for deleting things, let alone reading/writing.

At which point, a few days ago, I decided to give Opera a whirl. It has a good reputation, so I tried it. It spent a long time doing something, what exactly I’m not sure since Thunderbird doesn’t work so well any more. But it has a nice virtual folders look, with much mail apparently sorted by the mailing list it was sent to. I think this is ‘views’ based, but am rather unclear which mails are being moved down to this laptop where I’m running Opera, and which are via IMAP. And things are still pretty slow.

I need to count to 10, calm down, and figure out a cleaner way of handling all this. Either staying at Dreamhost or moving elsewhere (recommendations welcome!!). At the moment I don’t have procmail on my Dreamhost setup (due to confusion between 2 kinds of inbox offering there). If I can get whitelist-based protection up and running again, maybe I’ll be OK.

In the meantime, if you need to reach me, try IM, or my increasingly used Gmail account (ie. danbrickley—at— gmail -dot- com). Please try to keep that address more or less safe from spam harvesters. I should try zapping the few places it does appear.

Sorry for any inconvenience. If you’ve sent me anything urgent or very important in the last week or two, I’d be grateful if you could re-send it to danbrickley….at…

Oh, last thing to tie this back into RDF and justify the tag, … if anyone’s interested to throw around whitelist sharing ideas again, I’m all ears. There’s got to be a better way of doing things. I’ve spent all week getting a sinking feeling at the idea of even trying to open my inbox…

Who, what, where, when?

A “Who? what? where? when?” of the Semantic Web is taking shape nicely.

Danny Ayers shows some work with FOAF and the hCard microformat, picking up a theme first explored by Dan Connolly back in 2000: inter-conversion between RDF and HTML person descriptions. Danny generates hCards from SPARQL queries of FOAF, an approach which would pair nicely with GRDDL for going in the other direction.
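To make the FOAF-to-hCard direction concrete, here is a rough sketch of the rendering step only. It is not Danny’s code: it assumes the SPARQL query has already been run and that each result row arrives as a plain dict with hypothetical keys `name`, `homepage` and `mbox`. The class names (`vcard`, `fn`, `url`, `email`) are the standard hCard ones.

```python
from html import escape


def foaf_to_hcard(row):
    """Render one query result row (a dict of FOAF values) as an hCard fragment.

    The keys 'name', 'homepage' and 'mbox' are assumptions made for this
    sketch; a real SPARQL result would use whatever variable names the
    query declares.
    """
    name = escape(row["name"])
    parts = ['<div class="vcard">']

    # The formatted name is required in hCard; link it if we know a homepage.
    if row.get("homepage"):
        parts.append('  <a class="fn url" href="%s">%s</a>'
                     % (escape(row["homepage"]), name))
    else:
        parts.append('  <span class="fn">%s</span>' % name)

    # foaf:mbox values are mailto: URIs, so strip the scheme for display.
    if row.get("mbox"):
        addr = row["mbox"]
        if addr.startswith("mailto:"):
            addr = addr[len("mailto:"):]
        addr = escape(addr)
        parts.append('  <a class="email" href="mailto:%s">%s</a>' % (addr, addr))

    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(foaf_to_hcard({
        "name": "Alice Example",
        "homepage": "http://example.org/alice",
        "mbox": "mailto:alice@example.org",
    }))
```

Going the other way (HTML back to RDF) is the part GRDDL would handle, and is not sketched here.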

Meanwhile at W3C, the closing days of the SW Best Practices Group have recently produced a draft of an RDF/OWL Representation of Wordnet. Wordnet is a fantastic resource, containing descriptions of pretty much every word in the English language. Anyone who has spent time in committees, deciding which terms to include in some schema/vocabulary, must surely feel the appeal of a schema which simply contains all possible words. There are essentially two approaches to putting Wordnet into the Semantic Web. A lexically-oriented approach, such as the one published at W3C for Wordnet 2.0, presents a description of words. It mirrors the structure of Wordnet itself (verbs, nouns, synsets etc.). Consequently it can be a complete and unjudgemental reflection into RDF of all the work done by the Wordnet team.

The alternate, and complementary, approach is to explore ways of projecting the contents of Wordnet into an ontology, so that category terms (from the noun hierarchy) in Wordnet become classes in RDF. I made a simplistic attempt at this some time back (see overview). It has appeal (alongside the linguistic version) because it allows RDF to be used to describe instances of classes for each concept in Wordnet, with other properties of those instances. See WhyWordnetIsCool in the FOAF wiki for an example of Wordnet’s coverage of everyday objects.

So, getting Wordnet moving into the SW is a great step. It gives us URIs to identify a huge number of everyday concepts. Its coverage isn’t complete, and its ontology is kinda quirky. Aldo Gangemi and others have worked on tidying up the hierarchy; I believe only for version 1.6 of Wordnet so far. I hope that work will eventually get published at W3C or elsewhere as stable URIs we can all use.

In addition to Wordnet there are various other efforts that give types that can be used for the “what” of “who/what/where/when”. I’ve been talking with Rob McCool about re-publishing a version of the old TAP knowledge base. The TAP project is now closed, with Rob working for Yahoo and Guha at Google. Stanford maintain the site but aren’t working on it. So I’ve been working on a quick cleanup (well-formed RDF/XML etc.) of TAP that could get it into more mainstream use. TAP, unlike Wordnet, has more modern everyday commercial concepts (have a look), as well as a lot of specific named instances of these classes.

Which brings me to (Semantic) Wikipedia; another approach to identifying things and their types on the Semantic Web. A while back we added isPrimaryTopicOf to FOAF, to make it easier to piggyback on Wikipedia for RDF-identifying things that have Wiki (and other) pages about them. The Semantic Mediawiki project goes much much further in this direction, providing a rich mapping (classes etc.) into RDF for much of Wikipedia’s more data-oriented content. Very exciting, especially if it gets into the main codebase.
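For anyone who hasn’t used isPrimaryTopicOf, here is a small sketch of the piggybacking idea: describe a thing without minting a URI for it, and identify it by the page that is primarily about it. The function name, the blank node label and the example page URL are all made up for illustration; only the FOAF and RDFS property URIs are real.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def describe_via_page(page_url, label):
    """Return N-Triples for an unnamed thing identified by its primary page.

    The thing itself is a blank node ("_:thing", an arbitrary label chosen
    for this sketch); the page URL stands in as its identifying property.
    """
    # Escape backslashes and quotes so the label is a valid N-Triples literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "_:thing <%sisPrimaryTopicOf> <%s> ." % (FOAF, page_url),
        '_:thing <%slabel> "%s" .' % (RDFS, escaped),
    ])


if __name__ == "__main__":
    # Hypothetical example: a city, identified via a wiki page about it.
    print(describe_via_page("http://en.wikipedia.org/wiki/Bristol", "Bristol"))
```

Two descriptions that point at the same page via isPrimaryTopicOf can then be treated as being about the same thing, which is what makes the property useful for merging data.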

So I think the combination of things like Wordnet, TAP, Wikipedia, and instance-identifying strategies such as “isPrimaryTopicOf”, will give us a solid base for identifying what the things are that we’re describing in the Semantic Web.

And regarding “Where?” and “when?” … on the UI front, we saw a couple of announcements recently: OpenLayers v1.0, which provides Google-maps-like UI functionality, but is open source and standards-friendly. And for ‘when’, a similar offering: the timeline widget. This should allow for fancy UIs to be wired in with RDF calendar or RDF-geo tagged data.

Talking of which… good news of the week: W3C has just announced a Geo incubator group (see detailed charter), whose mission includes updates for the basic Geo (ie. lat/long etc) vocabulary we created in the SW Interest Group.
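As a reminder of how little there is to the basic Geo vocabulary, here is a sketch that writes out a point as N-Triples. The namespace and the Point, lat and long terms are the existing vocabulary; the function name, the blank node label and the range checks are my own additions for this example, and the coordinates are only roughly Bristol.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def geo_point(lat, long_, node="_:p"):
    """Return N-Triples lines describing a WGS84 point.

    'node' is an arbitrary blank node label chosen for this sketch.
    Latitude and longitude are written as plain literals in decimal degrees.
    """
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180 <= long_ <= 180:
        raise ValueError("longitude out of range: %r" % long_)
    return [
        "%s <%s> <%sPoint> ." % (node, RDF_TYPE, GEO),
        '%s <%slat> "%s" .' % (node, GEO, lat),
        '%s <%slong> "%s" .' % (node, GEO, long_),
    ]


if __name__ == "__main__":
    print("\n".join(geo_point(51.4545, -2.5879)))
```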

Ok, I’ve gone on at enough length already, so I’ll talk about SKOS another time. In brief – it fits in here in a few places. When extended with more lexical stuff (for describing terms, eg. multi-lingual thesauri) it could be used as a base for representing the lexically-oriented version of Wordnet. And it also fits in nicely with Wikipedia, I believe.

Last thing, don’t get me wrong — I’m not claiming these vocabs and datasets and bits of UI are “the” way to do ‘who/what/where/when’ on the Semantic Web. They’re each one of several options. But it is great to have a whole pile of viable options at last :)