Data syndication

There have been various developments in the last week, via Planet RDF, on the topic of data syndication using RSS/Atom.

Edd Dumbill on iTunes RSS extensions; a handy review of the extensions they’ve added to support a “podcasting” directory. See also comments from Danny.

Nearby in the Web, Yahoo! and friends are still busy with their media RSS spec, which lives on the rss-media yahoogroups list. Yahoo are also looking creative on other fronts. Today’s Yahoo! Search blog has an entry on Yahoo! Maps, which again uses RSS extensions to syndicate map-related data:

The Yahoo! Maps open API is based on geoRSS, a RSS 2.0 with w3c geo extension. For more information check out developer.yahoo.net/maps. We also offer API support via a group forum at yws-maps.

This is particularly interesting to me, as they’re picking up the little geo: namespace I made with collaborators from the W3C Semantic Web (formerly RDF) Interest Group.

Although the namespace was designed to be used in RDF, they’re using it in non-RDF RSS2 documents. This is a little disappointing, since it makes the data less available to RDF processors. RSS1, of course, was designed as an RDF application specifically to support such data-centric extensions. Yahoo! have some developer pages with more detail, but they seem to have picked up where the Worldkit Flash RSS geocoding project left off. Worldkit attaches geo:lat and geo:long information to RSS2 items, and can display these in a Flash-based UI.
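To make the geo:lat / geo:long pattern concrete, here is a minimal Python sketch that pulls coordinates out of an RSS2 document using only the standard library. The sample feed, its titles and example.org links are invented for illustration, and `geotagged_items` is a hypothetical helper name; only the namespace URI is the real one for the WGS 84 geo vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the WGS 84 geo vocabulary (the "geo:" prefix).
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical RSS 2.0 feed, made up for this example.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <channel>
    <title>Example places</title>
    <link>http://example.org/</link>
    <description>Invented feed for illustration</description>
    <item>
      <title>Bristol</title>
      <link>http://example.org/bristol</link>
      <geo:lat>51.45</geo:lat>
      <geo:long>-2.59</geo:long>
    </item>
    <item>
      <title>No location</title>
      <link>http://example.org/nowhere</link>
    </item>
  </channel>
</rss>"""


def geotagged_items(feed_xml):
    """Return (title, lat, long) for every item carrying both coordinates."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        # ElementTree addresses namespaced elements as {uri}localname.
        lat = item.findtext(f"{{{GEO_NS}}}lat")
        lon = item.findtext(f"{{{GEO_NS}}}long")
        if lat is None or lon is None:
            continue  # skip items without a complete position
        results.append((item.findtext("title"), float(lat), float(lon)))
    return results
```

Calling `geotagged_items(SAMPLE_FEED)` yields a single tuple for the Bristol item and silently skips the item with no coordinates. Note that this is ordinary XML processing: an RDF processor would get nothing from such a feed without a similar custom step.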

The WGS 84 geo vocabulary used here by Worldkit and Yahoo! was a collaborative experiment in minimalism. The GIS world has some very rich, sophisticated standards. The idea with the geo: namespace was to take the tiniest step towards reflecting that world into the RDF data-merging environment. RDF is interesting precisely because it allows highly varied data to be mixed together in a predictably structured way.
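As a rough illustration of what data merging means here, the sketch below models RDF statements as plain (subject, predicate, object) tuples and merges two independent sources by set union. This is a toy model and not an RDF library; the example.org URI, the data values and the `merge` and `describe` helpers are assumptions made up for the example. The geo and FOAF namespace URIs are the real ones.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Two hypothetical sources describing the same resource with
# different vocabularies.
places = {
    ("http://example.org/cinema", GEO + "lat", "51.45"),
    ("http://example.org/cinema", GEO + "long", "-2.59"),
}
names = {
    ("http://example.org/cinema", FOAF + "name", "Example Cinema"),
}


def merge(*graphs):
    """Merge any number of triple sets; duplicates collapse naturally."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every property/value pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}
```

With these definitions, `describe(merge(places, names), "http://example.org/cinema")` returns one dictionary holding the latitude, longitude and name together, even though neither source knew about the other's vocabulary. The shared identifier does all the work.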

So that little geo namespace experiment (thanks to the efforts of various geo/mapping hackers, most of whom aren’t very far in FOAF space from Jo Walsh) seems to be proving its worth. A little bit of GIS can go a very long way.

I should at this point stress that the “w3c geo extension” (as Yahoo!’s search blog calls it) is an informal, pre-standardisation piece of work. This is important to stress, particularly given that the work is associated with W3C, both through hosting of the namespace document and because it came about as a collaboration of the W3C RDF/SW Interest Group. If it were the product of a real W3C Working Group, it would have received much more careful review. Someone might have noticed the inadequate (or non-) definition of geo:alt, for example!

I’m beginning to think that a small but more disciplined effort at W3C around RDF vocabulary for geo/mapping (with appropriate liaison to GIS standards) would be timely. But I digress. I was talking about data syndication. Going back to the Yahoo! Maps example… they have taken RSS2, and the SWIG Geo vocab, and added a number of extensions that relate to their mapping interface (image, zoom etc), as well as Address, CityState, Zip, Country code, etc. Useful entities to have markup for. In an RDF environment I’d probably have used vCard for those.

Yahoo aren’t the only folk getting creative with RSS this week. Microsoft have published some information on their work, including some draft proposals for extending RSS with “lists”. This, again, brings an emphasis back to RSS for data syndication. In other words, RSS documents as a carrier for arbitrary other information whose dissemination fits the syndication/publication model of RSS. Some links on the Microsoft proposal are on Scoble’s blog. See Microsoft’s RSS in Longhorn page for details and a link to their Simple List Extensions specification, which seems to focus on allowing RSS feeds to be presented as lists of items, including use of datatyped (and hence sortable) extensions.

British Board of Film Classification RSS feeds and Movie metadata

The BBFC have several RSS feeds on their site, carrying information about their judgements on various cinematic works for a UK audience. Recent film decisions, recent adult (sex) videos and films, etc. Each entry in the feed points to a descriptive page and summarises a BBFC judgement in a simple textual description, eg. “The BBFC gave the English language video LES PERVERSIONS 5 a rating of R18 on Thu, 10 Feb. Consumer advice is not supplied for R18 titles. the video is directed by Sineplex.”

While their adult feed is interesting in the context of the debates around Web filtering etc., the mainstream feed is also interesting. It has textual information about sex, violence, drugs etc., which could easily be exposed in machine-processable form if they’d used RSS 1.0 + ICRA/RDF labels. Both feeds make the semantic web point about data-reuse – since they can be used for finding things as much as for not finding things.

The BBFC gave the English language film TABLOID a rating of 18 on Fri, 28 Jan. This film contains STRONG SEX, VIOLENCE, LANGUAGE AND DRUG USE. The film is directed by David Blair. The cast includes Matthew Rhys, Mary Elizabeth Mastrantonio, David Soul, John Hurt, Stephen Tompkinson, Art Malik, Dani Behr, Keith Chegwin, Ainsley Harriott, Gail Porter, Beverley Callard, Les Dennis, Danny Dyer, James Hewitt, Freddie Jones, Vicky Holloway, Vikki Thomas and Anna Kumble.
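The entry above is regular enough that its fields can be recovered by pattern matching, which hints at how much structure is hiding in the prose. The following Python sketch is only a guess at the template, inferred from the two entries quoted in this post; the regular expressions and the `parse_entry` helper are my own assumptions, not anything the BBFC publishes, and the sample uses an abridged cast list.

```python
import re

# Patterns inferred from the quoted entries; the real feed template may differ.
HEAD = re.compile(
    r"The BBFC gave the (?P<language>.+?) language (?P<medium>film|video) "
    r"(?P<title>.+?) a rating of (?P<rating>\S+) "
    r"on (?P<date>\w{3}, \d{1,2} \w{3})\."
)
ADVICE = re.compile(r"This (?:film|video) contains (.+?)\.")
DIRECTOR = re.compile(r"is directed by (.+?)\.")
CAST = re.compile(r"The cast includes (.+?)\.\s*$")

# The TABLOID entry with its cast list shortened for readability.
SAMPLE = (
    "The BBFC gave the English language film TABLOID a rating of 18 "
    "on Fri, 28 Jan. This film contains STRONG SEX, VIOLENCE, LANGUAGE "
    "AND DRUG USE. The film is directed by David Blair. The cast includes "
    "Matthew Rhys, Mary Elizabeth Mastrantonio and David Soul."
)


def parse_entry(text):
    """Turn one feed description into a dict, or None if it does not match."""
    head = HEAD.search(text)
    if head is None:
        return None
    record = head.groupdict()
    advice = ADVICE.search(text)
    record["advice"] = advice.group(1) if advice else None
    director = DIRECTOR.search(text)
    record["director"] = director.group(1) if director else None
    cast = CAST.search(text)
    # Naive split: a name that itself contains " and " would be broken up.
    record["cast"] = re.split(r",\s*|\s+and\s+", cast.group(1)) if cast else []
    return record
```

Here `parse_entry(SAMPLE)` gives a record with title TABLOID, rating 18, director David Blair and a three-name cast list. Scraping like this is fragile, of course, which is rather the argument for publishing the same facts as RDF in the first place.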

I’ve been thinking about how FOAF could better support recommendation systems, eg. around MusicBrainz for music, or systems like MindSwap’s FilmTrust for movies. For movies, one core issue is quite simple: providing unique identifiers for films (direct or indirect, eg. via a page that has some film as its primary topic). BBFC or IMDB pages, or movie homepages, could serve such a purpose. Unfortunately, the world of movies doesn’t yet have a good open-content licensed database, unlike music, where we have MusicBrainz. Until we agree on some tricks for identifying things like movies (and actors, …), we won’t get the data integration needed to have a really rich Web-wide movie review system.

We will eventually, I am sure, see a framework in which various sites aggregate and syndicate such opinions, either numerical ratings or (more likely I think) textual reviews. Often I’m quite interested to see how a movie was perceived by people I disagree with, or have never met. The CapAlert site is often entertaining, for example. All these sources (as well as smaller community datasets) will be mixed together in a metadata marketplace. Information that some people use for filtering, blocking and avoiding will be used by others for searching, browsing and discovery. It’s just a matter of time before we’ll be using W3C’s new SPARQL technology to query BBFC judgement feeds, FOAF+review data from sites like FilmTrust and other weblog-based data sources… Anyhow, definitely check out the FilmTrust site if you’re interested in movie metadata and ratings.