RSS/Atom


Is this crazy or useful? Am not sure yet.

This example uses the FOAF vocabulary for groups and OpenID. The basic structure is that Agents (including persons) can have an :openid and can be a :member of a :Group.

From an openid-augmented Wordpress, we get a list of all the openids my blog knows about. From an openid-augmented MediaWiki, we get a list of all the openids that contribute to the FOAF project wiki. I dumped each into a basic RDF file (not currently an automated process). But the point here is to explore enumerated groups using queries.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://xmlns.com/foaf/0.1/">
<Group rdf:about='#both'>
<!-- enumerated membership -->
<member><Agent><openid rdf:resource='http://danbri.org/'/></Agent></member>
<member><Agent><openid rdf:resource='http://tommorris.org/'/></Agent></member>
<member><Agent><openid rdf:resource='http://kidehen.idehen.net/dataspace/person/kidehen'/></Agent></member>
<member><Agent><openid rdf:resource='http://www.wasab.dk/morten/'/></Agent></member>
<member><Agent><openid rdf:resource='http://kronkltd.net/'/></Agent></member>
<member><Agent><openid rdf:resource='http://www.kanzaki.com/'/></Agent></member>

<!-- rule-based membership -->

<constructor><![CDATA[
PREFIX : <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
<http://danbri.org/yasns/danbri/both.rdf#thegroup> a :Group; :member [ a :Agent; :openid ?id ]
}
WHERE {
GRAPH <http://wiki.foaf-project.org/_users.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] }
GRAPH <http://danbri.org/yasns/danbri/_group.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] }
}
]]></constructor>
</Group>
</rdf:RDF>

This RDF description does it both ways. It enumerates (for simple clients) the members of a group: those individuals who are both commentators on my blog and contributors to the FOAF wiki, at least to the extent they’re detectable via common use of OpenID URIs. But the RDF group description also embeds a SPARQL CONSTRUCT query, the kind which generates RDF rather than an SQL-like resultset. The query essentially regenerates the enumerated list, assuming it is run against an RDF dataset with the data graphs appropriately populated.
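For clients without a SPARQL engine, the same "in both lists" membership can be approximated by intersecting the OpenID URIs found in the two source files. What follows is only a rough sketch in Python using the standard library: it scans the RDF/XML syntactically for FOAF openid elements rather than parsing RDF properly, ignores the Group/member structure, and the `openids` and `group_doc` helpers and the example.org URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def openids(rdf_xml):
    """Collect every foaf:openid rdf:resource value found in an RDF/XML string."""
    resource = f"{{{RDF_NS}}}resource"
    root = ET.fromstring(rdf_xml)
    return {
        el.attrib[resource]
        for el in root.iter(f"{{{FOAF_NS}}}openid")
        if resource in el.attrib
    }


def group_doc(ids):
    """Hypothetical helper: build a small FOAF group document for sample data."""
    members = "".join(
        f"<member><Agent><openid rdf:resource='{i}'/></Agent></member>" for i in ids
    )
    return (
        f"<rdf:RDF xmlns:rdf='{RDF_NS}' xmlns='{FOAF_NS}'>"
        f"<Group>{members}</Group></rdf:RDF>"
    )


# Made-up stand-ins for the wiki user dump and the blog user dump.
wiki_users = group_doc(
    ["http://danbri.org/", "http://tommorris.org/", "http://wiki-only.example.org/"]
)
blog_users = group_doc(
    ["http://danbri.org/", "http://tommorris.org/", "http://blog-only.example.org/"]
)

# An agent is in the "both" group if its OpenID appears in each document.
both = openids(wiki_users) & openids(blog_users)
print(sorted(both))
```

This is the set-intersection reading of the query's two GRAPH patterns sharing the ?id variable; it says nothing about provenance beyond which file each URI came from.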

Now I sorta like this, and I sorta don’t. It may be incredibly powerful, or it may be a bit too clever for its own good.

Certainly there’s scope overlap with the W3C RIF rules work, and with the capabilities of OWL. FOAF has long contained an experimental method for using OWL to do something similar, but it hasn’t found traction. The motivation I have for trying SPARQL here is that it has built-in machinery for talking about the provenance of data; so I could write a group description this way that says “members are anyone listed as a colleague in http://myworkplace.example.com/stafflist.rdf”. Or I could mix in arbitrary descriptive vocabularies; family tree stuff, XFN, language abilities (speaks-reads-writes) etc.

Where I think this could fall down is in the complexity of the workflow. The queries need executing against some SPARQL installation with a configured dataset, and the query lists URIs of data graphs. But I doubt database admins will want to randomly load any/every RDF file mentioned in these shared queries. Perhaps something like SparqlPress, attached to one’s weblog, and social filters to load only files in queries eg. from friends? Also, authoring these kinds of query isn’t something non-geek users are going to do often, and the sorts of queries that will work will depend of course on the data actually available. Sure I could write a query based on matching the openids of former colleagues, but the group will be empty unless the data listing people as former colleagues is actually out there and in the Web, and written in the terms anticipated by the query.

On the other hand, this mechanism does appeal, and could go way beyond FOAF group definitions. We could see a model where people post data in the Web but also post queries, eg. revisiting the old work Libby and I explored around RSS query. On the other other hand, who wants to make their Web queries public? All that said, the same goes for the data being queried. And since this technique embeds queries inside ordinary RDF data, however we deal with the data visibility issue for RDF/FOAF should also work for the query stuff. Perhaps. Can’t blame me for trying…

I realise this isn’t the clearest of explanations. Let’s try again:

RDF is normally for publishing collections of simple claims about the world. This is an experiment in embedding data-generating-queries amongst these claims, where the query is configured to output more RDF claims (aka statements, triples etc), but only when executed against some appropriate body of RDF data. Since the query is written in SPARQL, it allows the data-generation rules to mention interesting things, such as properties of the source of the data being queried.

This particular experiment is couched in terms of FOAF’s “Group” construct, but the technique is entirely general. The example above defines a group of agents called the “both” group, by saying that an Agent is in that group if its OpenID URI is listed in each of the two RDF documents specified, ie. it is both a commentator on my blog and a contributor to the FOAF Wiki. Other examples could be “(fe)male employees” or “family members sharing a blood type” or in fact, any descriptive pattern that can match against the data to hand and be expressed in SPARQL.

The PlanetPlanet feed reader (and the Venus variant) exposes its blogroll via RDF/FOAF, typically at “/foafroll.xml” URIs. I ran through the list of Planet installations from the main site, and found the following, which might be interesting for experimentation, crawling, whitelist work etc. Or you could just make a giant feedlist and install Venus yourself, composing your own meta selection from the feeds described in these files.

http://widgetarians.org/foafroll.xml
http://www.planetapache.org/foafroll.xml
http://www.beclan.org/aggregator/foafroll.xml
http://planet.classpath.org/foafroll.xml
http://www.debian.org.hk/planet/foafroll.xml
http://planet.hellug.gr/foafroll.xml
http://planet.freedesktop.org/foafroll.xml
http://planet.humbug.org.au/foafroll.xml
http://planet.gnome-ev.de/foafroll.xml
http://gstreamer.freedesktop.org/planet/foafroll.xml
http://planet.jabber.org/foafroll.xml
http://planet.mozilla.org/foafroll.xml
http://planet.foss.org.my/foafroll.xml
http://planet.go-oo.org/foafroll.xml
http://planet.perl.org/foafroll.xml
http://www.planetpython.org/foafroll.xml
http://planet.slug.org.au/foafroll.xml
http://planetsun.org/foafroll.xml
http://www.planetsuse.org/foafroll.xml
http://planet.twistedmatrix.com/foafroll.xml
http://advocacydev.org/blogs/foafroll.xml
http://planet.arslinux.com/foafroll.xml
http://fossplanet.com/foafroll.xml
http://indyblogs.protest.net/foafroll.xml
http://www.cs.princeton.edu/~mp/malayalam/blogs/foafroll.xml
http://planet.mozillazine.org/foafroll.xml

http://planetjava.org/foafroll.xml # bad xml
http://planetkde.org/foafroll.xml

http://www.planet-im.com/foafroll.xml # no feed urls
http://planet.linux.net.mk/foafroll.xml
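
For anyone wanting to build that giant feedlist, here is a rough Python sketch of pulling feed URLs out of a foafroll document. It assumes the usual Planet layout, where each member's weblog carries an rdfs:seeAlso pointing at an rss:channel with an rdf:about attribute; the sample document and the `feed_urls` helper are made up for illustration, and real files may differ (as the "bad xml" and "no feed urls" notes above suggest).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# Made-up sample in the assumed foafroll layout.
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rss="http://purl.org/rss/1.0/">
<foaf:Group>
  <foaf:name>Planet Example</foaf:name>
  <foaf:member>
    <foaf:Agent>
      <foaf:name>Alice</foaf:name>
      <foaf:weblog>
        <foaf:Document rdf:about="http://alice.example.org/">
          <rdfs:seeAlso>
            <rss:channel rdf:about="http://alice.example.org/feed.rss"/>
          </rdfs:seeAlso>
        </foaf:Document>
      </foaf:weblog>
    </foaf:Agent>
  </foaf:member>
</foaf:Group>
</rdf:RDF>"""


def feed_urls(foafroll_xml):
    """Return the feed URLs in a foafroll document, or None if the XML is malformed."""
    try:
        root = ET.fromstring(foafroll_xml)
    except ET.ParseError:
        return None  # the "bad xml" case
    about = f"{{{RDF_NS}}}about"
    # An empty list corresponds to the "no feed urls" case.
    return [
        ch.attrib[about]
        for ch in root.iter(f"{{{RSS_NS}}}channel")
        if about in ch.attrib
    ]


print(feed_urls(SAMPLE))
```

Fetching each foafroll.xml URL and concatenating the results would give the combined feedlist; that part is left out here to keep the sketch free of network access.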

The Widgetarians.org feed aggregator is now running Venus, Sam Ruby’s reworking of the Planet codebase. I had been getting sporadic “KeyError” errors; with the upgrade, that’s gone away. Many thanks to Sam Ruby, Scott James Remnant and Jeff Waugh for some handy software :)

OK this is old news, but pretty cool so I’m happy to write it up belatedly.

I just logged into MSN chat, and was greeted by Mario Menti’s IM bot, which provides a text-chat UI for navigating the BBC’s news feeds from their Persian service. I’m pasting the output here, hoping it’ll display reasonably. I can’t read a word of it of course, but I remember Ian Forrester’s XTech talk a few years back about the headaches of getting I18N right for such feeds (and the varying performance of newsreader clients with right-to-left and mixed direction text). This hack came out of a conversation with Mario and Ian around the BBC Backstage scene. Judging from comments from a couple of friends in Tehran, this sort of technology direction is much appreciated by those whose news access is restricted. The bot is called bbcpersian at hotmail.co.uk, and seems to still be running 18 months later. See also some more recent hacks from Mario that wire up BBC feeds to twitter.

BBC Persian News Flash says: (23:01:02)

Hi, this is your hourly BBCPersian.com news flash with the 10 most recent new items
1 افزایش نیروها در عراق ‘درحال نتیجه دادن است’
2 انتقاد شدید کروبی از ‘مخالفان احزاب’
3 نواز شریف از پاکستان اخراج شد
4 بازداشت یکی از ‘قاچاقچیان بزرگ’ کلمبیا
5 ترکیه: کشورهای منطقه از اقدامات تنش زا دوری کنند
6 ‘عاشقان قلندر’ جشنواره ای دیگر برپا کردند
7 کاهش ساعت کار ادارات دولتی ایران در ماه رمضان
8 ‘عراقیها احساس امنیت بیشتری نمی کنند’
9 نواز شریف از پاکستان اخراج شد
10 شرکت مردم گواتمالا در انتخابات این کشور

Reply with number 1 to 10 to see more information, or any other message if you want to stop receiving these news flashes

Anyone know what the state of the art is with IM-based feed readers? or have a wishlist?

Just renewed my Flickr-Pro account for 2 years, ensuring an irregular supply of pigeon, fish and other misc depictions.

I wasn’t 100% happy with the wording of their terms though.

To participate in Flickr pro, you must have a valid Yahoo! ID and, solely if you have not received a free offer or gift for a specific number of days of Flickr pro (“Free pro Period”), you will also need to provide other information, such as your credit card and billing information (your “Registration Data”). If you do not have a Yahoo! ID, you will be prompted to complete the registration process for it before you can register for Flickr pro. In consideration of your use of Flickr pro, you agree to: (a) provide true, accurate, current and complete information about yourself and (b) maintain and promptly update the Registration Data to keep it true, accurate, current and complete. If you provide any information that is untrue, inaccurate, not current or incomplete, or Flickr has reasonable grounds to suspect that such information is untrue, inaccurate, not current or incomplete, Flickr has the right to suspend or terminate your account and delete any information or content therein without liability to Flickr.

The “provide true, accurate, current and complete information about yourself” is only contextually limited to “credit card” and “billing information”; it could also plausibly be read as covering the more general Flickr user profile, on which I’ve every right to omit various bits of information (Missing isn’t broken). The billing system also let me have the choice of storing credit card info or re-entering it again next time it’s used. So it isn’t really clear what they’re asking for here. If my buddy icon doesn’t show enough grey hair, is that inaccurate? :) I guess they’re really focussed on contact details, in which case, it’s best to say so.

I signed up anyways. The Flickr API and the RDF-oriented Perl backup library make it a more reliable option for my photos than my own little Ruby scripts ever were. Back 2-3 years ago I maintained the fantasy that I’d manage my own photos and their metadata; the big reason I switched to Flickr was the commenting/social side. It’s just too hard for per-person sites to maintain that level of interactivity and community (unless you’re super famous or beautiful or both). And a photo site without comments and community, for me, is kind of boring. For decentralists … perhaps some combination of extended RSS feed plus OpenID for comments could come closer these days; but before OpenID, I couldn’t ever see a way for commenting and annotation to be massmarket-friendly in a decentralised manner. And, well, also no need to be grudging: Flickr is a great product. I’ve definitely had my money’s worth…

Handy article, “Towards Open Source Flash Development” by Carlos Rovira.

Background to looking at this is some great news: Mikel Maron is open-sourcing the WorldKit system, a lightweight Flash/SWF-based Web mapping application. So I’m interested to find some open source tools that would allow me to rebuild it from source.

I also wonder whether SVG hackers might be interested to port some of it to SVG/Javascript. WorldKit supports geo/rss location tagging, so I’m also curious about what it’d take to get full RDF support in there. Has anybody made an RDF parser for SWF/Flash yet?

As a contrast to the GML/KML and Google-related posts, here is an annotated Yahoo! map, derived from geo-extended RSS 2.0 markup. I tried feeding the service a variant of RSS 1.0 last week (albeit with the Yahoo! extensions implicitly in the RSS namespace) and it seemed to work. They don’t yet have worldwide coverage, unfortunately. [via flickr thread]

There have been various developments in the last week, via Planet RDF, on the topic of data syndication using RSS/Atom.

Edd Dumbill on iTunes RSS extensions; a handy review of the extensions they’ve added to support a “podcasting” directory. See also comments from Danny.

Nearby in the Web, Yahoo! and friends are still busy with their media RSS spec, which lives on the rss-media yahoogroups list. Yahoo are also looking creative on other fronts. Today’s Yahoo! Search blog has an entry on Yahoo! Maps, which again uses RSS extensions to syndicate map-related data:

The Yahoo! Maps open API is based on geoRSS, a RSS 2.0 with w3c geo extension. For more information check out developer.yahoo.net/maps. We also offer API support via a group forum at yws-maps.

This is particularly interesting to me, as they’re picking up the little geo: namespace I made with collaborators from the W3C Semantic Web (formerly RDF) Interest Group.

Although the namespace was designed to be used in RDF, they’re using it in non-RDF RSS2 documents. This is a little disappointing, since it makes the data less available to RDF processors. RSS1, of course, was designed as an RDF application specifically to support such data-centric extensions. Yahoo! have some developer pages with more detail, but they seem to have picked up where the Worldkit Flash RSS geocoding project left off. Worldkit attaches geo:lat and geo:long information to RSS2 items, and can display these in a Flash-based UI.
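
To make the geo:lat / geo:long pattern concrete, here is a small Python sketch that reads located items out of an RSS2 document. The feed content and the `located_items` helper are invented for illustration; only the WGS84 geo namespace URI is the real one, and this is plain XML processing rather than anything RDF-aware.

```python
import xml.etree.ElementTree as ET

# The W3C Basic Geo (WGS84) namespace used for geo:lat and geo:long.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Made-up RSS 2.0 feed with one located item and one without coordinates.
SAMPLE_RSS = """<rss version="2.0"
     xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <channel>
    <title>Example places</title>
    <item>
      <title>Bristol</title>
      <link>http://example.org/bristol</link>
      <geo:lat>51.45</geo:lat>
      <geo:long>-2.58</geo:long>
    </item>
    <item>
      <title>No location</title>
      <link>http://example.org/nowhere</link>
    </item>
  </channel>
</rss>"""


def located_items(rss_xml):
    """Return (title, lat, long) tuples for items carrying both coordinates."""
    root = ET.fromstring(rss_xml)
    found = []
    for item in root.iter("item"):
        lat = item.findtext(f"{{{GEO_NS}}}lat")
        lon = item.findtext(f"{{{GEO_NS}}}long")
        if lat is None or lon is None:
            continue  # skip items with no (or partial) location
        found.append((item.findtext("title"), float(lat), float(lon)))
    return found


print(located_items(SAMPLE_RSS))
```

Because the extension elements are namespace-qualified, the same lookup would work on RSS 1.0 items too, though there the item elements themselves live in the RSS 1.0 namespace and would need a qualified name.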

The WGS 84 geo vocabulary used here by Worldkit and Yahoo! was a collaborative experiment in minimalism. The GIS world has some very rich, sophisticated standards. The idea with the geo: namespace was to take the tiniest step towards reflecting that world into the RDF data-merging environment. RDF is interesting precisely because it allows for highly mixed, yet predictably structured, data.

So that little geo namespace experiment (thanks to the efforts of various geo/mapping hackers, most of whom aren’t very far in FOAF space from Jo Walsh) seems to be proving its worth. A little bit of GIS can go a very long way.

I should at this point stress that the “w3c geo extension” (as Yahoo!’s search blog calls it) is an informal, pre-standardisation piece of work. This is important to stress, particularly given that the work is associated with W3C, both through hosting of the namespace document and because it came about as a collaboration of the W3C RDF/SW Interest Group. If it were the product of a real W3C Working Group, it would have received much more careful review. Someone might have noticed the inadequate (or non-) definition of geo:alt, for example!

I’m beginning to think that a small but more disciplined effort at W3C around RDF vocabulary for geo/mapping (with appropriate liaison to GIS standards) would be timely. But I digress. I was talking about data syndication. Going back to the Yahoo! maps example… they have taken RSS2, and the SWIG Geo vocab, and added a number of extensions that relate to their mapping interface (image, zoom etc), as well as Address, CityState, Zip, Country code, etc. Useful entities to have markup for. In an RDF environment I’d probably have used vCard for those.

Yahoo aren’t the only folk getting creative with RSS this week. Microsoft have published some information on their work, including some draft proposals for extending RSS with “lists”. This, again, brings an emphasis back to RSS for data syndication. In other words, RSS documents as a carrier for arbitrary other information whose dissemination fits the syndication/publication model of RSS. Some links on the Microsoft proposal are on Scoble’s blog. See Microsoft’s RSS in Longhorn page for details and a link to their Simple List Extensions specification, which seems to focus on allowing RSS feeds to be presented as lists of items, including use of datatyped (and hence sortable) extensions.

The BBFC have several RSS feeds on their site, carrying information about their judgements on various cinematic works for a UK audience. Recent film decisions, recent adult (sex) videos and films, etc. Each entry in the feed points to a descriptive page and summarises a BBFC judgement in a simple textual description, eg. “The BBFC gave the English language video LES PERVERSIONS 5 a rating of R18 on Thu, 10 Feb. Consumer advice is not supplied for R18 titles. the video is directed by Sineplex.”.

While their adult feed is interesting in the context of the debates around Web filtering etc., the mainstream feed is also interesting. It has textual information about sex, violence, drugs etc., which could easily be exposed in machine-processable form if they’d used RSS 1.0 + ICRA/RDF labels. Both feeds make the semantic web point about data-reuse, since they can be used for finding things as much as for not finding things.

The BBFC gave the English language film TABLOID a rating of 18 on Fri, 28 Jan. This film contains STRONG SEX, VIOLENCE, LANGUAGE AND DRUG USE. The film is directed by David Blair. The cast includes Matthew Rhys, Mary Elizabeth Mastrantonio, David Soul, John Hurt, Stephen Tompkinson, Art Malik, Dani Behr, Keith Chegwin, Ainsley Harriott, Gail Porter, Beverley Callard, Les Dennis, Danny Dyer, James Hewitt, Freddie Jones, Vicky Holloway, Vikki Thomas and Anna Kumble.
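
Until such labels exist, the regular wording of these summaries means a client can scrape some structure back out. Here is a hedged Python sketch; it is plain regex scraping, not the RSS 1.0 + RDF labelling suggested above. The patterns are guessed from the two examples quoted in this post and may well miss other phrasings, and `parse_summary` is a made-up helper name.

```python
import re

# Patterns inferred from the example sentences only; not an official format.
SUMMARY = re.compile(
    r"The BBFC gave the (?P<language>.+?) language (?P<medium>film|video) "
    r"(?P<title>.+?) a rating of (?P<rating>\S+) on (?P<date>\w+, \d+ \w+)\."
)
ADVICE = re.compile(r"This (?:film|video) contains (?P<advice>[^.]+)\.")


def parse_summary(text):
    """Return a dict of fields from a BBFC feed summary, or None if it does not match."""
    match = SUMMARY.search(text)
    if match is None:
        return None
    record = match.groupdict()
    advice = ADVICE.search(text)
    # Consumer advice is kept as the raw phrase; some entries have none.
    record["advice"] = advice.group("advice") if advice else None
    return record


sample = (
    "The BBFC gave the English language film TABLOID a rating of 18 on Fri, 28 Jan. "
    "This film contains STRONG SEX, VIOLENCE, LANGUAGE AND DRUG USE. "
    "The film is directed by David Blair."
)
record = parse_summary(sample)
print(record)
```

The resulting fields (title, rating, date, advice) are the sort of thing that could then be republished as RDF alongside the feed item.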

I’ve been thinking about how FOAF could better support recommendation systems, eg. around MusicBrainz for music, or systems like MindSwap’s FilmTrust for movies. For movies, one core issue is quite simple: providing unique identifiers for films (direct or indirect, eg. via a page that has some film as its primary topic). BBFC or IMDB pages, or movie homepages, could serve such a purpose. Unfortunately, the world of movies doesn’t yet have a good open-content licensed database, unlike music, where we have MusicBrainz. Until we agree on some tricks for identifying things like movies (and actors, …), we won’t get the data integration needed to have a really rich Web-wide movie review system.

We will eventually, I am sure, see a framework in which various sites aggregate and syndicate such opinions, either numerical ratings or (more likely I think) textual reviews. Often I’m quite interested to see how a movie was perceived by people I disagree with, or have never met. The CapAlert site is often entertaining, for example. All these sources (as well as smaller community datasets) will be mixed together in a metadata marketplace. Information that some people use for filtering, blocking and avoiding will be used by others for searching, browsing and discovery. It’s just a matter of time before we’ll be using W3C’s new SPARQL technology to query BBFC judgement feeds, FOAF+review data from sites like FilmTrust and other weblog-based data sources… Anyhow, definitely check out the FilmTrust site if you’re interested in movie metadata and ratings.