Google boost Jabber + VOIP, Skype releases IM toolkit, Jabber for P2P SPARQL?

Interesting times for the personal Semantic Web. From Google’s “Google Talk and Open Communications” page: “Any client that supports Jabber/XMPP can connect to the Google Talk service.” It does voice calls too, using “a custom XMPP-based signaling protocol and peer-to-peer communication mechanism. We will fully document this protocol. In the near future, we plan to support SIP signaling.”

Meanwhile, Skype (the P2P-based VOIP and messaging system) have apparently released developer tools for their IM system. From a ZDNet article:

“Skype wants to embrace the rest of Internet,” Skype co-founder Janus Friis said during a recent interview.

He did offer hypothetical examples. Online gamers involved in massive multiple player mayhem could use Skype IM to taunt rivals and discuss strategy with teammates. Skype’s IM features could be incorporated, Friis suggests, into software-based media players for personal computers, Web sites for dating, blogging or “eBay kinds of auctions,” Friis said.

I spent some time recently looking at collaborative globe-browsing with Google Earth (ie. giving and taking of tours), and yesterday, revisiting Jabber/XMPP as a possible transport for SPARQL queries and responses between friends and FOAFs. Both apps could get a healthy boost from these developments in the industry. Skype is great but the technology could do with being more open; maybe the nudge from Google will help there. Jabber is great but … hardly used by the people I chat with (who are split across MSN, Yahoo, AIM, Skype and IRC).

For a long time I’ve wanted to do RDF queries in a P2P context (eg. see the book chapter I wrote with Rael Dornfest). Given Apple’s recent boost for Jabber, and now this from Google, the technology looks to have a healthy future. I want to try exposing desktop, laptop etc. RDF collections (addressbooks, calendars, music, photos) directly as SPARQL endpoints reachable via Jabber. There will be some fiddly details, but the basic idea is that Jabber users (including Google and Apple customers) could have some way to expose aspects of their local data for query by their friends and FOAFs, without having to upload it all to some central Web site.
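To make this slightly more concrete, here is a rough Python sketch of how a SPARQL query might travel inside a Jabber/XMPP iq request. No such protocol is defined yet: the payload namespace urn:example:sparql-over-xmpp, the Jabber IDs and the function name build_sparql_iq are all hypothetical, and a real client would send the stanza over an authenticated XMPP stream rather than just building a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload namespace: no SPARQL-over-XMPP protocol is defined yet.
SPARQL_NS = "urn:example:sparql-over-xmpp"


def build_sparql_iq(sender, recipient, query_id, sparql):
    """Wrap a SPARQL query string in an XMPP <iq type="get"> request stanza."""
    iq = ET.Element("iq", {"type": "get", "from": sender,
                           "to": recipient, "id": query_id})
    # Declare the (hypothetical) payload namespace directly on the child element.
    query = ET.SubElement(iq, "query", {"xmlns": SPARQL_NS})
    query.text = sparql  # ElementTree escapes <, > and & when serialising
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    print(build_sparql_iq(
        "alice@example.org/laptop", "bob@example.org/desktop", "q1",
        "SELECT ?name WHERE { ?p <http://xmlns.com/foaf/0.1/name> ?name }"))
```

Bob’s client would then run the query against his local RDF store and answer with an iq of type "result" carrying the same id, perhaps with the results in the SPARQL XML results format.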

Next practical question: which Jabber software library to start hacking with? I was using Rich Kilmer’s Jabber4R but read that it was unmaintained, so I’m wondering about switching to Perl or Python…

Profiling GML for RSS/Atom, RDF and Web developers

I spent some time yesterday talking with Ron Lake about GML, RDF, RSS and other acronyms. GML was originally an RDF application, and various RDFisms can still be seen in the design. I learned a fair bit about GML, and about its extensibility and profiling mechanisms.

We discussed some possibilities for sharing data between GML, RSS/Atom and RDF environments. In particular, two options: RDF inside GML; and RDFized GML.

The possibility of embedding islands of RDF inside GML (eg. the GML for a restaurant might use RDF for restaurant-review or menu markup) is interesting, as it would allow GML documents to use any RDF vocabulary to describe the features on a map. Currently, such extension data typically requires the creation of a custom XML Schema. The other option, “RDFized GML”, is to explore the creation of an RDF vocabulary that allows some useful subset of GML data to be used in RDF. I’ll come back to this in a minute.
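A rough sketch of what such an RDF island might look like inside a GML feature, and how a consumer could pull it out using only Python’s standard library. The app: and rev: namespaces, the restaurant and its rating are all invented for illustration, and the document assumes an application schema that permits arbitrary content (eg. via xs:any) inside app:extra. The extractor only handles simple, literal-valued rdf:Description blocks; anything richer would need a proper RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical GML instance: app: and rev: are made-up namespaces.
DOC = """<app:Restaurant xmlns:app="urn:example:app"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rev="urn:example:review#">
  <gml:name>Example Cafe</gml:name>
  <app:location><gml:Point><gml:pos>51.45 -2.59</gml:pos></gml:Point></app:location>
  <app:extra>
    <rdf:RDF>
      <rdf:Description rdf:about="urn:example:cafe">
        <rev:rating>4</rev:rating>
        <rev:text>Good coffee</rev:text>
      </rdf:Description>
    </rdf:RDF>
  </app:extra>
</app:Restaurant>"""


def rdf_islands(xml_text):
    """Yield (subject, property URI, literal) from rdf:Description islands."""
    root = ET.fromstring(xml_text)
    for desc in root.iter("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            ns, local = prop.tag[1:].split("}")
            yield subject, ns + local, prop.text
```

The GML parts of the document (gml:name, gml:Point) are left alone; only the statements inside the island come out.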

While GML comes from the world of professional GIS, its influence is being felt more widely: Google Earth (formerly Keyhole) uses something called KML, which bears a great many similarities with GML. Meanwhile in the RDF and RSS/Atom world, the very basic addition of “geo:lat” and “geo:long” tagging (sometimes using the W3C SemWeb IG WGS_84 namespace) has got a number of toolmakers interested. This year has seen the release of Yahoo! Maps, Google Maps, Google Earth and most recently Microsoft Virtual Earth. We’ve also seen the release of the excellent Mapping Hacks book, and increasing interest in this area from Web developers.

Although the experimental SWIG RDF vocabulary only deals with points described in WGS_84, there have been various discussions on possible extensions (eg. RDFGeom-2d from Chris Goad). These are intriguing, but we should be careful to avoid re-inventing wheels. Basically, I think we have all the ingredients for a hybrid approach: an RDFized GML subset designed for use by Web developers alongside RSS/Atom, FOAF and other public-facing XML formats. GML serves well as a data format in the GIS community, but some work is needed to identify a subset that could be adopted on the wider Web.

The tiny W3C SWIG vocab, and the related geo:lat/long tagging of “geo”-RSS feeds, have shown that there is real interest in a lightweight XML-based mechanism for sharing map-related markup. GML shows us (via a 600 page specification, for GML 3.1) quite how rich and complex a problem space we’re facing, and KML demonstrates that a medium-sized “GML lite” subset can get traction with webmasters and developers, when backed by useful tools and services.

There are two pieces of work to do here (setting aside for now the topic of RDF islands within GML documents). Let’s first find a strawman profile of GML. From my limited knowledge and discussion with others, something “GML 2-ish” but profiled against GML 3.1, is the area to explore. Then we try getting those data structures into RDF, so it can mix freely with other information.

I understand from Ron Lake that profiling is something that is actively encouraged for GML, and there are even tools to support it that come with the spec: have a look at subsetutility.zip. These files (thanks Ron!) show a pretty easy path for experimentation with profiles. In addition to the schema subsetting utilities, the .zip also includes an example application schema, CommonObjects.xsd (just to help me understand GML), showing how to define things like ‘Building’ and ‘River’, along with a sample instance .xml file that uses it.

To use the profiling tool, just put the unzipped files directly in the base/ directory of .xsd files that ships with GML 3.1, then run an XSLT processor to generate a GML subset.

xsltproc depends.xsl gml.xsd > _gml.dep

xsltproc GML3.1.1Subset.xsl _gml.dep > _gmlSubset.xsd

…and that’s your profile. The scripts take care of all the dependencies (ie. they’ll read the 29 XML Schemas, so you don’t have to :)

The bits of GML you want are specified as parameters in GML3.1.1Subset.xsl. The default in this .zip is: gml:Point, gml:LineString, gml:Polygon, gml:LinearRing, gml:Observation, gml:TimeInstant, gml:TimePeriod

I’m no GML expert, but if someone can help get some instance data matching such a profile, I’ll have a go at RDFizing it. Also, of course, it will be useful to debate how many facilities from full GML would find use in the Webmaster (RSS, KML etc) scene.
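As a first guess at what RDFizing the simplest case could involve, here is a Python sketch that turns a gml:Point into N-Triples using the SWIG geo vocabulary. It assumes WGS 84 coordinates in latitude-longitude order inside a GML 3 gml:pos element; real GML data would need to check srsName to confirm the coordinate reference system and axis order. The function name and subject URI are hypothetical.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def point_to_ntriples(gml_point_xml, subject_uri):
    """Turn a GML 3 gml:Point into N-Triples using the SWIG geo vocabulary.

    Assumes WGS 84 coordinates given in latitude-longitude order; real GML
    should check the srsName attribute to confirm the CRS and axis order.
    """
    point = ET.fromstring(gml_point_xml)
    lat, lon = point.findtext("{%s}pos" % GML).split()[:2]
    return [
        "<%s> <%s> <%sPoint> ." % (subject_uri, RDF_TYPE, GEO),
        '<%s> <%slat> "%s" .' % (subject_uri, GEO, lat),
        '<%s> <%slong> "%s" .' % (subject_uri, GEO, lon),
    ]


if __name__ == "__main__":
    example = ('<gml:Point xmlns:gml="http://www.opengis.net/gml" srsName="EPSG:4326">'
               '<gml:pos>51.4545 -2.5879</gml:pos></gml:Point>')
    for line in point_to_ntriples(example, "urn:example:point/1"):
        print(line)
```

Lines, polygons and time periods from the profile are where the interesting design questions start; a single point maps almost directly.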

Disclaimer: for now this is purely an informal collaboration. If we make something interesting, it might be worth investigating something more formal between W3C (home of RDF, and where I work) and OGC (home of GML). For now, let’s just try out some ideas…

Data syndication

There have been various developments in the last week, via Planet RDF, on the topic of data syndication using RSS/Atom.

Edd Dumbill on iTunes RSS extensions; a handy review of the extensions they’ve added to support a “podcasting” directory. See also comments from Danny.

Nearby in the Web, Yahoo! and friends are still busy with their media RSS spec, which lives on the rss-media yahoogroups list. Yahoo are also looking creative on other fronts. Today’s Yahoo! Search blog has an entry on Yahoo! Maps, which again uses RSS extensions to syndicate map-related data:

The Yahoo! Maps open API is based on geoRSS, a RSS 2.0 with w3c geo extension. For more information check out developer.yahoo.net/maps. We also offer API support via a group forum at yws-maps.

This is particularly interesting to me, as they’re picking up the little geo: namespace I made with collaborators from the W3C Semantic Web (formerly RDF) Interest Group.

Although the namespace was designed to be used in RDF, they’re using it in non-RDF RSS2 documents. This is a little disappointing, since it makes the data less available to RDF processors. RSS1, of course, was designed as an RDF application specifically to support such data-centric extensions. Yahoo! have some developer pages with more detail, but they seem to have picked up where the Worldkit Flash RSS geocoding project left off. Worldkit attaches geo:lat and geo:long information to RSS2 items, and can display these in a Flash-based UI.
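For a non-RDF consumer, picking the geo: tags out of an RSS2 feed is easy enough, but each tool has to hard-code where to look for them. A minimal Python sketch of that, loosely modelled on the Worldkit-style markup described above; the feed content and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Illustrative RSS 2.0 feed using the SWIG geo namespace on items.
FEED = """<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <channel>
    <title>Example geo feed</title>
    <item>
      <title>Somewhere</title>
      <geo:lat>51.4545</geo:lat>
      <geo:long>-2.5879</geo:long>
    </item>
    <item><title>No location</title></item>
  </channel>
</rss>"""


def geotagged_items(rss_text):
    """Return (title, lat, long) for each RSS 2.0 item carrying geo:lat/geo:long."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        lat = item.findtext("{%s}lat" % GEO)
        lon = item.findtext("{%s}long" % GEO)
        if lat is not None and lon is not None:  # skip items without a location
            results.append((item.findtext("title"), float(lat), float(lon)))
    return results
```

An RDF processor reading the same data as RSS1 would instead see ordinary geo:lat and geo:long statements about each item, mergeable with anything else it knows.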

The WGS 84 geo vocabulary used here by Worldkit and Yahoo! was a collaborative experiment in minimalism. The GIS world has some very rich, sophisticated standards. The idea with the geo: namespace was to take the tiniest step towards reflecting that world into the RDF data-merging environment. RDF is interesting precisely because it allows highly mixed, yet predictably structured, data.
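A toy illustration of that data merging: if statements are treated as (subject, predicate, object) triples, combining two descriptions of the same resource is just a set union, with duplicate statements collapsing. This Python sketch uses hypothetical URIs and data, and ignores blank nodes, which would need to be kept apart (renamed) when merging real RDF graphs.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
FOAF = "http://xmlns.com/foaf/0.1/"


def merge(*graphs):
    """Merge graphs of (subject, predicate, object) triples by set union.

    Only valid as-is when the graphs share no blank nodes.
    """
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


# Hypothetical data from two sources describing the same place URI.
place = "urn:example:place/1"
from_feed = {(place, GEO + "lat", "51.4545"), (place, GEO + "long", "-2.5879")}
from_foaf = {(place, FOAF + "name", "Example Cafe"), (place, GEO + "lat", "51.4545")}
combined = merge(from_feed, from_foaf)
```

Because both sources use the same URIs for the place and the properties, the shared geo:lat statement appears once in the result, and no schema negotiation was needed to combine them.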

So that little geo namespace experiment (thanks to the efforts of various geo/mapping hackers, most of whom aren’t very far in FOAF space from Jo Walsh) seems to be proving its worth. A little bit of GIS can go a very long way.

I should at this point stress that the “w3c geo extension” (as Yahoo!’s search blog calls it) is an informal, pre-standardisation piece of work. This matters particularly because the work is associated with W3C, both through hosting of the namespace document and because it came about as a collaboration of the W3C RDF/SW Interest Group. If it were the product of a real W3C Working Group, it would have received much more careful review. Someone might have noticed the inadequate (or non-) definition of geo:alt, for example!

I’m beginning to think that a small but more disciplined effort at W3C around RDF vocabulary for geo/mapping (with appropriate liaison to GIS standards) would be timely. But I digress. I was talking about data syndication. Going back to the Yahoo! Maps example… they have taken RSS2 and the SWIG geo vocab, and added a number of extensions that relate to their mapping interface (image, zoom etc), as well as Address, CityState, Zip, Country code, etc. These are useful entities to have markup for; in an RDF environment I’d probably have used vCard for them.

Yahoo aren’t the only folk getting creative with RSS this week. Microsoft have published some information on their work, including some draft proposals for extending RSS with “lists”. This, again, brings the emphasis back to RSS for data syndication: in other words, RSS documents as a carrier for arbitrary other information whose dissemination fits the syndication/publication model of RSS. Some links on the Microsoft proposal are on Scoble’s blog. See Microsoft’s RSS in Longhorn page for details and a link to their Simple List Extensions specification, which seems to focus on allowing RSS feeds to be presented as lists of items, including the use of datatyped (and hence sortable) extensions.
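Why the datatypes matter for sorting: a numeric field compared as text gives the wrong order, since "10" sorts before "9". The Python sketch below uses a made-up extension namespace and price element rather than the actual Simple List Extensions markup; it only illustrates the idea of sorting feed items by a declared datatype.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace and element names, for illustration only;
# this is not the markup defined by the Simple List Extensions spec.
EXT = "urn:example:list-ext"

FEED = """<rss version="2.0" xmlns:ext="urn:example:list-ext"><channel>
  <item><title>A</title><ext:price>9</ext:price></item>
  <item><title>B</title><ext:price>10</ext:price></item>
  <item><title>C</title><ext:price>100</ext:price></item>
</channel></rss>"""


def sort_titles(rss_text, field, data_type):
    """Return item titles sorted by an extension field of the given datatype."""
    convert = float if data_type == "number" else str

    def key(item):
        # Look up the extension element and convert it per its declared type.
        return convert(item.findtext("{%s}%s" % (EXT, field)))

    items = ET.fromstring(rss_text).iter("item")
    return [item.findtext("title") for item in sorted(items, key=key)]
```

Called with "text", the titles come out B, C, A; with "number" they come out A, B, C, which is what a reader of a price list would expect.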