Quick clarification on SPARQL extensions and “Lock-in”

It’s clear from discussion bouncing around IRC, Twitter, Skype and elsewhere that “Lock-in” isn’t a phrase to use lightly.

So I post this to make myself absolutely clear. A few days ago I mentioned in IRC a concern that newcomers to SPARQL and RDF databases might not appreciate which SPARQL extensions are widely implemented, and which are the specialist offerings of the system they happen to be using. I mentioned OpenLink’s Virtuoso in particular as a SPARQL implementation that had a rich and powerful set of extensions.

Since it seems there is some risk I might be misinterpreted as suggesting OpenLink are actively trying to “do a Microsoft” and trap users in some proprietary pseudo-SPARQL, I’ll state what I took to be obvious background knowledge: OpenLink is a company that owes its success to the promotion of cross-vendor database portability; they have been tireless advocates of a standards-based Semantic Web, and they’re active in proposing extensions to W3C for standardisation. So – no criticism of OpenLink intended. None at all.

All I think we need here are a few utilities that help developers understand the nature of the various SPARQL dialects and the potential costs and benefits of using them. Perhaps an online validator, alongside those for RDF/XML, RDFa, Turtle etc. Such a validator might usefully list the extensions used in a given query, and give pointers (perhaps into a wiki) where the status of the various extension constructs can be discussed and documented.
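To make the trade-off concrete, here’s a hedged illustration (I’m recalling Virtuoso’s bif:contains full-text idiom from memory, so treat the exact syntax as approximate). Two queries for the same documents:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Virtuoso-flavoured: bif:contains is a vendor full-text extension (fast, not portable)
SELECT ?doc WHERE {
  ?doc rdfs:label ?label .
  ?label bif:contains "semantic" .
}

# Plain SPARQL: portable to any conformant implementation, though typically slower
SELECT ?doc WHERE {
  ?doc rdfs:label ?label .
  FILTER regex(?label, "semantic", "i")
}

A validator of the kind sketched above could simply notice the bif: namespace, flag the query as Virtuoso-specific, and point at a wiki page describing the extension’s standardisation status.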

Since SPARQL is such a young language, it lacks a lot of things that are taken for granted in the SQL world, so using rich custom extensions where they’re available is, for many developers, a sensible choice. My only concern is that it must be a choice, and one entered into consciously.

Foundation Nation: new orgs for Infocards, Symbian

Via the [IP] list, I read that the Information Card Foundation has launched.

Information Cards are the new way to control your personal data and identity on the web.

The Information Card Foundation is a group of thoughtful designers, architects, and companies who want to make the digital world easier for you by building better products that help you get control of your personal information.

From their blog, where Charles Andres offers a historical account of where they fit in:

And by early 2007, four tribes in the newly discovered continent of user-centric identity had united under the banner of OpenID 2.0 and brought the liberating power of user-controlled identifiers to the digital identity pioneers. The OpenID community formed the OpenID Foundation to serve as a trustee for intellectual property and a host for community activity and by early 2008 had attracted Microsoft, Yahoo, Google, VeriSign, and IBM to join as corporate directors.

Inspired by these efforts, the growing Information Card community realized that to bring this metaphor to full fruition required taking the same step—coming together into a common organization that would unify our efforts to create an interoperable identity layer. From one perspective this could be looked at as completing the “third leg of the stool” of what is often called the Venn of Identity (SAML, OpenID, and Information Cards). But from another perspective, you can see it as one of the logical steps needed towards the cooperative convergence among identity systems and protocols that will be necessary to reach a ubiquitous Internet identity layer—the layer that completes the hat trick.

I’m curious to see what comes of this. There’s some big backing, and I’ve heard good things about Infocard from folks in the know. From a SemWebby perspective, this stuff just gives us another way to figure out the provenance of claim graphs representable in RDF, queryable in SPARQL. And presumably some more core schemas to play with…

Meanwhile in the mobile scene, a Symbian Foundation has been unveiled:

Industry leaders to unify the Symbian mobile platform and set it free
Foundation to be established to provide royalty-free open platform and accelerate innovation

The demand for converged mobile devices is accelerating. By 2010 we expect four billion people to have joined the global mobile conversation. For many of these people, their mobile will be their first Internet experience, not just their first camera, music player or phone.

Open software is the basic building block for delivering this future.

With this in mind, industry leaders are coming together to establish Symbian Foundation, to bring to life a shared vision and to create the most proven, open and complete mobile software platform – available for free. To achieve this, the foundation will unify Symbian, S60, UIQ and MOAP(S) software to create an unparalleled open software platform for converged mobile devices, enabling the whole mobile ecosystem to accelerate innovation.

The foundation is expected to start operating during the first half of 2009. Membership of the foundation will be open to all organizations, for a low annual membership fee of US $1,500.

I’ll save my pennies for an iPhone. Everybody’s open nowadays, I guess that’s good…

X-UA-Combustible

I share the concerns expressed by Norm Walsh and Jeremy Keith regarding the X-UA-Compatible mechanism for IE8. It’s certainly an important issue. But my inner tinfoil hat wearer can’t help but wonder whether this is a kind of feint or indirection that serves to distract from the real issue. Instead of being concerned directly that the world’s most popular Web browser doesn’t handle modern Web standards like SVG, we are left debating how IE should communicate its inabilities to the rest of the Web. I fully appreciate that IE8 is a vast engineering project, but still … have our expectations of Microsoft sunk so low that nobody expects them to implement things like SVG? If Opera, Firefox and Safari can make progress on SVG, so can IE.

“The World is now closed”

Facebook in many ways is pretty open for a ‘social networking’ site. It gives extension apps a good amount of access to both data and UI. But the closed world language employed in their UI betrays the immodest assumption “Facebook knows all”.

  • Eric Childress and Stuart Weibel are now friends with Charles McCathienevile.
  • John Doe is now in a relationship.
  • You have 210 friends.

To state the obvious: maybe Eric, Stu and Chaals were already friends. Maybe Facebook was the last to know about John’s relationship; maybe friendship isn’t countable. As the walls between social networking sites slowly melt (I put Jabber/XMPP first here, with OpenID, FOAF, SPARQL and XFN as helper apps), me and my 210 closest friends will share fragments of our lives with a wide variety of sites. If we choose to make those descriptions linkable, the linked sites will increasingly need to refine their UI text to be a little more modest: even the biggest site doesn’t get the full story.

Closed World Assumption (Abort/Retry/Fail)
Facebook are far from alone in this (see this Xbox screenshot too, “You do not have any friends!”); but even with 35M users, the mistake is jarring, and not just to Semantic Web geeks of the “missing isn’t broken” school. It’s simply a mistake to fail to distinguish the world from its description, or the territory from the map.

A description of me and my friends hosted by a big Web site isn’t “my social network”. Those sites are just databases containing claims made by different people, some verified, some not. And with, inevitably, lots missing. My “social network” is an abstraction over a set of interlinked real-world histories. You could make the case that there has only ever been one “social network” since the distant beginnings of human society; certainly those who try to do genealogy with Web data formats run into this in a weaker form, including the need to balance competing and partial information. We can do better than categorised “buddylists” when describing people, their inter-connections and relationships. And in many ways Facebook is doing just great here. Aside from the Pirates-vs-Ninjas noise, many extension applications on Facebook allow arbitrary events from elsewhere in the Web to bubble up through their service and be seen (or filtered) by others who are linked to me in their database.

Facebook is good at reporting events, generally, especially those sourced outside the system. Where it isn’t so great is in reporting internal events, eg. someone telling it about a relationship. Event descriptions are nice things to syndicate, by the way, since they never go out of date. Syndicating descriptions of the changeable properties of the world, on the other hand, is more slippery, since you need all the other relevant facts to be able to say how the world is right now (or, implicitly, how it used to be). “Dan has painted his car red” versus “Dan’s car is now red”. “Dan has bookmarked the Jabber user profile spec” versus “Dan now has 1621 bookmarks”. “Dan has added Charles to his Facebook profile” versus “Dan is now friends with Charles”.
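In RDF terms the distinction looks something like this (a rough Turtle sketch; the ex: vocabulary, URIs and date are invented for illustration):

@prefix ex:  <http://example.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# An event description: true forever, safe to syndicate
ex:event123 a ex:PaintingEvent ;
    ex:agent     <http://danbri.org/> ;
    ex:object    ex:dansCar ;
    ex:newColour "red" ;
    ex:date      "2007-11-01"^^xsd:date .

# A state description: only true until the world changes again
ex:dansCar ex:colour "red" .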

We need better UI that reflects what’s really going on. There will be users who choose to live much of their lives in public view, spread across sites, sharing enough information for these accounts to be linked. Hopefully they’ll be as privacy-smart and selective as Pew suggests. Personas and ‘characters’ can be spread across sites without either site necessarily revealing a real-world identity; secrets are keepable, at least in theory. But we will see people’s behaviour and claims from one site leak into another, and with approval. I don’t think this will be just through some giant “social graph” of strictly enumerated relationships, but through a haze of vaguer data.

What we’re most missing is a style of end-user UI here that educates users about this world that spans websites, couching things in terms of claims hosted in sites, rather than in absolutist terms. I suppose I probably don’t have 210 “friends” (whatever that means) in real life, although I know a lot of great people and am happy to be linked to them online. But I have 210 entries in a Facebook-hosted database. My email whitelist file has 8785 email addresses in it currently; email accounts that I’m prepared to assume aren’t sending me spam. I’m sure I can’t have 8785 friends. My Google Mail (and hence GTalk Jabber) account claims 682 contacts, and has some mysterious relationship to my Orkut account where I have 200+ (more randomly selected) friends. And now the OpenID roster on my blog gives another list (as of today, 19 OpenIDs that made it past the WordPress spam filter). Modern social websites shouldn’t try to tell me how many friends I have; that’s just silly. And they shouldn’t assume their database knows it all. What they can do is try to tell me things that are interesting to me, with some emphasis on things that touch my immediate world and the extended world of those I’m variously connected to.

So what am I getting at here? I guess it’s just that we need these big social sites to move away from making teen-talk claims about how the world is – “Sally (now) loves John” – and instead become reflectors for the things people are saying, “Sally announces that she’s in love with John”; “John says that he used to work for Microsoft” versus “John worked for Microsoft 2004-2006”; “Stanford University says Sally was awarded a PhD in 2008”. Today’s young internet users are growing up fast, and the Web around them needs also to mature.

One of the most puzzling criticisms you’ll often hear about the Semantic Web initiative is that it requires a single universal truth, a monolithic ontology to model all of human knowledge. Those of us in the SW community know that this isn’t so; we’ve been saying for a long time that our (meta)data architecture is designed to allow people to publish claims “in which statements can draw upon multiple vocabularies that are managed in a decentralised fashion by various communities of expertise.”
Now that the SemWeb technology stack has a much better approach to representing data provenance (SPARQL named graphs replacing RDF’99 statement reification), I believe we should be putting more emphasis on a related theme: Semantic Web data can represent disputes, competing claims, and contradictions. And we can query it in an SQL-like language (SPARQL) that allows us to ask questions not just of some all-knowing database, but about what different databases are telling us.
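For example (a sketch only; the graph names and degree vocabulary are invented), a named-graphs query can ask not “was Sally awarded a PhD?” but “which sources claim that Sally was awarded a PhD?”:

PREFIX ex: <http://example.org/ns#>

# Each data source is loaded into its own named graph;
# ?source binds to whichever graphs contain the claim.
SELECT ?source WHERE {
  GRAPH ?source {
    ex:sally ex:awardedDegree ex:PhD .
  }
}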

The closed world approach to data gives us a lot, don’t get me wrong. I’m not the only one with a love-hate relationship with SQL. There are many optimisations we can do in a traditional SQL or XML Schema environment which become hard in an RDF context. In particular, going “open world” makes for a harder job when hosting and managing data rather than merely aggregating and integrating it. Nevertheless, if you’re looking for a modern Web data environment for aggregating claims of the “Stanford University says Sally was awarded a PhD in 1995” form, SPARQL has a lot to offer.

When we’re querying a single, all-knowing, all-trusted database, SQL will do the job (see Facebook’s FQL, for example). When we need to take a bit more care with the “who said what” and “according to whom?” aspects, coupled with schema extensibility and frequently missing data, SQL starts to hurt. If we’re aggregating (and building UI for) ‘social web’ claims about the world rather than simple buddylists (which XMPP/Jabber gives us out of the box), I suspect aggregators will get burned unless they take care to keep careful track of who said what, whether using SPARQL or some home-grown database system in the same spirit. And I think they’ll find that doing so is peculiarly rewarding, giving us a foundation for applications that do substantially more than merely listing your buddies…

Google Earth touring via KML

While I’m writing up old hacks, here’s one that I really enjoyed, even if it was a bit clunky. A couple of years ago Mikel Maron implemented (at my urging, in the #geo IRC channel on irc.oftc.net :) a PHP-based Google Earth touring service, which interconnects a “tour guide” user with “tourists”.

This site facilitates collaborative, realtime exploration of Google Earth. As the “tour guide” navigates, “tourists” will automatically follow along.

When the tour guide’s Google Earth installation is at rest, a specially installed KML network link sends the server an HTTP request, showing the coordinates of the visible area of the globe. This same service is periodically polled (every second) by “tourists” whose Google Earth will dutifully fly to the appropriate spot.

The system seems to be offline currently, but was quite evocative to use, even if tricky. You never quite knew what the other party could actually see, since the imagery can load quite slowly when moving around a lot. And the implementation didn’t do anything about angle of view (although this became possible in later versions of KML). I had experimental tours of Dublin guided by Ina (Skyping at the same time), and of various places in Iran by Hamed Saber.

I expect in due course (if not already – I don’t track these things) Google Earth and similar products (Worldwind, or the Microsoft thingy) will offer social map browsing; it’s such a nice feature, though it really needs an audio channel open at the same time. Last week I tried to do the same without such a link, my mum talking me through finding a small village in France. Much harder! “Take the road north out of Chabanais … past a small farm, past the swimming pool…”.

Here is Mikel’s “how it works” writeup:

The web interface generates KML files, which are loaded into Google Earth and create Network Links. The tour guide has a “View Based Refresh” Network Link, which sends the bounding box of the current view to the specified URL whenever the camera stops. That position is stored. Tourists receive a “Time Based Refresh” Network Link, which requests every 10 seconds and receives the last stored position of the guide.

Right now only location and altitude are transmitted. A future release of Google Earth may enable tilt and rotation. Integrated chat would be nice as well.
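Reconstructing from Mikel’s description (a sketch only: the service URLs are hypothetical, and element details follow my reading of the KML reference), the two network links would look roughly like this, with the server answering the tourist’s poll with a small KML document containing a <LookAt> for the guide’s last stored position:

<!-- Tour guide: reports the visible bounding box whenever the camera stops -->
<NetworkLink>
  <name>Tour guide</name>
  <Link>
    <href>http://example.org/tour/guide.php</href>
    <viewRefreshMode>onStop</viewRefreshMode>
    <viewFormat>BBOX=[bboxWest],[bboxSouth],[bboxEast],[bboxNorth]</viewFormat>
  </Link>
</NetworkLink>

<!-- Tourist: polls for the guide's last stored position every 10 seconds -->
<NetworkLink>
  <name>Tourist</name>
  <Link>
    <href>http://example.org/tour/follow.php</href>
    <refreshMode>onInterval</refreshMode>
    <refreshInterval>10</refreshInterval>
  </Link>
</NetworkLink>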

The fact that they’ve hidden a full flight simulator within Google Earth might make this worth revisiting. And of course there is infinite fun to be had from playing with photos etc on the globe, although my last attempts in that direction (preparing for 3 months in Buenos Aires by studying geo-tagged photos instead of Spanish) tailed off. Everyone was putting pics on maps, and I got a bit bored, even though it’s still a worthwhile area with much still to be done.

Some ideas are not meant to be combined though: who really needs a collaborative realtime photo-navigator implemented with Google Earth flight simulator? :)

Profiling GML for RSS/Atom, RDF and Web developers

I spent some time yesterday talking with Ron Lake about GML, RDF, RSS and other acronyms. GML was originally an RDF application, and various RDFisms can still be seen in the design. I learned a fair bit about GML, and about its extensibility and profiling mechanisms.

We discussed some possibilities for sharing data between GML, RSS/Atom and RDF environments. In particular, two options: RDF inside GML; and RDFized GML.

The possibility of embedding islands of RDF inside GML (eg. the GML for a restaurant might use RDF for restaurant-review or menu markup) is interesting, as it would allow GML documents to use any RDF vocabulary to describe the features on a map. Currently, such extension data typically requires the creation of a custom XML Schema. The other option, “RDFized GML”, is to explore the creation of an RDF vocabulary that allows some useful subset of GML data to be used in RDF. I’ll come back to this in a minute.

While GML comes from the world of professional GIS, its influence is being felt more widely: Google Earth (formerly Keyhole) uses something called KML, which bears a strong resemblance to GML. Meanwhile in the RDF and RSS/Atom world, the very basic addition of “geo:lat” and “geo:long” tagging (sometimes using the W3C SemWeb IG WGS_84 namespace) has got a number of toolmakers interested. This year has seen the release of Yahoo! Maps, Google Maps, Google Earth and most recently Microsoft Virtual Earth. We’ve also seen the release of the excellent Mapping Hacks book, and increasing interest in this area from Web developers.

Although the experimental SWIG RDF vocabulary only deals with points described in WGS_84, there have been various discussions on possible extensions (eg. RDFGeom-2d from Chris Goad). These are intriguing, but we should be careful to avoid re-inventing wheels. Basically, I think we have all the ingredients for a hybrid approach: an RDFized GML subset designed for use by Web developers alongside RSS/Atom, FOAF and other public-facing XML formats. GML serves well as a data format in the GIS community, but some work is needed to find a subset that will find adoption in the wider Web.

The tiny W3C SWIG vocab, and related geo:lat/long tagging of “geo”-RSS feeds has shown that there is real interest in a lightweight XML-based mechanism for sharing map-related markup. GML shows us (via a 600 page specification, for GML 3.1) quite how rich and complex a problem space we’re facing, and KML demonstrates that a medium-sized “GML lite” subset can get traction with webmasters and developers, when backed by useful tools and services.

There are two pieces of work to do here (setting aside for now the topic of RDF islands within GML documents). Let’s first find a strawman profile of GML. From my limited knowledge and discussion with others, something “GML 2-ish” but profiled against GML 3.1, is the area to explore. Then we try getting those data structures into RDF, so it can mix freely with other information.
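To give a flavour of the target (a strawman only; the rgml: namespace and property names are invented for discussion, while the geo: namespace is the existing SWIG one), an RDFized building-with-point might come out something like:

@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix rgml: <http://example.org/rdf-gml#> .  # hypothetical RDFized-GML namespace

<#townHall> a rgml:Building ;
    rgml:position [ a geo:Point ;
                    geo:lat  "51.4545" ;
                    geo:long "-2.5879" ] .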

I understand from Ron Lake that profiling is something that is actively encouraged for GML, and there are even tools to support it that come with the spec: have a look at subsetutility.zip. These files (thanks Ron!) show a pretty easy path for experimentation with profiles. In addition to the schema subsetting utilities, the .zip also includes (just as an example to help me understand GML) an example application schema CommonObjects.xsd, showing how to define things like ‘Building’, ‘River’, and a sample instance .xml file that uses it.

To use the profiling tool, just put the unzipped files directly in the base/ directory of .xsd files that ships with GML 3.1, then run an XSLT processor to generate a GML subset.

# step 1: compute the dependency graph for the full set of GML 3.1 schemas
xsltproc depends.xsl gml.xsd > _gml.dep

# step 2: emit a subset schema containing just the components you list
xsltproc GML3.1.1Subset.xsl _gml.dep > _gmlSubset.xsd

…and that’s your profile. The scripts take care of all the dependencies (ie. they’ll read the 29 XML Schemas, so you don’t have to :)

The bits of GML you want are specified as parameters in GML3.1.1Subset.xsl. The default in this .zip is: gml:Point, gml:LineString, gml:Polygon, gml:LinearRing, gml:Observation, gml:TimeInstant, gml:TimePeriod

I’m no GML expert, but if someone can help get some instance data matching such a profile, I’ll have a go at RDFizing it. Also, of course, it will be useful to debate how many facilities from full GML would find use in the Webmaster (RSS, KML etc) scene.

Disclaimer: for now this is purely an informal collaboration. If we make something interesting, it might be worth investigating something more formal between W3C (home of RDF, and where I work) and OGC (home of GML). For now, let’s just try out some ideas…

Data syndication

There have been various developments in the last week, via Planet RDF, on the topic of data syndication using RSS/Atom.

Edd Dumbill on iTunes RSS extensions; a handy review of the extensions they’ve added to support a “podcasting” directory. See also comments from Danny.

Nearby in the Web, Yahoo! and friends are still busy with their media RSS spec, which lives on the rss-media yahoogroups list. Yahoo! are also getting creative on other fronts. Today’s Yahoo! Search blog has an entry on Yahoo! Maps, which again uses RSS extensions to syndicate map-related data:

The Yahoo! Maps open API is based on geoRSS, a RSS 2.0 with w3c geo extension. For more information check out developer.yahoo.net/maps. We also offer API support via a group forum at yws-maps.

This is particularly interesting to me, as they’re picking up the little geo: namespace I made with collaborators from the W3C Semantic Web (formerly RDF) Interest Group.

Although the namespace was designed to be used in RDF, they’re using it in non-RDF RSS2 documents. This is a little disappointing, since it makes the data less available to RDF processors. RSS1, of course, was designed as an RDF application specifically to support such data-centric extensions. Yahoo! have some developer pages with more detail, but they seem to have picked up where the Worldkit Flash RSS geocoding project left off. Worldkit attaches geo:lat and geo:long information to RSS2 items, and can display these in a Flash-based UI.
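The pattern itself is pleasingly minimal. Roughly (feed contents and URLs invented; the geo: prefix is bound to the W3C WGS84 namespace):

<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <channel>
    <title>Photos near Bristol</title>
    <link>http://example.org/photos/</link>
    <description>Geocoded photo feed</description>
    <item>
      <title>Clifton Suspension Bridge</title>
      <link>http://example.org/photos/bridge.html</link>
      <geo:lat>51.4551</geo:lat>
      <geo:long>-2.6280</geo:long>
    </item>
  </channel>
</rss>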

The WGS 84 geo vocabulary used here by Worldkit and Yahoo! was a collaborative experiment in minimalism. The GIS world has some very rich, sophisticated standards. The idea with the geo: namespace was to take the tiniest step towards reflecting that world into the RDF data-merging environment. RDF is interesting precisely because it allows for highly mixed, yet predictably structured, data mixing.

So that little geo namespace experiment (thanks to the efforts of various geo/mapping hackers, most of whom aren’t very far in FOAF space from Jo Walsh) seems to be proving its worth. A little bit of GIS can go a very long way.

I should at this point stress that the “w3c geo extension” (as Yahoo!’s search blog calls it) is an informal, pre-standardisation piece of work. This is important to stress, particularly given that the work is associated with W3C, both through hosting of the namespace document and because it came about as a collaboration of the W3C RDF/SW Interest Group. If it were the product of a real W3C Working Group, it would have received much more careful review. Someone might have noticed the inadequate (or non-) definition of geo:alt, for example!

I’m beginning to think that a small but more disciplined effort at W3C around RDF vocabulary for geo/mapping (with appropriate liaison to GIS standards) would be timely. But I digress. I was talking about data syndication. Going back to the Yahoo! Maps example… they have taken RSS2 and the SWIG Geo vocab, and added a number of extensions that relate to their mapping interface (image, zoom etc), as well as Address, CityState, Zip, Country code, etc. Useful entities to have markup for. In an RDF environment I’d probably have used vCard for those.
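By way of comparison, here’s a hedged sketch of the vCard route (property names recalled from the 2001 “Representing vCard Objects in RDF/XML” Note, so treat them as approximate; the address is just an example):

@prefix vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> .

<#office> vCard:ADR [ vCard:Street   "701 First Avenue" ;
                      vCard:Locality "Sunnyvale" ;
                      vCard:Pcode    "94089" ;
                      vCard:Country  "USA" ] .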

Yahoo! aren’t the only folk getting creative with RSS this week. Microsoft have published some information on their work, including some draft proposals for extending RSS with “lists”. This, again, brings an emphasis back to RSS for data syndication – in other words, RSS documents as a carrier for arbitrary other information whose dissemination fits the syndication/publication model of RSS. Some links on the Microsoft proposal are on Scoble’s blog. See Microsoft’s RSS in Longhorn page for details and a link to their Simple List Extensions specification, which seems to focus on allowing RSS feeds to be presented as lists of items, including use of datatyped (and hence sortable) extensions.

Four notions of “wishlist” for FOAF

After the recent SWAD-Europe meeting on ImageDescription techniques, I made some scribbled notes in a bar on different notions of “wishlist” that might make sense to use with FOAF.

At the ImageDescription meeting we talked a lot about the difficulty of scoping such technical activities, since problem spaces overlap in ways that create opportunities and frustration in equal measure. The idea of having a “wishlist” associated with a person’s homepage or FOAF profile(s) illustrates just this. A related idea is that of a foaf:tipjar, ie. a relationship between a person and a document (or part of a document) which explains how to pay or otherwise reward that person. The idea of public wishlists is associated with the practice of companies such as Amazon, who allow people to expose lists of things they’d like bought for them (eg. as birthday presents). Danny Ayers has done some work on transforming Amazon wishlists into a FOAF/RDF format. This can also be seen as one half of the concept of bartering explored by Ian Davis, and in earlier FOAFShop scribbles. It is far from clear how all this stuff fits together in a way that best suits incremental deployment.

Anyway, here are four notions of Wishlist as discussed in Spain.

Sense 1: Wishlists as true descriptions

This notion of a wishlist can be characterised as a relationship between a Person (or Agent) and a description of the world. The idea is that the “wish” is for the description to be true. Any RDF description of the world can be used. For example, I might wish for a job at Microsoft (ie. for the foaf:Person whose foaf:homepage is http://danbri.org/ to have a foaf:workplaceHomepage of http://www.microsoft.com/). Or I might wish to get to know someone (expressible with foaf:knows), or for anything else that RDF can describe. A variation on this design is to express that I wish some RDF-expressible state of affairs not to be a true description of the world.
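As a sketch (the wish: namespace is invented; old-style RDF reification carries the described state of affairs here, and a named graph would be an alternative carrier):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wish: <http://example.org/wish#> .  # hypothetical

# "I wish it were true that my workplace homepage is microsoft.com"
_:me a foaf:Person ;
     foaf:homepage <http://danbri.org/> ;
     wish:wishesTrue _:w .
_:w a rdf:Statement ;
    rdf:subject   _:me ;
    rdf:predicate foaf:workplaceHomepage ;
    rdf:object    <http://www.microsoft.com/> .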

Sense 2: to own particular things

This is a common sense of wishlist. A wishlist in this sense is just a list of things, associated with a Person or Agent that wishes to own them.

Sense 3: an information-oriented expression of interest

Somewhat different, this one. The idea is that we might want to express informational wishes; for example, we might want to leave “questions” in the Web, and have services answer them, notifying us of the answers (via email, RSS/Atom, etc.). This kind of wishlist could be implemented through users publishing queries (expressed in terms of RDF/XML descriptions) alongside their FOAF files.

Sense 4: Wanting to own things that match some particular template description.

This is a hybrid of senses 2 and 3, and can be seen as a generalisation of sense 2. In practice, the notion of wishlist as a “list of items one wants to own” is likely to be implemented through a description of the things on that list. The implicit assumption is that for each thing on the list, there is at most one desired object that matches the description (eg. some particular book with ISBN and title). However, in practice there are often multiple objects that meet a description, and it is actually quite difficult to constrain item descriptions so that there is only one possible match.

Instead of saying that one wants to own the car whose numberplate is ABC-FOAF-1, you could say “I’d like to own a wn:Car that is red, whose foaf:maker has a foaf:homepage of http://www.audi.co.uk/, and which is the foaf:primaryTopic of some specified page.”
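Sketching that in Turtle (wish:wantsToOwn and wish:colour are invented properties; the blank node reads as a template, “anything matching this description”):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wn:   <http://xmlns.com/wordnet/1.6/> .
@prefix wish: <http://example.org/wish#> .  # hypothetical

_:me a foaf:Person ;
     foaf:homepage <http://danbri.org/> ;
     wish:wantsToOwn _:car .
_:car a wn:Car ;
      wish:colour "red" ;
      foaf:maker  [ foaf:homepage <http://www.audi.co.uk/> ] .
<http://example.org/dream-car.html> foaf:primaryTopic _:car .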

Synthesis?

Senses 2 and 4 could be re-characterised using sense 1, using notions such as foaf:owns (which doesn’t exist at the time of writing). In other words, ownership-oriented wishlist mechanisms can be seen as being based on expressions of a desire that “it should be true that … owns …”, for some particular person and entity.

The big question is… what to do next? What would it make sense to add to, or use with, FOAF?