A tale of two business models

Glancing back at 1998, thanks to the Wayback Machine.

W3C has Royalty Free licensing requirements and a public Process Document for good reason. I’ve been clicking back through the paper trails around the W3C P3P vs Intermind patent issue as a reminder. Here is the “Appendix C: W3C Licensing Plan Summary” from the old Intermind site:

We expect to license the patents to practice the P3P standard as it evolves over time as follows:

User Agents: For User Agent functionality, all commercial licensees will pay a royalty of 1% of revenues directly associated with the use of P3P or 0.1% of all revenues directly associated with the product employing the User Agent at the licensee’s option. We expect to measure revenues in a confidential and mutually acceptable manner with the licensee.

Service Providers: For Service Provider functionality, all commercial licensees will pay a royalty of 1% of revenues directly associated with the use of P3P. We expect to measure these revenues through the use of Web site logs, which will determine the percentage of P3P-related traffic and apply that percentage to the relevant Web site revenues (i.e., advertising-based revenues or transaction-based revenues). We expect to determine a method for monitoring or auditing such logs in a confidential and mutually acceptable manner with the licensee.

[...]

Intermind Corporation also expects to license the patents for CDF, ICE, and other XML-based agent technology on non-discriminatory terms. Members interested in further information on the relationship of Intermind’s patents to these technologies can contact Drummond Reed at drummond@intermind.com or 206-812-6000.

Nearby in the Web:

Cover Pages on Extensible Name Service (XNS):

Background: “In January 1999, the first of Intermind’s web agent patents began being issued (starting with U.S. patent No. 5,862,325). At the heart of this patent was a new naming and addressing service based on web agent technology. With the emergence of XML as a new global data interchange language — one perfectly suited to the requirements of a global ‘language’ for web agents — Intermind changed its name to OneName Corporation, built a new board and management team, and embarked on the development of this new global naming and addressing service. Because its use of XML as the foundation for all object representation and interchange led to the platform, yet had the same distributed architecture as DNS, it was christened eXtensible Name Service, or XNS. Recognizing the ultimate impact such a system may have on Internet infrastructure, and the crucial role that privacy, security, and trust must play, OneName also made the commitment to building it with open standards, open source software, and an open independent governance organization. Thus was born the XNS Public Trust Organization (XNSORG), the entity charged with setting the technical, operational, and legal standards for XNS.”

Over on XDI.ORG – Licenses and Agreements:

Summary of the XDI.ORG Intellectual Property Rights Agreement

NOTE: This summary is provided as a convenience to readers and is not intended in any way to modify or substitute for the full text of the agreement.

The purpose of the XDI.ORG IPR Agreement between XDI.ORG and OneName Corporation (dba Cordance) is to facilitate and promote the widespread adoption of XDI infrastructure by transferring the intellectual property rights underlying the XDI technology to a community-governed public trust organization.

The agreement grants XDI.ORG an exclusive, worldwide, royalty-free license to a body of patents, trademarks, copyrights, and specifications developed by Cordance on the database linking technology underlying XDI. In turn, it requires that XDI.ORG manage these intellectual property rights in the public interest and make them freely available to the Internet community as royalty-free open standards. (It specifically adopts the definition provided by Bruce Perens which includes the ability for XDI.ORG to protect against “embrace and enhance” strategies.)

There is also a Global Service Provider aspect to this neutral, royalty-free standard. Another excerpt:

Summary of the XDI.ORG Global Service Provider Agreement

NOTE: This summary is provided as a convenience to readers and is not intended in any way to modify or substitute for the full text of the agreement.

Global Services are those XDI services offered by Global Service Providers (GSPs) based on the XRI Global Context Symbols (=, @, +, !) to facilitate interoperability of XDI data interchange among all users/members of the XDI community. XDI.ORG governs the provision of Global Services and has the authority to contract with GSPs to provide them to the XDI community.

For each Global Service, XDI.ORG may contract with a Primary GSP (similar to the operator of a primary nameserver in DNS) and any number of Secondary GSPs (similar to the operator of a secondary DNS nameserver). The Secondary GSPs mirror the Primary for load balancing and failover. Together, the Primary and Secondary GSPs operate the infrastructure for each Global Service according to the Global Services Specifications published and maintained by XDI.ORG.

The initial XDI.ORG GSP Agreement is between XDI.ORG and OneName Corporation (dba Cordance). The agreement specifies the rights and obligations of both XDI.ORG and Cordance with regard to developing and operating the first set of Global Services. For each of these services, the overall process is as follows:

  • If Cordance wishes to serve as the Primary GSP for a service, it must develop and contribute an initial Global Service Specification to XDI.ORG.
  • XDI.ORG will then hold a public review of the Global Service Specification and amend it as necessary.
  • Once XDI.ORG approves the Global Service Specification, Cordance must implement it in a commercially reasonable period. If Cordance is not able to implement or operate the service as required by the Global Service Specification, XDI.ORG may contract with another party to be the Primary GSP.
  • XDI.ORG may contract with any number of Secondary GSPs.
  • If XDI.ORG desires to commence a new Global Service and Cordance does not elect to develop the Global Service Specification or provide the service, XDI.ORG is free to contract with another party.

The contract has a fifteen-year term and covers a specified set of Global Services. Those services are divided into two classes: cost-based and fee-based. Cost-based services will be supplied by Cordance at cost plus 10%. Fee-based services will be supplied by Cordance at annual fees not to exceed maximums specified in the agreement. These fees are the wholesale cost to XDI.ORG; XDI.ORG will then add fees from any other GSPs supplying the service plus its own overhead fee to determine the wholesale price to registrars (registrars then set their own retail prices just as with DNS). Cordance’s wholesale fees are based on a sliding scale by volume and range from U.S. $5.40 down to $3.40 per year for global personal i-names and from U.S. $22.00 down to $13.50 per year for global organizational i-names.

The agreement also ensures all registrants of the original XNS Personal Name Service and Organizational Name Service have the right to convert their original XNS registration into a new XDI.ORG global i-name registration at no charge. This conversion period must last for at least 90 days after the commencement of the new global i-name service.

Over on inames.net, Become an i-broker:

What Is an I-Broker?

From Wikipedia:

“I-brokers are ‘bankers for data’ or ‘ISPs for identity services’ – trusted third parties that help people and organizations share private data the same way banks help us exchange funds and ISPs help us exchange email and files.”

I-brokers are the core providers of XRI digital identity infrastructure. They not only provide i-name and i-number registration services; they also provide i-services: a new layer of digital identity services that help people and businesses safely interact on the Internet. See the I-Service Directory for the first open-standard i-services that can be offered by any XDI.org-accredited i-broker.

How Does an I-Broker Become XDI.org-Accredited?

Cordance and NeuStar, together with XDI.org, have published a short guide to the process, “Becoming an I-Broker,” which includes all the information necessary to get started. It also includes contact information for the i-broker support teams at both companies.

Download Becoming an I-Broker

In addition, the following two zip files contain all the documents needed by an i-broker. The first contains the i-broker application and agreements, which are appendices F and I of the XDI.org Global Services Specifications (GSS), located in their complete form at http://gss.xdi.org. The second contains all the rest of the GSS documents referenced by the application and agreements.

From that downloadable PDF,

What is the application fee?
The standard application fee is USD $2500. However, between the GRS opening on June 20th and the start of Digital ID World 2006 on September 11, 2006, you may apply to Cordance to have the fee offset by development and marketing commitments. For more information, contact Cordance or NeuStar at the addresses below.

In the XDI.org GSS wiki,

6.1. Appendix B: Fee Schedules

Taking a look at that, we see:

Global Services Specification V1.0
Appendix B: Fee Schedule
Revised Version: 1 September, 2007

[...]

This and all contributions to XDI.org are open, public, royalty-free specifications licensed under the XDI.org license available at http://www.xdi.org/docref/legal/xdi-org-license.html.

…where I find I can buy a personal i-name for $5/year, or a business i-name for $29.44, as an individual. Or, as “become an i-broker” points out,

For volume pricing, contact Cordance or NeuStar at the addresses below.

Flickr (Yahoo) upcoming support for OpenID

According to Simon Willison, Flickr look set to support OpenID by allowing your photostream URL (eg. for me, http://www.flickr.com/photos/danbri/) to serve as an OpenID, ie. something you can type wherever you see “login using OpenID” and be bounced to Flickr/Yahoo to provide credentials instead of remembering yet another password. This is rather good news.

For the portability-minded, it’s worth remembering that OpenID lets you put markup in your own Web page to devolve to such services. So my main OpenID is “danbri.org”, which is a document I control, on a domain that I own. In the HTML header I have the following markup:



<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://danbri.org/foaf.rdf" />
<link rel="openid.server" href="http://www.livejournal.com/openid/server.bml" />
<link rel="openid.delegate" href="http://danbri.livejournal.com/" />

…which is enough to defer the details of being an OpenID provider to LiveJournal (thanks, LiveJournal!). Flickr are about to join the group of sites you can use in this way, it seems.

As an aside, this means that the security of our own websites becomes yet more important. Last summer, DreamHost (my webhosting provider) were compromised, and my own homepage was briefly decorated with viagra spam. Fortunately they didn’t seem to touch the OpenID markup, but you can see the risk. That’s the price of portability here. As Simon points out, we’ll probably all have several active OpenIDs, and there’s no need to host your own, just as there’s no need for people who want to publish online to buy and host their own domains or HTML sites.

The Flickr implementation, coupled with their existing API, means we could all offer things like “log into my personal site for family (or friends)” and defer buddylist – and FOAF – management to the well-designed Flickr site, assuming all your friends or family have Flickr accounts. Implementing this in a way that works with other providers (eg. LJ) is left as an exercise for the reader ;)

Imagemap magic

I’ve always found HTML imagemaps to be a curiously neglected technology. They seem somehow to evoke the Web of the mid-to-late 90s, to be terribly ‘1.0’. But there’s glue in the old horse yet…

A client-side HTML imagemap lets you associate links (and via Javascript, behaviour) with regions of an image. As such, they’re a form of image metadata that can have applications including image search, Web accessibility and social networking. They’re also a poor cousin to the Web’s new vector image format, SVG. This morning I dug out some old work on this (much of it from Max, Libby and Jim, all of whom btw are currently working at Joost; as am I, albeit part-time).

The first hurdle you hit when you want to play with HTML imagemaps is finding an editor that produces them. The fact that my blog post asking for MacOSX HTML imagemap editors is now the top Google hit for “MacOSX HTML imagemap” pretty much says it all. Eventually I found (and paid for) one called YokMak that seems OK.

So the first experiment here was to take a picture (of me) and make a simple HTML imagemap.

danbri being imagemapped
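For anyone who hasn’t written one since the 90s, here’s roughly the kind of markup involved; the image filename, coordinates and target URLs below are all invented for illustration:

<img src="danbri.jpg" alt="danbri" usemap="#danbri-regions" />
<map name="danbri-regions">
  <!-- rect coords are x1,y1,x2,y2 in image pixels -->
  <area shape="rect" coords="60,20,140,110" href="http://danbri.org/" alt="danbri's face" />
  <!-- circle coords are centre-x,centre-y,radius -->
  <area shape="circle" coords="220,160,40" href="http://example.org/hat" alt="a hat" />
</map>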

As a step towards treating this as re-usable metadata, here’s imagemap2svg.xslt from Max back in 2002. The results of running it with xsltproc are online: _output.svg (you need an SVG-happy browser). Firefox, Safari and Opera seem more or less happy with it (ie. they show the selected area against a pink background). This shows that imagemap data can be freed from the clutches of HTML, and repurposed. You can do similar things server-side using Apache Batik, a Java SVG toolkit. There are still a few 2002 examples floating around, showing how bits of the image can be described in RDF that includes imagemap info, and then manipulated using SVG tools driven from metadata.
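If you want to try this yourself, the invocation is just the following (filenames are placeholders, and the input needs to be well-formed XML/XHTML, since xsltproc won’t parse tag-soup HTML):

xsltproc imagemap2svg.xslt imagemap-page.xhtml > _output.svg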

Once we have this ability to pick out a region of an image (eg. photo) and tag it, it opens up a few fun directions. In the FOAF scene a few years ago, we had fun using RDF to tag image region parts with information about the things they depicted. But we didn’t really get into questions of surface-syntax, ie. how to make rich claims about the image area directly within the HTML markup. These days, some combination of RDFa or microformats would probably be the thing to use (or perhaps GRDDL). I’ve sent mail to the RDFa group looking for help with this (see that message for various further related-work links too).

Specifically, I’d love to have some clean HTML markup that said, not just “this area of the photo is associated with the URI http://danbri.org/”, but “this area is the Person whose openid is danbri.org, … and this area depicts the thing that is the primary topic of http://en.wikipedia.org/wiki/Eiffel_Tower”. If we had this, I think we’d have some nice tools for finding images, for explaining images to people who can’t see them, and for connecting people and social networks through codepiction.
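Nothing like this exists yet, but just to make the shape of the idea visible, here is a purely hypothetical RDFa-flavoured sketch; the use of rel with a CURIE on area elements, and all the coordinates, are my own inventions rather than anything the RDFa group has agreed:

<map name="danbri-regions" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- hypothetical: this region depicts the person whose openid is danbri.org -->
  <area shape="rect" coords="60,20,140,110" rel="foaf:depicts"
        href="http://danbri.org/" alt="danbri" />
  <!-- hypothetical: this region depicts the primary topic of the Wikipedia page -->
  <area shape="rect" coords="300,10,380,200" rel="foaf:depicts"
        href="http://en.wikipedia.org/wiki/Eiffel_Tower" alt="the Eiffel Tower" />
</map>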

Codepiction

SQL, OpenOffice: would a JDBC driver for SPARQL protocol make sense?

I’d like to have access to my SPARQL data from desktop tools like OpenOffice’s spreadsheet. Apparently it can talk to remote databases via JDBC and ODBC, both for the spreadsheet and database tools within OpenOffice.

Many years ago, pre-SPARQL, I hassled Libby into making her Squish RDF query system talk JDBC. There have been related efforts since, but I’ve lost track of what’s currently feasible with modern tools. There was a SPARQL4J collaboration initially between Asemantics, Profium and HP Labs, and some JDBC code from 2005 is on SourceForge. SPARQL inside JDBC is always going to be a bit of a hack, I suspect, since SPARQL results aren’t exactly like SQL results. But if it’s the easiest way to get things into a spreadsheet, it’s probably worth pursuing. I suspect this is the sort of problem OpenLink are good at, given their database driver background.

Would it be possible to wire up OpenOffice somehow so that it thought it was talking JDBC, but the connection parameters told the Java driver about a SPARQL endpoint URL? Actually, all this is a bit predicated on the assumption that OpenOffice allows the app user to pass in SQL, so we could pass SPARQL instead. If all the SQL is generated by OpenOffice, we’d need to hack the OO source. Am I making any sense? What still needs to be done? Where are we today?
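For what it’s worth, the tabular shape of SPARQL SELECT results is what makes this plausible: each variable becomes a column, each solution a row. A query like the following (the data shape here is imagined) would map straight onto a spreadsheet:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?homepage
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name ;
          foaf:homepage ?homepage .
}
ORDER BY ?name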

FOAF diagram (day 2)


FOAF diagram (day 2)
Originally uploaded by danbri

Another revision, after feedback from Ivan.

The original had “Thing” in italics (a convention I tried before adding in the doap:, dc: and sioc: references), to indicate it was from another namespace. I’ve now made that heritage explicit (although I suspect it might confuse, the idea is pretty central and so worth explaining).

Originally I had both “tipjar” and “mbox” drawn as if they were literal properties, when both are relational. The layout I’m using now allows tipjar to be drawn without crossovers, so only “mbox” remains an oddity. I’ve used italics as a hint of that, but without explicit explanation.

A lot of the redundancy in the diagram comes from the property inverses, but I want to leave them in since they’re important to explain. The expanded key in the bottom-left now has a nerdy little explanation of “inverse property”. I also use the word “subClass” explicitly on the fat arrow, to tap into an OO heritage that might be floating around in people’s minds. In other words, I want people to realise that “a Person is an Agent is a Spatial thing is a thing” is what we’re getting at with those fat little arrows.

On Ivan’s suggestion I trimmed the property lists a little, removing the non-Jabber IM properties and first/last name. The space earned from the latter was immediately spent by adding in bio:olb, since it’s useful and I’ve the impression it’s widely used.

I think that’s about it for changes. Ivan suggested removing the DOAP and SIOC parts, but to my mind they are important because those vocabularies (and projects) elaborate on parts of FOAF which are important but otherwise neglected: the description of Projects, and of Users.

OK, one more version. This has coloured blobs for the core FOAF classes. I think I made a mistake choosing light blue, since that is nearly the colour code I’ve assigned to mean “inverse functional property”. Perhaps a light yellow?

Anyway here it is for now. Guess I should stop using Flickr for this really!

with colours

Update: I’ve put the files in svn (the original OmniGraffle XML is the .graffle file; there’s also some SVG output, although I’m unsure of its quality), and made another slight revision, this time focussing on the layout of the arrow ends for better readability; previously they were cluttered and the property directions were therefore hard to read.


FOAF spec diagram


experimental foafspec diagram
Originally uploaded by danbri

I’ve been trying to capture the core terms of FOAF in a diagram. Here’s version two.

Notes: there are a few terms I’ve missed out, to avoid massive clutter: workInfoHomepage, geekcode, myersBriggs, currentProject, pastProject, dnaChecksum, membershipClass, sha1, fundedBy, theme, Online*XyzAccount, logo, phone. These are all underspecified, underused, or boring. Or all three :) The properties mbox and tipjar are the main quirks; they’re drawn like string-valued properties but are really relations. Some of the non-FOAF namespaces are done the same way too. Maybe it needs a footnote or notation for those?

I’ve included a few nods towards external namespaces; obviously these aren’t supposed to be complete.

Even this kind of diagram is a bit meta; there should be a companion diagram with instance-level data using these terms. Work in progress. Suggestions and feedback would be much appreciated.

cvs2svn migration tips

Some advice from Garrett Rooney on Subversion. I was asking about moving the historical records for the FOAF project (xmlns.com etc) from dusty old CVS into shiny new Subversion. In particular, from CVS where I own and control the box, to Subversion where I’m a non-root normal mortal user (ie. Dreamhost customer).

The document records are pretty simple (no branches etc.). In fact only the previous versions of the FOAF RDF namespace document are of any historical interest at all. But I wanted to make sure that I could get things out again easily, without owning the Svn repository or begging the Dreamhost sysadmins.

Answer: “…a slightly qualified yes. assuming the subversion server is running svn 1.4 or newer a non-root user can use the svnsync tool to copy the contents (i.e. entire history they have read access to) of one repository into a new one. with servers before 1.4 it’s still possible to extract the information, but it’s more difficult.”

And finding the version number? “if they’re letting you access svn via http you can just browse a repository and by default it’ll tell you the server version in the html view of the repository.”

Easily done, we’re on 1.4.2. That’s good.

Q: Any recommendations for importing from CVS?
A: “Converting from cvs to svn is a hellish job, due to the amount of really necessary data that cvs doesn’t record. cvs2svn has to infer a lot of it, and it does a damn good job of it considering what a pain in the ass it is. I’m immediately skeptical of anyone who goes out and writes another conversion tool ;-)

“If you don’t have the ability to load a dumpfile into the repository you can load it into a local repos and then use svnsync to copy that into a remote empty repository. svnsync unfortunately won’t let you load into a non-empty repository; if you want to do that you need to use an svnadmin load, which requires direct access to the repository. Most hosting sites will give you some way to do that sort of thing though.”
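Putting Garrett’s advice together, the whole dance looks roughly like this (repository paths and the remote URL are placeholders; note also that svnsync needs the destination repository to permit revision property changes via its pre-revprop-change hook, which the hosting provider has to allow):

# convert the CVS history into a Subversion dumpfile
cvs2svn --dumpfile=foaf.dump /path/to/cvs/repository

# load it into a scratch local repository
svnadmin create /tmp/foaf-repos
svnadmin load /tmp/foaf-repos < foaf.dump

# mirror the local repository into the empty remote one
svnsync initialize https://svn.example.com/foaf file:///tmp/foaf-repos
svnsync synchronize https://svn.example.com/foaf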

Thanks Garrett!

Symbol languages and the Semantic Web

OK so I just stumbled upon this…

Bomb in Baghdad

…via Jonathan Chetwynd’s ever-inventive and SVG-happy Peepo.com.

The “Car Bomb in Baghdad” story is from a site created by Widgit Software, and explains itself as follows:

Symbolworld has been set up to provide a web site with material suitable for symbol readers of all ages. The internet is an important medium which many people really like to use. Sadly there is very little material that is appropriate or accessible by people with learning difficulties.

The copyright statement for Symbolworld says “symbols on this page are copyright of the commercial owners. They may not be copied or used in any other format without the written permission from the owner.”, which initially struck me as a potential challenge to any use of this particular symbol-set for online communication. But I don’t really know this scene, and I guess this copyright could be just the same as the way eg. fonts are copyrighted.

So I don’t know much about this particular project/company/product, but it reminded me of some similar work I heard about a few years ago. Back when the EU, in their occasionally infinite wisdom, funded SWAD-Europe to run around talking to interesting people about standards and the Semantic Web (and giving them t-shirts), Chaals organised a great developer workshop in Madrid on image annotation. We had the usual fun with SVG and RDF (which btw I’m still betting on) and I got to learn a bit about CCF. Seeing the Baghdad example this morning reminded me of all this. I’ve been clicking around and trying to gather my sprawling thoughts.

CCF, the Concept Coding Framework, is kinda image annotation in reverse. Instead of focussing on the description of the content, concepts etc associated with images, the emphasis is on the use of images to illustrate some enumerated set of concepts. From Chaals’ workshop report re outcomes, I’m reminded that we discussed…

How to use Creative Commons and similar vocabularies to determine whether a particular symbol can be freely used (typically in commercial systems the symbols themselves are proprietary, which can be a major barrier to communication between people who have different systems).

CCF was using some variant of SKOS (another SWAD-Europe activity). This found its way into SKOS itself, where we now have prefSymbol and altSymbol relationships that associate a skos:Concept with a dcmitype:Image. Borrowing an illustration from the SKOS Core guide here:

skos symbol diagram

The guide also notes a distinction between symbolic labelling and “depiction” in the FOAF sense; some symbols are purely symbolic, and have no representational content.
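In Turtle, the pattern from the guide looks something like this (the concept and image URIs are invented for illustration):

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<http://example.org/concepts#coffee>
    a skos:Concept ;
    skos:prefLabel "coffee"@en ;
    skos:prefSymbol <http://example.org/symbols/coffee.png> .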

So, catching up with this area of work a little, I find the Bliss Symbolics, the WWAAC project final report, and various other accessibility-related efforts. But I’ve not really figured out where I’d start if I wanted to build something simple using a freely available symbol-set, nor what the main options/projects currently are. But there’s plenty of reading, including pages from a recent Bliss “think tank” meeting.

The latest I can find on CCF is that it has moved sites and that there are some Web interfaces available, but “sorry – there are no downloads for this project yet.” Ah, apparently the SYMBERED project is continuing development of CCF (aside: Bliss with Swedish translation; doubly incomprehensible to me!). There is a nice example on their site showing the multilingual aspect to this work, as well as contrasting Bliss with a more representationally-oriented symbol set; see their site for details.

Here’s a simple example just in English:

I want coffee and milk and cookies.

In case anyone thinks this whole exploration is a bit niche and obscure, take a look at how people use MSN and other IM systems sometime. And of course the Jabber/XMPP guys have been exploring specs for standard emoticons. Chaals also points out some connections to VoiceXML where “there are a handful of options available in an interaction designed to be through voice, and developers will define assorted ways of recognising from a user’s speech which of the relevant concepts is being matched.”

We’re also only a few clicks away from the survey conducted by the dreamily named W3C Emotion incubator group into use cases and technologies for the description of emotion using markup languages.

If an RDF-ized Wordnet is also thrown into the mix, assigning URIs to Synsets, WordSenses and Words, I think we might actually be getting somewhere. The version of Wordnet in RDF published at W3C doesn’t currently use SKOS, although this was discussed, and of course there are other representations that make more use of RDF class hierarchies for nouns (at the expense of linguistic lossiness and losing non-noun content). Princeton’s original English-language Wordnet has spawned many related projects and translations, but as far as I know, there is little integration amongst them, and not all of the data is public or freely re-usable.

I once had a silly dream of taking a photo to go with each and every noun term in Wordnet. A kind of SemWeb I-Spy, a cousin to Immuexa’s FOAF bingo. Or better, of doing that with friends. The rise of Flickr and tagging means that we’d probably do this now in aggregate, using Flickr and similar sites. But it seems conceivable to me that such an “illustrated wordnet” could be made, using either photo-oriented or symbol-oriented illustrations.

OK what might that buy us? Let’s try these two samples.

Not perfect, … but imho Wordnet gives a nice set of common and identifiable concepts that can be used as a hub for all kinds of different projects. And all we’d need is a huge pile of shared data shaped like this:

wordnet:word-cookie skos:prefSymbol wikicommons:Explosion.svg .

OK, it’s not going to bring about world peace and the return of Esperanto, and of course there’s much more to language and communication than nouns and verbs (the easiest parts of Wordnet to turn into visual symbols), but it does strike me as a fun little (big) project… Wordnet is too huge to be useful in every context where we’d want a modest-sized symbol set (eg. IM emoticons), … but it is nicely searchable, and would provide a framework for such subsets to evolve and be interconnected.

“The World is now closed”

Facebook in many ways is pretty open for a ‘social networking’ site. It gives extension apps a good amount of access to both data and UI. But the closed world language employed in their UI betrays the immodest assumption “Facebook knows all”.

  • Eric Childress and Stuart Weibel are now friends with Charles McCathieNevile.
  • John Doe is now in a relationship.
  • You have 210 friends.

To state the obvious: maybe Eric, Stu and Chaals were already friends. Maybe Facebook was the last to know about John’s relationship; maybe friendship isn’t countable. As the walls between social networking sites slowly melt (I put Jabber/XMPP first here, with OpenID, FOAF, SPARQL and XFN as helper apps), me and my 210 closest friends will share fragments of our lives with a wide variety of sites. If we choose to make those descriptions linkable, the linked sites will increasingly need to refine their UI text to be a little more modest: even the biggest site doesn’t get the full story.

Closed World Assumption (Abort/Retry/Fail)
Facebook are far from alone in this (see this Xbox screenshot too, “You do not have any friends!”); but even with 35M users, the mistake is jarring, and not just to Semantic Web geeks of the “missing isn’t broken” school. It’s simply a mistake to fail to distinguish the world from its description, or the territory from the map.

A description of me and my friends hosted by a big Web site isn’t “my social network”. Those sites are just databases containing claims made by different people, some verified, some not. And with, inevitably, lots missing. My “social network” is an abstractification of a set of interlinked real-world histories. You could make the case that there has only ever been one “social network” since the distant beginnings of human society; certainly those who try to do genealogy with Web data formats run into this in a weaker form, including the need to balance competing and partial information. We can do better than categorised “buddylists” when describing people, their inter-connections and relationships. And in many ways Facebook is doing just great here. Aside from the Pirates-vs-Ninjas noise, many extension applications on Facebook allow arbitrary events from elsewhere in the Web to bubble up through their service and be seen (or filtered) by others who are linked to me in their database. For example:

Facebook is good at reporting events, generally, especially those sourced outside the system. Where it isn’t so great is when reporting internal events, eg. someone telling it about a relationship. Event descriptions are nice things to syndicate btw since they never go out of date. Syndicating descriptions of the changeable properties of the world, on the other hand, is more slippery, since you need to have all other relevant facts to be able to say how the world is right now (or implicitly, how it used to be, before). “Dan has painted his car red” versus “Dan’s car is now red”. “Dan has bookmarked the Jabber user profile spec” versus “Dan now has 1621 bookmarks”. “Dan has added Charles to his Facebook profile” versus “Dan is now friends with Charles”.
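To make that contrast concrete in data terms, here’s a toy sketch (every URI and property name is invented): the event description stays true forever, while the state description silently expires the next time the world changes:

@prefix ex: <http://example.org/terms#> .

# an event: safe to syndicate, never goes stale
<http://example.org/events#e42>
    a ex:PaintingEvent ;
    ex:agent <http://danbri.org/> ;
    ex:object <http://example.org/cars#danscar> ;
    ex:newColour "red" ;
    ex:date "2007-08-10" .

# a state: only true until the next repaint
<http://example.org/cars#danscar> ex:colour "red" .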

We need better UI that reflects what’s really going on. There will be users who choose to live much of their lives in public view, spread across sites, sharing enough information for these accounts to be linked. Hopefully they’ll be as privacy-smart and selective as Pew suggests. Personas and ‘characters’ can be spread across sites without either site necessarily revealing a real-world identity; secrets are keepable, at least in theory. But we will see people’s behaviour and claims from one site leak into another, and with approval. I don’t think this will be just through some giant “social graph” of strictly enumerated relationships, but through a haze of vaguer data.

What we’re most missing is a style of end-user UI here that educates users about this world that spans websites, couching things in terms of claims hosted in sites, rather than in absolutist terms. I suppose I probably don’t have 210 “friends” (whatever that means) in real life, although I know a lot of great people and am happy to be linked to them online. But I have 210 entries in a Facebook-hosted database. My email whitelist file has 8785 email addresses in it currently; email accounts that I’m prepared to assume aren’t sending me spam. I’m sure I can’t have 8785 friends. My Google Mail (and hence GTalk Jabber) account claims 682 contacts, and has some mysterious relationship to my Orkut account where I have 200+ (more randomly selected) friends. And now the OpenID roster on my blog gives another list (as of today, 19 OpenIDs that made it past the WordPress spam filter). Modern social websites shouldn’t try to tell me how many friends I have; that’s just silly. And they shouldn’t assume their database knows it all. What they can do is try to tell me things that are interesting to me, with some emphasis on things that touch my immediate world and the extended world of those I’m variously connected to.

So what am I getting at here? I guess it’s just that we need these big social sites to move away from making teen-talk claims about how the world is – “Sally (now) loves John” – and instead become reflectors for the things people are saying: “Sally announces that she’s in love with John”; “John says that he used to work for Microsoft” versus “John worked for Microsoft 2004-2006”; “Stanford University says Sally was awarded a PhD in 2008”. Today’s young internet users are growing up fast, and the Web around them needs to mature too.

One of the most puzzling criticisms you’ll often hear about the Semantic Web initiative is that it requires a single universal truth, a monolithic ontology to model all of human knowledge. Those of us in the SW community know that this isn’t so; we’ve been saying for a long time that our (meta)data architecture is designed to allow people to publish claims “in which statements can draw upon multiple vocabularies that are managed in a decentralised fashion by various communities of expertise.” As the SemWeb technology stack now has a much better approach to representing data provenance (SPARQL named graphs replacing RDF’99 statement reification), I believe we should now be putting more emphasis on a related theme: Semantic Web data can represent disputes, competing claims, and contradictions. And we can query it in an SQL-like language (SPARQL) that allows us to ask questions not just of some all-knowing database, but about what different databases are telling us.
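Here’s a sketch of what that looks like in practice (the graph names, property and data are all invented): instead of asking “when did Sally get her PhD?”, we ask “who claims what about Sally’s PhD?”:

PREFIX ex: <http://example.org/terms#>
SELECT ?source ?year
WHERE {
  GRAPH ?source {
    <http://example.org/people#sally> ex:phdAwarded ?year .
  }
}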

The closed world approach to data gives us a lot, don’t get me wrong. I’m not the only one with a love-hate relationship with SQL. There are many optimisations we can do in a traditional SQL or XML Schema environment which become hard in an RDF context. In particular, going “open world” makes for a harder job when hosting and managing data rather than merely aggregating and integrating it. Nevertheless, if you’re looking for a modern Web data environment for aggregating claims of the “Stanford University says Sally was awarded a PhD in 1995” form, SPARQL has a lot to offer.

When we’re querying a single, all-knowing, all-trusted database, SQL will do the job (eg. see Facebook’s FQL). When we need to take a bit more care with the “who said what” and “according to whom?” aspects, coupled with schema extensibility and frequently missing data, SQL starts to hurt. If we’re aggregating (and building UI for) ‘social web’ claims about the world rather than simple buddylists (which XMPP/Jabber gives us out of the box), I suspect aggregators will get burned unless they take care to keep careful track of who said what, whether using SPARQL or some home-grown database system in the same spirit. And I think they’ll find that doing so will be peculiarly rewarding, giving us a foundation for applications that do substantially more than merely listing your buddies…