SocialWeb


The laconi.ca microblogging platform is as open as you could hope for. That elusive trinity: open source; open standards; and open content.

The project is led by Evan Prodromou (evan) of Wikitravel fame, whose company just launched identi.ca, “an open microblogging service” built with Laconica. These are fast gaining feature-parity with Twitter: yesterday we got a “replies” tab; this morning I woke to find “search” working. Plenty of interesting people have signed up and grabbed usernames. Twitter-compatible tools are emerging.

At first glance this might look like one of the typical “clone” efforts that spring up whenever a much-loved site gets overloaded. Identi.ca’s success is certainly related to the scaling problems at Twitter, but it’s much more important than that. Looking at FriendFeed comments about identi.ca has sometimes been a little depressing: there is too often a jaded, selfish “why is this worth my attention?” tone. But they’re missing something. Dave Winer wrote a “how to think about identi.ca” post recently; worth a read, as is the ever-wise Edd Dumbill on “Why identica is important”. This project deserves your attention if you value Twitter, or if you care about a standards-based decentralised Social Web.

I have a testbed copy at foaf2foaf.org (I’ve been collecting notes for Laconica installations at Dreamhost). It is also federated. While there is support for XMPP (an IM interface) the main federation mechanism is based on HTTP and OAuth, using the openmicroblogging.org spec. Laconica supports OpenID so you can play without needing another password. But the OpenID usage can also help with federation and account matching across the network.

Laconica (and the identi.ca install) support FOAF by providing FOAF files - data that is already being indexed by Google’s Social Graph API. For example, see my identi.ca FOAF, and a search of Google SGAPI for my identi.ca account. It is written in PHP (with MySQL), so hacking on FOAF consumer code using ARC is a natural step. If anyone is interested in helping with that, talk to me and to Evan (and to Bengee of course).

Laconica encourages everyone to apply a clear license to their microblogged posts; the initial install suggests Creative Commons Attribution 3. Other options will be added. This is important, both to ensure the integrity of a system where posts can be reliably federated, and as part of a general drift towards the opening up of the Web.

Imagine you are, for example, a major media content owner, with tens of thousands of audio, video, or document files. You want to know what the public are saying about your stuff, in all these scattered distributed Social Web systems. That is just about do-able. But then you want to know what you can do with these aggregated comments. Can you include them on your site? Horrible problem! Who really wrote them? What rights have they granted? The OpenID/CC combination suggests a path by which comments can find their way back to the original publishers of the content being discussed.

I’ve been posting a fair bit lately about OAuth, which I suspect may be even more important than OpenID over the next couple of years. OAuth is an under-appreciated technology piece, so I’m glad to see it being used nicely for Laconica. Laconica installations allow you to subscribe to an account from another account elsewhere in the Web. For example, if I am logged into my testbed site at http://foaf2foaf.org/bandri and I visit http://identi.ca/libby, I’ll get an option to (remote-)subscribe. There are bugs and usability problems as of right now, but the approach makes sense: by providing the URL of the remote account, identi.ca can bounce me over to foaf2foaf which will ask “really want to subscribe to Libby? [y/n]”, setting up API permissioning for cross-site data flow behind the scenes.

I doubt that the openmicroblogging spec will be the last word on this kind of syndication / federation. But it is progress, practical and moving fast. A close cousin of this design is the work from the SMOB (Semantic Microblogging) project, who use SIOC, FOAF and HTTP. I’m happy to see a conversation already underway about bridging those systems.

Do please consider supporting the project. And a special note for Semantic Web (over)enthusiasts: don’t just show up and demand new RDF-related features. Either build them yourself or dive into the project as a whole. Have a nose around the buglist. There is of course plenty of scope for semwebbery, but I suggest a first priority ought to be to help the project reach a point of general usability and adoption. I’ve nothing against Twitter just as I had nothing at all against Six Apart and Movable Type, back before they opensourced. On the contrary, Movable Type was a great product from great people. But the freedoms and flexibility that opensource buys us are hard to ignore. And so I use Wordpress now, having migrated like countless others. My suspicion is we’re at a “Wordpress/MovableType” moment here with Identica/Laconica and Twitter, and that of all the platforms jostling to be the “new twitter”, this one is most deserving of success. With opensource, Laconica can be the new Laconica…

You can follow me here identi.ca/danbri

(a clock showing no time)

From my Skype logs [2008-06-19 Dan Brickley: 18:24:23]

So I had a drunken dream about online microcurrencies last night. Also about cats and water-slides but that’s another story. Idea was of a karma donation system based on one-off assignments from person to person of specified chunks of their lifetime; ‘giving the time of day’. you’re allowed to give any time of day taken from those days you’ve been alive so far. they’re not directly redistributable, nor necessarily related to what happened during the specified time. there’s no central banker, beyond the notion of ‘the public record’. The system naturally favours the old/experienced, but if someone gets drunk and gives all their time/karma to a porn site, at least in the morning they’ll have another 24h ‘in the bank’. Or they could retract/deny the gift, although doing so a lot would also be visible in the public record and doing so excessively would make one look a bit sketchy, one’s time gifts seem less valuable etc. Anchoring to a real world ‘good’ (time) is supposed to provide some control against runaway inflation, as is non-redistributability, but also the time thing is nice for visualizations and explanation. I’m not really sure if it makes sense but thought i’d write it down before i forget the idea…

One idea would be for the time gifts to be redeemable, but that i think pushes the metaphor too far into being a real currency for a fictional world where hourly rates are flattened. Some Lets schemes probably work that way I guess…
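To make the rules above a little more concrete, here is a rough Python sketch of a “public record” of time gifts. Everything in it is an assumption made for illustration: the class and method names are invented, a gift is modelled as a single moment rather than a chunk of time, and the “each moment can only be given once” rule is just one reading of “one-off assignments”.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass(frozen=True)
class TimeGift:
    giver: str
    receiver: str
    moment: datetime  # the "time of day" being given away


@dataclass
class PublicRecord:
    """Hypothetical ledger; there is no central banker, only this record."""
    birthdays: dict  # person -> date of birth
    gifts: list = field(default_factory=list)
    retractions: list = field(default_factory=list)

    def give(self, giver, receiver, moment, now):
        # Only moments from the giver's own lifetime so far may be given.
        born = datetime.combine(self.birthdays[giver], datetime.min.time())
        if not (born <= moment <= now):
            raise ValueError("moment is outside the giver's lifetime so far")
        # Assumption: each moment is a one-off and cannot be given twice.
        if any(g.giver == giver and g.moment == moment for g in self.gifts):
            raise ValueError("that moment has already been given")
        gift = TimeGift(giver, receiver, moment)
        self.gifts.append(gift)
        return gift

    def retract(self, gift):
        # Retracting is allowed, but stays visible in the public record.
        self.gifts.remove(gift)
        self.retractions.append(gift)

    def received(self, person):
        # Gifts are not redistributable, so there is no transfer method:
        # a person's karma is simply whatever was given to them directly.
        return [g for g in self.gifts if g.receiver == person]
```

Note that nothing here is redeemable or transferable; the record only ever answers “who gave which moments to whom, and who took them back”.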

So I’ve been meaning to write this up, but in the absence of having done so, here’s the idea as it first struck me. I had been thinking a bit about online reputation services, and the kinds of information they might aggregate. Garlik’s QDOS and FOAF experiments being a good example of this kind of evidence aggregation. As OpenID, FOAF, microformats etc. take hold, I really think we’ll see a massive parting of waves, red sea style, with the “public record” on one side, and “private stuff” on the other.

And in the public record, we’ll be attaching information about the things we make and do to well-known identifiers for people (and their semi-detached aliases). Various websites have rating and karma mechanisms, but it is far from clear how they’ll look when shared in the public Web. Nor whether something robust and not-too-gameable will come out of it. There are certainly various modelling idioms (eg. advogato do their internal calculations, and then put everyone in one of several broad-brush groups; here’s my advogato FOAF). See also my previous notes on representing expertise.

Now in some IRC channels, there are bots where you can dish out credit by typing things like:

edd++ # xtechy

…and have a bot add up the credits, as well as the comments. In small IRC communities these aren’t gamed except for fun. So I’ve been thinking: how can these kinds of habits ever work in the wider Web, where people are spread across Web sites (but nevertheless identifiable with OpenID and FOAF)? How could it not turn hideous? What limited resource do we each have a supply of? No, not kidneys. And in a hungover stupor I came to think that “the time of day” could be such a resource. It’s really just a metaphor, and I’m not sure at all that the quantifiable nature is a benefit. But I also quite like that we each have a neverending supply of the stuff, and that even a fleeting moment can count.
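For anyone who hasn’t met one of these bots, the bookkeeping is tiny. Below is a minimal Python sketch of the tallying side only; it is not the code of any particular IRC bot, and the exact syntax it accepts (a nick, then `++`, then an optional `#` comment) is an assumption based on the one example line above.

```python
import re
from collections import defaultdict

# Assumed syntax: "<nick>++" optionally followed by "# free-text comment".
CREDIT = re.compile(r"\s*(?P<nick>[A-Za-z0-9_\-\[\]]+)\+\+\s*(?:#\s*(?P<comment>.*))?")


def tally(lines):
    """Return (scores, comments) for every line that hands out a credit."""
    scores = defaultdict(int)
    comments = defaultdict(list)
    for line in lines:
        match = CREDIT.fullmatch(line.strip())
        if match is None:
            continue  # ordinary chatter, not a credit
        nick = match["nick"]
        scores[nick] += 1
        if match["comment"]:
            comments[nick].append(match["comment"].strip())
    return dict(scores), dict(comments)


scores, comments = tally(["edd++ # xtechy", "edd++", "anyone seen libby?"])
print(scores)    # {'edd': 2}
print(comments)  # {'edd': ['xtechy']}
```

The hard part on the open Web is not this counting but deciding whose `++` to believe, which is exactly where identifiers like OpenID come in.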

Update: here’s a post from Simon Lucy which has a very similar direction (it was Simon I was drinking with the night before writing this). Excerpt:

And what do you do with your positive balance? Need you do anything? I imagine those that care will publish their balance or compare it with others in similar way to company cars or hi fi tvs. There will always be envy and jealousy.

But no one can steal your balance, misuse it.

So who wants to host the Ego Bank.

The main difference compared to my suggested scheme is just that ‘the Web’, and the public record it carries, are the “ego bank”, creating a playground for aggregators of karma, credibility and reputation information. “The time of day” would just be one such category of information…

From Yaron Koren on the semediawiki-users list:

I’m pleased to announce the release of the site Referata, at referata.com: a hosting site for SMW-based semantic wikis. This is not the first site to offer hosting of wikis using Semantic MediaWiki (that’s Wikia, as of a few months ago), but it is the first to also offer the usage of Semantic Forms, Semantic Drilldown, Semantic Calendar, Semantic Google Maps and some of the other related extensions you’ve probably heard about; Widgets, Header Tabs, etc. As such, I consider it the first site that lets people create true collaborative databases, where many people can work together on a set of well-structured data.

See announcement and their features page for more details. Basic usage is free; $20/month premium accounts can have private data, and $250/month enterprise accounts can use their own domains. Not a bad plan I think. A showcase Referata wiki would help people understand the offering better. In the meantime there is elsewhere a list of sites using Semantic MediaWiki. That list omits Chickipedia; we can only wonder why. Also I have my suspicions that Intellipedia runs with the SMW extensions too, but that’s just guessing. Regardless, there are a lot of fun things you could do with this, take a look…

From Wired via Thomas Roessler:

Google will have to turn over every record of every video watched by YouTube users, including users’ names and IP addresses, to Viacom, which is suing Google for allowing clips of its copyright videos to appear on YouTube, a judge ruled Wednesday.

I hope nobody thought their behaviour on youtube.com was a private matter between them and Google.

The Judge’s ruling (pdf) is interesting to read (ok, to skim). As the Wired article says,

The judge also turned Google’s own defense of its data retention policies — that IP addresses of computers aren’t personally revealing in and of themselves, against it to justify the log dump.

Here’s an excerpt. Note that there is also a claim that youtube account IDs aren’t personally identifying.

Defendants argue that the data should not be disclosed because of the users’ privacy concerns, saying that “Plaintiffs would likely be able to determine the viewing and video uploading habits of YouTube’s users based on the user’s login ID and the user’s IP address”.

But defendants cite no authority barring them from disclosing such information in civil discovery proceedings, and their privacy concerns are speculative.  Defendants do not refute that the “login ID is an anonymous pseudonym that users create for themselves when they sign up with YouTube” which without more “cannot identify specific individuals”, and Google has elsewhere stated:

“We . . . are strong supporters of the idea that data protection laws should apply to any data  that could identify you.  The reality is though that in most cases, an IP address without additional information cannot.” — Google Software Engineer Alma Whitten, Are IP addresses personal?, GOOGLE PUBLIC POLICY BLOG (Feb. 22, 2008)

So forget the IP address part for now.

Since early this year, Google have been operating an experimental service called the Social Graph API. From their own introduction to the technology:

With so many websites to join, users must decide where to invest significant time in adding their same connections over and over. For developers, this means it is difficult to build successful web applications that hinge upon a critical mass of users for content and interaction. With the Social Graph API, developers can now utilize public connections their users have already created in other web services. It makes information about public connections between people easily available and useful.

Only public data. The API returns web addresses of public pages and publicly declared connections between them. The API cannot access non-public information, such as private profile pages or websites accessible to a limited group of friends.

Google’s Social Graph API makes easier something that was already possible: using XFN and FOAF markup from the public Web to associate more personal information with YouTube accounts. This makes information that was already public increasingly accessible to automated processing. If I choose to link to my YouTube profile with the XFN markup rel=’me’ from another of my profiles, those 8 characters are sufficient to bridge my allegedly anonymous YouTube ID with arbitrary other personal information. This is done in a machine-readable manner, one that Google has already demonstrated a planet-wide index for.
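To show how little machinery is needed to pick up such a link, here is a small Python sketch using only the standard library’s `html.parser`. It is not how Google’s crawler works (I have no idea about that); the class and function names are made up, and the sample page is invented apart from the YouTube URL.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collects the href of every <a> or <link> whose rel includes 'me'."""

    def __init__(self):
        super().__init__()
        self.me_links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel is a space-separated list, e.g. rel="me nofollow".
        rels = (attrs.get("rel") or "").lower().split()
        if "me" in rels and attrs.get("href"):
            self.me_links.append(attrs["href"])


def rel_me_links(html):
    parser = RelMeParser()
    parser.feed(html)
    return parser.me_links


page = """
<p>Find me elsewhere:
  <a rel="me" href="http://youtube.com/user/modanbri">my videos</a>
  <a rel="friend" href="http://example.org/someone">a friend</a>
</p>
"""
print(rel_me_links(page))  # ['http://youtube.com/user/modanbri']
```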

Here is the data returned by Google’s Social Graph API when asking for everything about my YouTube URL:

{
 "canonical_mapping": {
  "http://youtube.com/user/modanbri": "http://youtube.com/user/modanbri"
 },
 "nodes": {
  "http://youtube.com/user/modanbri": {
   "attributes": {
    "url": "http://youtube.com/user/modanbri",
    "profile": "http://youtube.com/user/modanbri",
    "rss": "http://youtube.com/rss/user/modanbri/videos.rss"
   },
   "claimed_nodes": [
   ],
   "unverified_claiming_nodes": [
    "http://friendfeed.com/danbri",
    "http://www.mybloglog.com/buzz/members/danbri"
   ],
   "nodes_referenced": {
   },
   "nodes_referenced_by": {
    "http://friendfeed.com/danbri": {
     "types": [
      "me"
     ]
    },
    "http://guttertec.swurl.com/friends": {
     "types": [
      "friend"
     ]
    },
    "http://www.mybloglog.com/buzz/members/danbri": {
     "types": [
      "me"
     ]
    }
   }
  }
 }
}

You can see here that the SGAPI, built on top of Google’s Web crawl of public pages, has picked out the connection to my FriendFeed (see FOAF file) and MyBlogLog (see FOAF file) accounts, both of which export XFN and FOAF descriptions of my relationship to this YouTube account, linking it up with various other sites and profiles I’m publicly associated with.
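Pulling the interesting part out of such a response takes only a few lines. The Python sketch below works on a trimmed copy of the data shown above, pasted in as a string; it makes no network call, and the function name is my own invention rather than anything from Google’s API.

```python
import json

# A trimmed copy of the Social Graph API response shown above.
RESPONSE = """
{
 "nodes": {
  "http://youtube.com/user/modanbri": {
   "nodes_referenced_by": {
    "http://friendfeed.com/danbri": {"types": ["me"]},
    "http://guttertec.swurl.com/friends": {"types": ["friend"]},
    "http://www.mybloglog.com/buzz/members/danbri": {"types": ["me"]}
   }
  }
 }
}
"""


def profiles_claiming(response_text, url):
    """Return the pages that link to `url` with an XFN 'me' relationship."""
    data = json.loads(response_text)
    node = data["nodes"][url]
    referrers = node.get("nodes_referenced_by", {})
    return sorted(
        ref for ref, info in referrers.items() if "me" in info.get("types", [])
    )


for profile in profiles_claiming(RESPONSE, "http://youtube.com/user/modanbri"):
    print(profile)
# http://friendfeed.com/danbri
# http://www.mybloglog.com/buzz/members/danbri
```

The ‘friend’ link is left out on purpose: it says someone knows this account, not that they are the same person.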

YouTube users who have linked their YouTube account URLs from other social Web sites (something sites like FriendFeed and MyBlogLog actively encourage), are no longer anonymous on YouTube. This is their choice. It can give them a mechanism for sharing ‘favourited’ videos with a wide circle of friends, without those friends needing logins on YouTube or other Google services. This clearly has business value for YouTube and similar ‘social video’ services, as well as for users and Social Web aggregators.

Given such a trend towards increased cross-site profile linkage, it is unfortunate to read that YouTube identifiers are being presented as essentially anonymous IDs: this is clearly not the case. If you know my YouTube ID ‘modanbri’ you can quite easily find out a lot more about me, and certainly enough to find out with strong probability my real world identity. As I say, this is my conscious choice as a YouTube user; had I wanted to be (more) anonymous, I would have behaved differently. To understand YouTube IDs as being anonymous accounts is to radically misunderstand the nature of the modern Web.

Although it wouldn’t protect against all analysis, I hope the user IDs are at least scrambled before being handed over to Viacom. This would make it harder for them to be used to look up other data via (amongst other things) Google’s own YouTube and Social Graph APIs.
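By “scrambled” I mean something like the following Python sketch: replace each login ID with a keyed hash before export, so the same user still lines up across log records but the value can no longer be pasted into a public API lookup. This is purely an illustration of the idea; I have no knowledge of what will actually be done with the logs, and the function name and key are hypothetical.

```python
import hashlib
import hmac


def scramble_user_id(user_id, secret_key):
    """Map a login ID to a stable pseudonym using HMAC-SHA256.

    `secret_key` (bytes) is assumed to stay with the party releasing the
    logs; without it the pseudonym cannot be recomputed from a known ID.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


key = b"hypothetical-secret-kept-by-the-log-owner"
pseudonym = scramble_user_id("modanbri", key)
print(len(pseudonym))                                   # 64
print(pseudonym == scramble_user_id("modanbri", key))   # True
```

A plain unkeyed hash would not be enough, since anyone could hash a list of known usernames and compare. And as noted, even a keyed scheme leaves the viewing patterns themselves open to analysis.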

Update: I should note also that the bridging of YouTube IDs with other profiles is one that is not solely under the control of the YouTube user. Friends, contacts, followers and fans on other sites can link to YouTube profiles freely; this can be enough to compromise an otherwise anonymous account. Increasingly, these links are machine-processable; a trend I’ve previously argued is (for better or worse) inevitable.

Furthermore, the hypertext and data environment around YouTube and the Social Web is rapidly evolving; the lookups and associations we’ll be able to make in 1-2 years will outstrip what is possible today. It only takes a single hyperlink to reveal the owner of a YouTube account name; many such links will be created in the months to come.

Building on last month’s announcement of OAuth for the Google Contacts API, this from Wei on the oauth list:

Just want to let you know that we officially support OAuth for all Google Data APIs.

See blog post:

You’ll now be able to use standard OAuth libraries to write code that authenticates users to any of the Google Data APIs, such as Google Calendar Data API, Blogger Data API, Picasa Web Albums Data API, or Google Contacts Data API. This should reduce the amount of duplicate code that you need to write, and make it easier for you to write applications and tools that work with a variety of services from multiple providers. [...]

There’s also a footnote, “* OAuth also currently works for YouTube accounts that are linked to a Google Account when using the YouTube Data API.”

See the documentation for more details.

On the YouTube front, I have no idea what % of their accounts are linked to Google; lots I guess. Some interesting parts of the YouTube API: retrieve user profiles, access/edit contacts, find videos uploaded by a particular user or favourited by them plus of course per-video metadata (categories, keywords, tags, etc). There’s a lot you could do with this, in particular it should be possible to find out more about a user by looking at the metadata for the videos they favourite.
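As a toy illustration of that last point, here is a Python sketch that builds a crude interest profile by counting the tags on a user’s favourited videos. The input is assumed to have already been fetched and reduced to plain dictionaries; the data, field names and function name are all invented, and no YouTube API call is made here.

```python
from collections import Counter


def interest_profile(favourites, top=2):
    """Count tags across favourited videos and return the most common ones."""
    counts = Counter(
        tag.lower() for video in favourites for tag in video.get("tags", [])
    )
    return counts.most_common(top)


# Made-up favourites, standing in for per-video metadata from an API.
favourites = [
    {"title": "Video A", "tags": ["semweb", "foaf"]},
    {"title": "Video B", "tags": ["FOAF", "rdf"]},
    {"title": "Video C", "tags": ["foaf", "semweb", "cats"]},
]
print(interest_profile(favourites))  # [('foaf', 3), ('semweb', 2)]
```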

Evidence-based profiles are often better than those that are merely asserted, without being grounded in real activity. The list of people I actively exchange mail or IM with is more interesting to me than the list of people I’ve added on Facebook or Orkut; the same applies with profiles versus tag-harvesting. This is why the combination of last.fm’s knowledge of my music listening behaviour with the BBC’s categorisation of MusicBrainz artist IDs is more interesting than asking me to type my ‘favourite band’ into a box. Finding out which bands I’ve friended on MySpace would also be a nice piece of evidence to throw into that mix (and possible, since MusicBrainz also notes MySpace URIs).

So what do these profiles look like? The YouTube ‘retrieve a profile‘ API documentation has an example. It’s Atom-encoded, and beyond the video stuff mentioned above has fields like:

  <yt:age>33</yt:age>
  <yt:username>andyland74</yt:username>
  <yt:books>Catch-22</yt:books>
  <yt:gender>m</yt:gender>
  <yt:company>Google</yt:company>
  <yt:hobbies>Testing YouTube APIs</yt:hobbies>
  <yt:location>US</yt:location>
  <yt:movies>Aqua Teen Hungerforce</yt:movies>
  <yt:music>Elliott Smith</yt:music>
  <yt:occupation>Technical Writer</yt:occupation>
  <yt:school>University of North Carolina</yt:school>
  <media:thumbnail url='http://i.ytimg.com/vi/YFbSxcdOL-w/default.jpg'/>
  <yt:statistics viewCount='9' videoWatchCount='21' subscriberCount='1'
    lastWebAccess='2008-02-25T16:03:38.000-08:00'/>
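Reading these fields back out is straightforward with Python’s standard `xml.etree.ElementTree`. In the sketch below the wrapping Atom `entry` element and the `yt` namespace URI are written from memory and should be treated as assumptions to check against the API documentation; only a few of the fields above are included to keep it short.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the yt: prefix; verify against the API docs.
YT = "http://gdata.youtube.com/schemas/2007"

SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <yt:age>33</yt:age>
  <yt:username>andyland74</yt:username>
  <yt:music>Elliott Smith</yt:music>
  <yt:statistics viewCount="9" videoWatchCount="21" subscriberCount="1"/>
</entry>"""


def profile_fields(entry_xml):
    """Return the yt:* text fields, plus the statistics attributes, as a dict."""
    entry = ET.fromstring(entry_xml)
    prefix = "{" + YT + "}"
    fields = {}
    for child in entry:
        # Keep only yt: elements that carry text (age, username, music, ...).
        if child.tag.startswith(prefix) and child.text and child.text.strip():
            fields[child.tag[len(prefix):]] = child.text.strip()
    stats = entry.find(prefix + "statistics")
    if stats is not None:
        fields["statistics"] = dict(stats.attrib)
    return fields


profile = profile_fields(SAMPLE)
print(profile["username"], profile["music"])  # andyland74 Elliott Smith
```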

Not a million miles away from the OpenSocial schema I was looking at yesterday, btw.

I haven’t yet found where it says what I can and can’t do with this information…

OpenSocial’s API reference describes a number of classes (‘Person’, ‘Name’, ‘Email’, ‘Phone’, ‘Url’, ‘Organization’, ‘Address’, ‘Message’, ‘Activity’, ‘MediaItem’, …), each of which has various properties whose values are either strings, references to instances of other classes, or enumerations. I’d like to make them usable beyond the confines of OpenSocial, so I’m making an RDF/OWL version. OpenSocial’s schema is an attempt to provide an overarching model for much of present-day mainstream ‘social networking’ functionality, including dating, jobs etc. Such a broad effort is inevitably somewhat open-ended, and so may benefit from being linked to data from other complementary sources.

With a bit of help from the shindig-dev list, #opensocial IRC, and Kevin Brown and Kevin Marks, I’ve tracked down the source files used to represent OpenSocial’s data schemas: they’re in the opensocial-resources SVN repository on code.google.com. There is also a downstream copy in the Apache Shindig SVN repo (I’m not very clear on how versioning and evolution are managed between the two). They’re Javascript files, structured so that documentation can be generated via javadoc. The Shindig-PHP schema diagram I posted recently is a representation of this schema.

So - my RDF version. At the moment it is merely a list of classes and their properties (expressed via rdfs:domain), written using RDFa/HTML. I don’t yet define rdfs:range for any of these, nor handle the enumerated values (opensocial.Enum.Smoker, opensocial.Enum.Drinker, opensocial.Enum.Gender, opensocial.Enum.LookingFor, opensocial.Enum.Presence) that are defined in enum.js.
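To give a feel for what “a list of classes and their properties, expressed via rdfs:domain” amounts to, here is a small Python sketch. It is not schemarama.js and it emits Turtle rather than RDFa/HTML, simply because that is shorter to show. The namespace URI is a placeholder and the handful of property names are illustrative stand-ins, not a transcription of the real OpenSocial files.

```python
# Placeholder namespace, not the one used in the FOAF SVN.
OS = "http://example.org/opensocial#"

# Illustrative subset: class name -> property names (stand-ins only).
SCHEMA = {
    "Person": ["nickname", "aboutMe"],
    "Address": ["locality"],
}


def schema_to_turtle(schema, ns=OS):
    """Emit one rdfs:Class per class and one rdf:Property per property."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix os: <{ns}> .",
        "",
    ]
    for cls, props in schema.items():
        lines.append(f"os:{cls} a rdfs:Class .")
        for prop in props:
            # rdfs:domain ties the property to its class; no rdfs:range yet.
            lines.append(f"os:{prop} a rdf:Property ; rdfs:domain os:{cls} .")
    return "\n".join(lines)


print(schema_to_turtle(SCHEMA))
```

Leaving out rdfs:range mirrors the current state of the vocabulary: the enumerated values would need modelling first.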

The code is all in the FOAF SVN, and accessible via “svn co http://svn.foaf-project.org/foaftown/opensocial/vocab/”. I’ve also taken the liberty of including a copy of the OpenSocial *.js files, and Mozilla’s Rhino Javascript interpreter js.jar in there too, for self-containedness.

The code in schemarama.js will simply generate an RDFa/XHTML page describing the schema. This can be checked using the W3C validator, or converted to RDF/XML with the pyRDFa service at W3C.

I’ve tested the output using the OwlSight/pellet service from Clark & Parsia, and with Protege 4. It’s basic but seems OK and a foundation to build from. Here’s a screenshot of the output loaded into Protege (which btw finds 10 classes and 99 properties).

An example view from protege, showing the class browser in one panel, and a few properties of Person in another.

OK so why might this be interesting?

  • Using OpenSocial-derived vocabulary and OpenSocial-exported data in other contexts
    • databases (queryable via SPARQL)
    • mixed with FOAF
    • mixed with Microformats
    • published directly in RDFa/HTML
  • Mapping OpenSocial terms with other contact and social network schemas

This suggests some goals for continued exploration:

It should be possible to use “OpenSocial markup” in an ordinary homepage or blog (HTML or XHTML), drawing on any of the descriptive concepts they define, through using RDFa’s markup notation. As Mark Birbeck pointed out recently, RDFa is an empty vessel - it does not define any descriptive vocabulary. Instead, the RDF toolset offers an environment in which vocabulary from multiple independent sources can be mixed and merged quite freely. The hard work of the OpenSocial team in analysing social network schemas and finding commonalities, or of the Microformats scene in defining simple building-block vocabularies … these can hopefully be combined within a single environment.

A new work-in-progress vCard spec. See June 25th draft, and emailed changelog. Excerpted here:

o Removed useless text in IMPP description.
o Added CalDAV-SCHED example to CALADRURI.
o Removed CAPURI property.
o Dashes in dates and colons in times are now mandatory.
o Allow for dates such as 2008 and 2008-05 and times such as 07 and
07:54.
o Removed inline vCard value.
o Made AGENT only accept URI references instead of inline vCards.
o Added the MEMBER property.
o Renamed the UID parameter to PID.
o Changed the value type of the PID parameter to “a small integer.”
o Changed the presence of UID and PID when synchronization is to be
used from MUST to SHOULD.
o Added the RELATED (Section 7.6.7) property.
o Fixed many ABNF typos (issue #252).
o Changed formatting of ABNF comments to make them easier to read
(issue #226).
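The two date and time items in that changelog are easy to turn into a quick check. The Python sketch below covers only what the changelog says (mandatory dashes and colons, with truncated forms such as 2008, 2008-05, 07 and 07:54 allowed); it is not derived from the draft’s ABNF, and the function names are my own.

```python
import re

# Year, optionally followed by -month, optionally followed by -day.
DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")
# Hour, optionally followed by :minute, optionally followed by :second.
TIME = re.compile(r"([01]\d|2[0-3])(:[0-5]\d(:[0-5]\d)?)?")


def is_date(value):
    return DATE.fullmatch(value) is not None


def is_time(value):
    return TIME.fullmatch(value) is not None


print(is_date("2008"), is_date("2008-05"), is_date("20080521"))  # True True False
print(is_time("07"), is_time("07:54"), is_time("0754"))          # True True False
```

This checks shape only: it will happily accept 2008-02-31, and it ignores time zones entirely.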

From the Introduction:

This draft contains much of the same text as 2425 and 2426 which may not be correct. Those two RFCs have been merged and the structure of this draft is what’s new. Some vCard-specific suggestions have been added, but for the most part this is still very open. But we’d like to get feedback on the structure mostly so that it may be fixed.

Via Rajdeep Dua on shindig-dev, “Shindig: An Architectural Overview (PHP Version)”.

Shindig is the Apache-incubated project to build an opensource implementation of the Google-and-friends OpenSocial platform for gadgety social networking. Most activity on the list is around the Java version, but a PHP effort/port is also underway.

This article follows the earlier “Architectural Overview of Shindig , an OpenSocial Reference Implementation“; both are based on work-in-progress code from Shindig svn.

One part that caught my eye was this handy (if inaccessible) overview of the schemas:

Figure 7 provides the details of various classes which represent and store the social graph on the client side for OpenSocial APIs.

I’ve transcribed the Person properties from this UMLish diagram; anyone doing serious mapping work ought to go track down a more authoritative reference (and let me know where it is!). Since the diagram is about client-side code, I guess its source is in Javascript and is shared with the Java codebase. A couple of weeks ago at ESWC I made an attempt to expose this data via SPARQL/RDF using D2RQ, following a guide I found to wrapping Java Shindig around a MySQL database. I’d be happy to share that, but fear the codebase is moving so fast that some work would be needed to update to the latest version.