OpenID, OAuth UI and tool links

A quick link roundup:

From ‘Google OAuth & Federated Login Research‘:

“The following provides some guidelines for the user interface define of becoming an OAuth service provider”

Detailed notes on UI issues, with screenshots and links to related work (opensocial etc.).

Myspace’s OAuth Testing tool:

The MySpace OAuth tool creates examples to show external developers the correct format for constructing HTTP requests signed according to OAuth specifications

Google’s OAuth playground tool (link):

… to help developers cure their OAuth woes. You can use the Playground to help debug problems, check your own implementation, or experiment with the Google Data APIs.

If anyone figures out how to post files to Blogger via their AtomPub/OAuth API, please post a writeup! We should be able to use it to post RDFa/FOAF etc hopefully…

Yahoo’s OpenID usability research. Really good to see this made public, I hope others do likewise. There’s a summary page and a full report in PDF, “Yahoo! OpenID: One Key, Many Doors“.

Finally, what looks like an excellent set of introductory posts on OAuth: a Beginner’s Guide to OAuth from Eran Hammer-Lahav.

Google Data APIs (and partial YouTube) supporting OAuth

Building on last month’s announcement of OAuth for the Google Contacts API, this from Wei on the oauth list:

Just want to let you know that we officially support OAuth for all Google Data APIs.

See blog post:

You’ll now be able to use standard OAuth libraries to write code that authenticates users to any of the Google Data APIs, such as Google Calendar Data API, Blogger Data API, Picasa Web Albums Data API, or Google Contacts Data API. This should reduce the amount of duplicate code that you need to write, and make it easier for you to write applications and tools that work with a variety of services from multiple providers. [...]

There’s also a footnote, “* OAuth also currently works for YouTube accounts that are linked to a Google Account when using the YouTube Data API.”

See the documentation for more details.

On the YouTube front, I have no idea what % of their accounts are linked to Google; lots I guess. Some interesting parts of the YouTube API: retrieve user profiles, access/edit contacts, find videos uploaded by a particular user or favourited by them plus of course per-video metadata (categories, keywords, tags, etc). There’s a lot you could do with this, in particular it should be possible to find out more about a user by looking at the metadata for the videos they favourite.

Evidence-based profiles are often better than those that are merely asserted, without being grounded in real activity. The list of people I actively exchange mail or IM with is more interesting to me than the list of people I’ve added on Facebook or Orkut; the same applies with profiles versus tag-harvesting. This is why the combination of last.fm’s knowledge of my music listening behaviour with the BBC’s categorisation of MusicBrainz artist IDs is more interesting than asking me to type my ‘favourite band’ into a box. Finding out which bands I’ve friended on MySpace would also be a nice piece of evidence to throw into that mix (and possible, since MusicBrainz also notes MySpace URIs).

So what do these profiles look like? The YouTube ‘retrieve a profile‘ API documentation has an example. It’s Atom-encoded, and beyond the video stuff mentioned above has fields like:

  <yt:age>33</yt:age>
  <yt:username>andyland74</yt:username>
  <yt:books>Catch-22</yt:books>
  <yt:gender>m</yt:gender>
  <yt:company>Google</yt:company>
  <yt:hobbies>Testing YouTube APIs</yt:hobbies>
  <yt:location>US</yt:location>
  <yt:movies>Aqua Teen Hungerforce</yt:movies>
  <yt:music>Elliott Smith</yt:music>
  <yt:occupation>Technical Writer</yt:occupation>
  <yt:school>University of North Carolina</yt:school>
  <media:thumbnail url='http://i.ytimg.com/vi/YFbSxcdOL-w/default.jpg'/>
  <yt:statistics viewCount='9' videoWatchCount='21' subscriberCount='1'
    lastWebAccess='2008-02-25T16:03:38.000-08:00'/>

Not a million miles away from the OpenSocial schema I was looking at yesterday, btw.

I haven’t yet found where it says what I can and can’t do with this information…

Opening and closing like flowers (social platform roundupathon)

Closing some tabs…

Stephen Fry writing on ‘social network’ sites back in January (also in the Guardian):

…what an irony! For what is this much-trumpeted social networking but an escape back into that world of the closed online service of 15 or 20 years ago? Is it part of some deep human instinct that we take an organism as open and wild and free as the internet, and wish then to divide it into citadels, into closed-border republics and independent city states? The systole and diastole of history has us opening and closing like a flower: escaping our fortresses and enclosures into the open fields, and then building hedges, villages and cities in which to imprison ourselves again before repeating the process once more. The internet seems to be following this pattern.

How does this help us predict the Next Big Thing? That’s what everyone wants to know, if only because they want to make heaps of money from it. In 1999 Douglas Adams said: “Computer people are the last to guess what’s coming next. I mean, come on, they’re so astonished by the fact that the year 1999 is going to be followed by the year 2000 that it’s costing us billions to prepare for it.”

But let the rise of social networking alert you to the possibility that, even in the futuristic world of the net, the next big thing might just be a return to a made-over old thing.

McSweenys:

Dear Mr. Zuckerberg,

After checking many of the profiles on your website, I feel it is my duty to inform you that there are some serious errors present. [...]

Lest-we-forget. AOL search log privacy goofup from 2006:

No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”

And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”

It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.

Time magazine punditising on iGoogle, Facebook and OpenSocial:

Google, which makes its money on a free and open Web, was not happy with the Facebook platform. That’s because what happens on Facebook stays on Facebook. Google would much prefer that you come out and play on its platform — the wide-open Web. Don’t stay behind Facebook’s closed doors! Hie thee to the Web and start searching for things. That’s how Google makes its money.

So, last fall, Google rallied all the other major social networks (MySpace, Bebo, Hi5 and so on) and announced a new initiative called OpenSocial. OpenSocial wants to be like Facebook’s platform, only much bigger: Widget makers can write applications for it and they can run anywhere — on MySpace, Bebo and Google’s own social network, Orkut, which is very big in Brazil.

Google’s platform could actually dwarf Facebook — if it ever gets off the ground.

Meanwhile on the widget and webapp security front, we have “BBC exposes Facebook flaw” (information about your buddies is accessible to apps you install; information about you is accessible to apps they install). Also see Thomas Roessler’s comments to my Nokiana post for links to a couple of great presentations he made on widget security. This includes a big oopsie with the Google Mail widget for MacOSX. Over in Ars Technica we learn that KDE 4.1 alpha 1 now has improved widget powers, including “preliminary support for SuperKaramba and Mac OS X Dashboard widgets“. Wonder if I can read my Gmail there…

As Stephen Fry says,  these things are “opening and closing like a flower”. The big hosted social sites have a certain oversimplifying retardedness about them. But the ability for code to go visit data (the widget/gadget model), is I think as valid as the opendata model where data flows around to visit code. I am optimistic that good things will come out of this ferment.

A few weeks ago I had the pleasure of meeting several of the Google OpenSocial crew in London. They took my grumbling about accessibility issues pretty well, and I hope to continue that conversation. Industry politics and punditry aside, I’m impressed with their professionalism and with the tie-in to an opensource implementation through Apache’s ShinDig project. The OpenSocial specs list is open to the public, where Cassie has just announced that “all 0.8 opensocial and gadgets spec changes have been resolved” (after a heroic slog through the issue list). I’m barely tracking the detail of discussion there, things are moving fast. There’s now a proposed REST API, for example; and I learned in London about plans for a formatting/templating system, which might be one mechanism for getting FOAF/RDF out of OpenSocial containers.

If OpenSocial continues to grow and gather opensource mindshare, it’s possible Facebook will throw some chunks of their platform over the wall (ie. “do an Adobe“). And it’ll probably be left to W3C to clean up the ensuring mess and fragmentation, but I guess that’s what they’re there for. Meanwhile there’s plenty yet to be figured out, … I think we’re in a pre-standards experimentation phase, regardless of how stable or mature we’re told these platforms are.

The fundamental tension here is that we want open data, open platforms, … for data and code to flow freely, but to protect the privacy, lives and blushes of those it describes. A tricky balance. Don’t let anyone tell you it’s easy, that we’ve got it figured out, or that all we need to do is “tear down the walls”.

Opening and closing like flowers…

MusicBrainz SQL-to-RDF D2RQ mapping from Yves Raimond

More great music-related stuff from Yves Raimond. He’s just announced (on the Music ontology list) a D2RQ mapping of the MusicBrainz SQL into RDF and SPARQL. There’s a running instance of it on his site. The N3 mapping files are on the  motools sourceforge site.

Yves writes…

Added to the things that are available within the Zitgist mapping:

  •  SPARQL end point
  •  Support for tags
  • Supports a couple of advanced relationships (still working my way  through it, though)
  • Instrument taxonomy directly generated from the db, and related to performance events
  • Support for orchestras

This is pretty cool, since the original MusicBrainz RDF is rather dated (if it’s even still available). The new representations are much richer and probably also easier to maintain.

Nearby in the Web: discussion of RDF views into MySpace; and the RDB2RDF Incubator Group at W3C discussions are getting started (this group is looking at technology such as D2RQ which map non-RDF databases into our strange parallel world…)

Google Social Graph API, privacy and the public record

I’m digesting some of the reactions to Google’s recently announced Social Graph API. ReadWriteWeb ask whether this is a creeping privacy violation, and danah boyd has a thoughtful post raising concerns about whether the privileged tech elite have any right to experiment in this way with the online lives of those who are lack status, knowledge of these obscure technologies, and who may be amongst the more vulnerable users of the social Web.

While I tend to agree with Tim O’Reilly that privacy by obscurity is dead, I’m not of the “privacy is dead, get over it” school of thought. Tim argues,

The counter-argument is that all this data is available anyway, and that by making it more visible, we raise people’s awareness and ultimately their behavior. I’m in the latter camp. It’s a lot like the evolutionary value of pain. Search creates feedback loops that allow us to learn from and modify our behavior. A false sense of security helps bad actors more than tools that make information more visible.

There’s a danger here of technologists seeming to blame those we’re causing pain for. As danah says, “Think about whistle blowers, women or queer folk in repressive societies, journalists, etc.”. Not everyone knows their DTD from their TCP, or understand anything of how search engines, HTML or hyperlinks work. And many folk have more urgent things to focus on than learning such obscurities, let alone understanding the practical privacy, safety and reputation-related implications of their technology-mediated deeds.

Web technologists have responsibilities to the users of the Web, and while media education and literacy are important, those who are shaping and re-shaping the Web ought to be spending serious time on a daily basis struggling to come up with better ways of allowing humans to act and interact online without other parties snooping. The end of privacy by obscurity should not mean the death of privacy.

Privacy is not dead, and we will not get over it.

But it does need to be understood in the context of the public record. The reason I am enthusiastic about the Google work is that it shines a big bright light on the things people are currently putting into the public record. And it does so in a way that should allow people to build better online environments for those who do want their public actions visible, while providing immediate – and sometimes painful – feedback to those who have over-exposed themselves in the Web, and wish to backpedal.

I hope Google can put a user support mechanism on this. I know from our experience in the FOAF community, even with small scale and obscure aggregators, people will find themselves and demand to be “taken down”. While any particular aggregator can remove or hide such data, unless the data is tracked back to its source, it’ll crop up elsewhere in the Web.

I think the argument that FOAF and XFN are particularly special here is a big mistake. Web technologies used correctly (posh – “plain old semantic html” in microformats-speak) already facilitate such techniques. And Google is far from the only search engine in existence. Short of obfuscating all text inside images, personal data from these sites is readily harvestable.

ReadWriteWeb comment:

None the less, apparently the absence of XFN/FOAF data in your social network is no assurance that it won’t be pulled into the new Google API, either. The Google API page says “we currently index the public Web for XHTML Friends Network (XFN), Friend of a Friend (FOAF) markup and other publicly declared connections.” In other words, it’s not opt-in by even publishers – they aren’t required to make their information available in marked-up code.

The Web itself is built from marked-up code, and this is a thing of huge benefit to humanity. Both microformats and the Semantic Web community share the perspective that the Web’s core technologies (HTML, XHTML, XML, URIs) are properly consumed both by machines and by humans, and that any efforts to create documents that are usable only by (certain fortunate) humans is anti-social and discriminatory.

The Web Accessibility movement have worked incredibly hard over many years to encourage Web designers to create well marked up pages, where the meaning of the content is as mechanically evident as possible. The more evident the meaning of a document, the easier it is to repurpose it or present it through alternate means. This goal of device-independent, well marked up Web content is one that unites the accessibility, Mobile Web, Web 2.0, microformat and Semantic Web efforts. Perhaps the most obvious case is for blind and partially sighted users, but good markup can also benefit those with the inability to use a mouse or keyboard. Beyond accessibility, many millions of Web users (many poor, and in poor countries) will have access to the Web only via mobile phones. My former employer W3C has just published a draft document, “Experiences Shared by People with Disabilities and by People Using Mobile Devices”. Last month in Bangalore, W3C held a Workshop on the Mobile Web in Developing Countries (see executive summary).

I read both Tim’s post, and danah’s post, and I agree with large parts of what they’re both saying. But not quite with either of them, so all I can think to do is spell out some of my perhaps previously unarticulated assumptions.

  • There is no huge difference in principle between “normal” HTML Web pages and XFN or FOAF. Textual markup is what the Web is built from.
  • FOAF and XFN take some of the guesswork out of interpreting markup. But other technologies (javascript, perl, XSLT/GRDDL) can also transform vague markup into more machine-friendly markup. FOAF/XFN simply make this process easier and less heuristic, less error prone.
  • Google was not the first search engine, it is not the only search engine, and it will not be the last search engine. To obsess on Google’s behaviour here is to mistake Google for the Web.
  • Deeds that are on the public record in the Web may come to light months or years later; Google’s opening up of the (already public, but fragmented) Usenet historical record is a good example here.
  • Arguing against good markup practice on the Web (accessible, device independent markup) is something that may hurt underprivileged users (with disabilities, or limited access via mobile, high bandwidth costs etc).
  • Good markup allows content to be automatically summarised and re-presented to suit a variety of means of interaction and navigation (eg. voice browsers, screen readers, small screens, non-mouse navigation etc).
  • Good markup also makes it possible for search engines, crawlers and aggregators to offer richer services.

The difference between Google crawling FOAF/XFN from LiveJournal, versus extracting similar information via custom scripts from MySpace, is interesting and important solely to geeks. Mainstream users have no idea of such distinctions. When LiveJournal originally launched their FOAF files in 2004, the rule they followed was a pretty sensible one: if the information was there in the HTML pages, they’d also expose it in FOAF.

We need to be careful of taking a ruthless “you can’t make an omelete without breaking eggs” line here. Whatever we do, people will suffer. If the Web is made inaccessible, with information hidden inside image files or otherwise obfuscated, we exclude a huge constituency of users. If we shine a light on the public record, as Google have done, we’ll embarass, expose and even potentially risk harm to the people described by these interlinked documents. And if we stick our head in the sand and pretend that these folk aren’t exposed, I predict this will come back to bite us in the butt in a few months or years, since all that data is out there, being crawled, indexed and analysed by parties other than Google. Parties with less to lose, and more to gain.

So what to do? I think several activities need to happen in parallel:

  • Best practice codes for those who expose, and those who aggregate, social Web data
  • Improved media literacy education for those who are unwittingly exposing too much of themselves online
  • Technology development around decentralised, non-public record communication and community tools (eg. via Jabber/XMPP)

Any search engine at all, today, is capable of supporting the following bit of mischief:

Take some starting point a collection of user profiles on a public site. Extract all the usernames. Find the ones that appear in the Web less than say 10,000 times, and on other sites. Assume these are unique userIDs and crawl the pages they appear in, do some heuristic name matching, … and you’ll have a pile of smushed identities, perhaps linking professional and dating sites, or drunken college photos to respectable-new-life. No FOAF needed.

The answer I think isn’t to beat up on the aggregators, it’s to improve the Web experience such that people can have real privacy when they need it, rather than the misleading illusion of privacy. This isn’t going to be easy, but I don’t see a credible alternative.

MySpace open data oopsie

Latest megasite privacy screwup, this time from MySpace who appear to have allowed users to consider photos “private” when associated with a private profile, while (as far as I can make out) have the URLs visible of guessable. Whoopsadaisy.  Predictably enough someone has crawled and shared many of the images. Wired reports that the site knew about the flaw for months, and didn’t address it. I hope that’s not true.

Open social networks: bring back Iran

Three years ago, we lost Iran from Internet community. I simplify somewhat, but forgivably. Many Iranian ISPs cut off access to blogs and social networking sites, on government order. At the time, Iran was one of the most active nations on Orkut; and Orkut was the network of choice, faster than the then-fading Friendster, but not yet fully eclipsed by MySpace. It provided a historically unprecedented chance for young people from Iran, USA, Europe and the world to hang out together in an online community. But when Orkut was blocked at the ISP level in Iran, pretty much nobody in the English-speaking blog-tech-pundit scene seemed to even notice. This continues to bug me. Web technologists apparantly care collectively more about freeing Robert Scoble’s addressbook from Facebook, than about the real potential for unmediated, uncensored, global online community.

Most folk in the US will never visit Iran, and vice-versa. And the press and government in both states are engaged in scary levels of sabre-rattling and demonisation. For me, one of the big motivations for working (through FOAF, SPARQL, XMPP and other technologies) on social networking interop, is so young people in the future can grow up naturally having friends in distant nations, regardless of whether their government thinks that’s a priority. If hundreds of blog posts can be written about the good Mr Scoble’s addressbook portability situation, why are thousands of posts not being written about the need for social networking tools to connect people regardless of nationality and national firewalls?

Some things are too important to leave to governments…

Update: a few hours after writing this, things get hairy in Hormuz.  Oof…

Begin again

facebook grabThere was an old man named Michael Finnegan
He went fishing with a pinnegan
Caught a fish and dropped it in again
Poor old Michael Finnegan
Begin again.

Let me clear something up. Danny mentions a discussion with Tim O’Reilly about SemWeb themes.

Much as I generally agree with Danny, I’m reaching for a ten-foot bargepole on this one point:

While Facebook may have achieved pretty major adoption for their approach, it’s only very marginally useful because of their overly simplistic treatment of relationships.

Facebook, despite the trivia, the endless wars between the ninja zombies and the pirate vampires; despite being centralised, despite [insert grumble] is massively useful. Proof of that pudding: it is massively used. “Marginal” doesn’t come into it. The real question is: what happens next?

Imagine 35 million people. Imagine them marching thru your front room. Jumping off a table at the same time. Sending you an email. Or turning the tap off when they brush their teeth. 35 million is a fair-sized nation. Taking that 35 million figure I’ve heard waved around, and placing it in the ever scientific Wikipedia listing … that puts the land of Facebook somewhere between Kenya and Algeria in the population charts. Perhaps the figures are exagerrated. Perhaps a few million have wandered off, or forgotten their passwords. Doubtless some only use it every month or few.

Even a million is a lot of use; and a lot of usefulness.

Don’t let anything I ever say here in this blog be taken as claiming such sites and services are only marginally useful. To be used is to be useful; and that’s something SemWeb people should keep in the forefront of their minds. And usually they do, I think, although the community tends towards the forward-looking.

But let’s be backwards-looking for a minute. My concern with these sites is not that they’re marginally useful, but that they could be even more useful. Slight difference of emphasis. SixDegrees.com was great, back in 2000 when we started FOAF. But it was a walled garden. It had cool graph traversal stuff that evocatively showed your connection path to anyone else in the network. Their network. Then followed Friendster, which got slow as it proved useful to too many people. Ditto Orkut, which everyone signed up to, then wandered off from when it proved there was rather little to do there except add people. MySpace and Facebook cracked that one, … but guess what, there’ll be more.

I got a signup to Yahoo’s Mash yesterday. Anyone wanna be my friend? It has fun stuff (“Mecca Ibrahim smacked The Mash Pet (your Mash pet)!”), … wiki-like profile editing, extension modules … and I’d hope given that this is 2007, eventually some form of API. People won’t live in Facebook-land forever. Nor in Mash, however fun it is. I still lean towards Jabber/XMPP as the long-term infrastructure for this sort of system, but that’s for another time. The appeal of SixDegrees, of Friendster, of Orkut … wasn’t ever the technology. It was the people. I was there ‘cos others were there. Nothing more. And I don’t see this changing, no matter how much the underlying technology evolves. And people move around, drift along to the next shiny thing, … go wherever their friends are. Which is our only real problem here.

Begin again.

I’ve been messing with RDF a bit. I made a sample SPARQL query that asks (exported RDF from) a few networks about my IM addresses; here are the results from Redland/Rasqal JSON.