Mozilla Ubiquity

The are some interesting things going on at Mozilla Labs. Yesterday, Ubiquity was all over the mailing lists. You can think of it as “what the Humanized folks did next”, or as a commandline for the Web, or as a Webbier sibling to QuickSilver, the MacOSX utility. I prefer to think of it as the Mozilla add-on that distracted me all day. Ubiquity continues Mozilla’s exploration of the potential UI uses of its “awesome bar” (aka Location bar). Ubiquity is invoked on my Mac with alt-space, at which point it’ll enthusiastically try to autocomplete a verb-centric Webby task from whatever I type. It does this by consulting a pile of built-in and community-provided Javacript functions, which have access to the Web, your browser (hello, widget security fans)… and it also has access to UI, in terms of an overlaid preview window, as well as a context menu that can actually be genuinely contextual, ie. potentially sensitive to microformat and RDFa markup.

So it might help to think of ubiquity as a cross between The Hobbit, GreaseMonkeyBookmarklets, and Mozilla’s earlier forms of packaged addon. Ok, well it’s not very Hobbit, I just wanted an excuse for this screen grab. But it is about natural language interfaces to complex Webby datasources and services.

The basic idea here is that commands (triggered by some keyword) can be published in the Web as links to simple Javascript files that can be single-click added (without need for browser restart) by anyone trusting enough to add the code to their browser. Social/trust layers to help people avoid bad addons are in the works too.

I spent yesterday playing. There are some rough edges, but this is fun stuff for sure. The emphasis is on verbs, hence on doing, rather than solely on lookups, query and data access. Coupled with the dependency on third party Javascript, this is going to need some serious security attention. But but but… it’s so much fun to use and develop for. Something will shake out security-wise. Even if Ubiquity commands are only shared amongst trusting power users who have signed each other’s PGP keys, I think it’ll still have an important niche.

What did I make? A kind of stalk-a-tron, FOAF lookup tool. It currently only consults Google’s Social Graph API, an experimental service built from all the public FOAF and XFN on the Web plus some logic to figure out which account pages are held by the same person. My current demo simply retrieves associated URLs and photos, and displays them overlaid on the current page. If you can’t get it working via the Ubiquity auto-subscribe feature, try adding it by pasting the raw Javascript into the command-editor screen. See also the ‘sindice-term‘ lookup tool from Michael Hausenblas. It should be fun seeing how efforts like Bengee’s SPARQLScript work can be plugged in here, too.

YouAndYouAndYouTube: Viacom, Privacy and the Social Graph API

From Wired via Thomas Roessler:

Google will have to turn over every record of every video watched by YouTube users, including users’ names and IP addresses, to Viacom, which is suing Google for allowing clips of its copyright videos to appear on YouTube, a judge ruled Wednesday.

I hope nobody thought their behaviour on youtube.com was a private matter between them and Google.

The Judge’s ruling (pdf) is interesting to read (ok, to skim). As the Wired article says,

The judge also turned Google’s own defense of its data retention policies — that IP addresses of computers aren’t personally revealing in and of themselves, against it to justify the log dump.

Here’s an excerpt. Note that there is also a claim that youtube account IDs aren’t personally identifying.

Defendants argue that the data should not be disclosed because of the users’ privacy concerns, saying that “Plaintiffs would likely be able to determine the viewing and video uploading habits of YouTube’s users based on the user’s login ID and the user’s IP address” .

But defendants cite no authority barring them from disclosing such information in civil discovery proceedings, and their privacy concerns are speculative.  Defendants do not refute that the “login ID is an anonymous pseudonym that users create for themselves when they sign up with YouTube” which without more “cannot identify specific individuals”, and Google has elsewhere stated:

“We . . . are strong supporters of the idea that data protection laws should apply to any data  that could identify you.  The reality is though that in most cases, an IP address without additional information cannot.” — Google Software Engineer Alma Whitten, Are IP addresses personal?, GOOGLE PUBLIC POLICY BLOG (Feb. 22, 2008)

So forget the IP address part for now.

Since early this year, Google have been operating an experimental service called the Social Graph API. From their own introduction to the technology:

With so many websites to join, users must decide where to invest significant time in adding their same connections over and over. For developers, this means it is difficult to build successful web applications that hinge upon a critical mass of users for content and interaction. With the Social Graph API, developers can now utilize public connections their users have already created in other web services. It makes information about public connections between people easily available and useful.

Only public data. The API returns web addresses of public pages and publicly declared connections between them. The API cannot access non-public information, such as private profile pages or websites accessible to a limited group of friends.

Google’s Social Graph API makes easier something that was already possible: using XFN and FOAF markup from the public Web to associate more personal information with YouTube accounts. This makes information that was already public increasingly accessible to automated processing. If I choose to link to my YouTube profile with the XFN markup rel=’me’ from another of my profiles,  those 8 characters are sufficient to bridge my allegedly anonymous YouTube ID with arbitrary other personal information. This is done in a machine-readable manner, one that Google has already demonstrated a planet-wide index for.

Here is the data returned by Google’s Social Graph API when asking for everything about my YouTube URL:

{
 "canonical_mapping": {
  "http://youtube.com/user/modanbri": "http://youtube.com/user/modanbri"
 },
 "nodes": {
  "http://youtube.com/user/modanbri": {
   "attributes": {
    "url": "http://youtube.com/user/modanbri",
    "profile": "http://youtube.com/user/modanbri",
    "rss": "http://youtube.com/rss/user/modanbri/videos.rss"
   },
   "claimed_nodes": [
   ],
   "unverified_claiming_nodes": [
    "http://friendfeed.com/danbri",
    "http://www.mybloglog.com/buzz/members/danbri"
   ],
   "nodes_referenced": {
   },
   "nodes_referenced_by": {
    "http://friendfeed.com/danbri": {
     "types": [
      "me"
     ]
    },
    "http://guttertec.swurl.com/friends": {
     "types": [
      "friend"
     ]
    },
    "http://www.mybloglog.com/buzz/members/danbri": {
     "types": [
      "me"
     ]
    }
   }
  }
 }
}

You can see here that the SGAPI, built on top of Google’s Web crawl of public pages, has picked out the connection to my FriendFeed (see FOAF file) and MyBlogLog (see FOAF file) accounts, both of whom export XFN and FOAF descriptions of my relationship to this YouTube account, linking it up with various other sites and profiles I’m publicly associated with.

YouTube users who have linked their YouTube account URLs from other social Web sites (something sites like FriendFeed and MyBlogLog actively encourage), are no longer anonymous on YouTube. This is their choice. It can give them a mechanism for sharing ‘favourited’ videos with a wide circle of friends, without those friends needing logins on YouTube or other Google services. This clearly has business value for YouTube and similar ‘social video’ services, as well as for users and Social Web aggregators.

Given such a trend towards increased cross-site profile linkage, it is unfortunate to read that YouTube identifiers are being presented as essentially anonymous IDs: this is clearly not the case. If you know my YouTube ID ‘modanbri’ you can quite easily find out a lot more about me, and certainly enough to find out with strong probability my real world identity. As I say, this is my conscious choice as a YouTube user; had I wanted to be (more) anonymous, I would have behaved differently. To understand YouTube IDs as being anonymous accounts is to radically misunderstand the nature of the modern Web.

Although it wouldn’t protect against all analysis, I hope the user IDs are at least scrambled before being handed over to Viacom. This would make it harder for them to be used to look up other data via (amongst other things) Google’s own YouTube and Social Graph APIs.

Update: I should note also that the bridging of YouTube IDs with other profiles is one that is not solely under the control of the YouTube user. Friends, contacts, followers and fans on other sites can link to YouTube profiles freely; this can be enough to compromise an otherwise anonymous account. Increasingly, these links are machine-processable; a trend I’ve previously argued is (for better or worse) inevitable.

Furthermore, the hypertext and data environment around YouTube and the Social Web is rapidly evolving; the lookups and associations we’ll be able to make in 1-2 years will outstrip what is possible today. It only takes a single hyperlink to reveal the owner of a YouTube account name; many such links will be created in the months to come.