In the NoTube project, we are exploring the use of Semantic Web technology in Television and Web-TV scenarios. By making use of richer and linked descriptions of content and users, we hope to help users better find (and annotate, tag, cross-link etc.) content that is interesting to them. The growing amount of linked RDF data out there in the public Web provides a useful background dataset; for example we can use SKOS thesauri or DBpedia to indicate content topics or user interests.
I have been looking at aspects of the existing mainstream ecosystem, including so-called media centre systems, and at the various classes of app (remote controls, tv guides, media players) available on smart phones such as the iPhone. At the moment, all these applications are quite fragmented, with different pairs of remote control and player, different APIs, and different metadata systems.
This post gives some background thinking before jumping into super-technical details. In the next post I’ll give an overview of some of the ‘media centre’ APIs I’ve been looking at; this is the beginnings of a survey to explore the extent to which existing players can be driven by external software through a network interface. My working hypothesis is that the XMPP protocol provides the best candidate protocol environment to do this, since it allows devices to be addressed globally, even when they’re in the home; however the initial survey makes no assumptions about how these APIs might be unified.
Nearby: for some smartphone visuals, see my Flickr’d collection of iPhone screen grabs. I hope to revisit and annotate these later. For ongoing W3C work on an API and ontology for Media Objects, see the Use Cases and Requirements document over at W3C. This work is largely focussed on describing individual media objects, however it does note a requirement for “Being able to apply the ontology / API for collections of metadata”, as well as a requirement to allow references “to multimedia objects on several abstraction levels, in order to separate e.g. a movie, a DVD which contains the movie and a specific copy of the DVD. Especially for collections of multimedia objects, knowledge about such abstraction levels is helpful, as a means for accessing the objects on each level.”
In NoTube, we are also looking at work such as the BBC Programmes Ontology, which makes such distinctions in terms of brands, series, shows, episodes etc. This is not only to group different representations of the same content (eg. a mobile-friendly downcoding of a high-definition news article); it also allows us to cluster metadata about similar items or different versions or manifestations of the same underlying content.
Since metadata is notoriously expensive and hard to manage, having it richly interlinked makes a lot of sense.
For example, if I write a comment while watching a file that I created from an Amazon-purchased DVD of “The Fog of War”, it would be a waste (and would lead to a very sparse metadata network) if my annotation were only available to other users of that exact same file within this closed home network, although the media file itself must of course stay private. The first time I saw that movie was at a screening in Bristol’s Cube Cinema. If I wrote a blog post about the movie back then, that information should also be linked to common identifiers (eg. IMDB, FreeBase, Wikipedia, DBpedia, Sony’s page, the director’s synopsis etc.). But there’s more annotation in the Web than just my annotations: the director recently wrote a blog post about the main subject of the film, Robert McNamara. The director, Errol Morris, also publishes a transcript of the movie, and his homepage links to his Twitter microblog. If the content is available eg. through a system like Joost or Hulu, or broadcast on TV, or screened at a local cinema, most metadata about it will still be relevant, even if it was created in a different context.
For Web-heads, this online data-linking story is business as usual; for the broadcast TV industry, it’s something of a new world.
The Web can be more than a way of driving content to viewing portals, it can be more than a threat to content owners, it can change the way we think about what all these moving pictures are telling us. Douglas Adams described this very well 20 years ago, and if you’ve not seen his Hyperland documentary on pre-Web hypertext, I do urge you to go find a way to see it.
So, let’s stick with my example movie, Errol Morris’s (excellent) film The Fog of War, and work outwards from the assumption that we have one reasonable identifier for the film that connects us to a Web of linked data about it. How could a simple piece of software (running on a smart phone or home media centre) figure out that he is the director? Well, if we believe Wikipedia infoboxes, and we find and believe DBpedia’s derived page about the film, the information is there in a machine-friendly form: it says that the ‘director’ property of the film is Errol Morris. And following that link, in turn we can see a claim that Errol Morris has a ‘website’ at www.errolmorris.com.
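To make that concrete, here is a sketch of how a simple piece of software might ask DBpedia that question. The SPARQL endpoint URL and the dbo:director / foaf:homepage property names are assumptions based on DBpedia’s usual conventions, and the sample response below is hand-written to illustrate the standard SPARQL JSON results format, not fetched live.

```python
import json
import urllib.parse

# Assumed DBpedia endpoint and vocabulary; check against the live service.
ENDPOINT = "http://dbpedia.org/sparql"

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?director ?homepage WHERE {
  <http://dbpedia.org/resource/The_Fog_of_War> dbo:director ?director .
  OPTIONAL { ?director foaf:homepage ?homepage }
}
"""
request_url = ENDPOINT + "?" + urllib.parse.urlencode(
    {"query": query, "format": "application/sparql-results+json"})

# Hand-written sample in the SPARQL 1.1 JSON results format, showing the
# shape of an answer rather than a live response:
sample_response = json.dumps({
    "results": {"bindings": [
        {"director": {"type": "uri",
                      "value": "http://dbpedia.org/resource/Errol_Morris"},
         "homepage": {"type": "uri",
                      "value": "http://www.errolmorris.com/"}}
    ]}
})

bindings = json.loads(sample_response)["results"]["bindings"]
for row in bindings:
    print(row["director"]["value"], row.get("homepage", {}).get("value"))
```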
So, working outwards from the content, we find the director, and the director’s website url. What can we do with that? Well we could try loading pages from the site directly, and interpreting them. But the chances are currently low that we’ll find any machine-oriented, automation-friendly markup there. And indeed, if we look at the source of his page, there isn’t a lot there to help machines. Here is the markup for the twitter link mentioned earlier:
<a href="http://twitter.com/ErrolMorris" target="_blank">
If the page had included just a few extra characters, the microformat markup “rel=’me’“, or had a FOAF file, we could automatically have discovered that he had a microblog. He also has another blog on the New York Times site, and both of these have feeds (Twitter RSS; NYTimes RSS). Blogs and microblogs provide a way of establishing a more vivid and immediate connection between viewers and content creators; but finding the relevant feed for a given media object is non-trivial. So let’s walk through some of the issues and technologies.
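To show just how few extra characters are involved, here is a minimal sketch (stdlib Python, using html.parser) of the kind of rel=’me’ discovery a crawler might do. The second snippet is a hypothetical ‘fixed’ version of the real markup quoted above, added for contrast.

```python
from html.parser import HTMLParser

class RelMeFinder(HTMLParser):
    """Collect the hrefs of <a> links carrying rel="me", the microformat
    that makes an 'also me' claim machine-discoverable."""
    def __init__(self):
        super().__init__()
        self.me_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        if "me" in rel and "href" in attrs:
            self.me_links.append(attrs["href"])

# The markup as it stands today, then the version with rel="me" added:
without_markup = '<a href="http://twitter.com/ErrolMorris" target="_blank">Twitter</a>'
with_markup = '<a href="http://twitter.com/ErrolMorris" rel="me" target="_blank">Twitter</a>'

for html in (without_markup, with_markup):
    finder = RelMeFinder()
    finder.feed(html)
    print(finder.me_links)
```

With the plain markup the finder sees nothing; with rel="me" it discovers the microblog automatically.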
0. Content identification
For any of this ‘Linked TV’ scenario to make any sense, we need to get our hands on a solid widely-known identifier for the content. This is our entry pass into Douglas Adams’ ‘hyperland’, and is harder than it might seem. Typically all we have to identify a piece of content is information about a file, and maybe a textual label or two. There are a variety of automatic and semi-automatic approaches here, and growing support for rich disambiguation in freely available tools (eg. Boxee).
1. Reliability of linked data
Given a wikipedia or imdb url, we can find a lot of linked RDF data. In the example here, we use dbpedia to find the director’s homepage. Is this risky? Could the page be edited mischievously? Yes! Are there scenarios in which such Webby uncertainty is inappropriate in a TV context? Yes. Can we expect to see commercial and Musicbrainz-style collaborative enrichment and QA of linked datasets? I think so.
2. Social Graph discovery
OK, so we’ve found the director’s homepage. Maybe we could have found actors, presenters or writers too, given different content. This is great, but we don’t have much there that makes sense in a TV user interface. How do we find the twitter microblog programmatically? or his New York Times blog? How can we be sure of those links?
This is partially a matter of waiting, and partially a chicken-and-egg situation. If we build TV tools and lightweight standards which work better if content creator sites have a little extra markup (eg. rel=’me’), some content creators will add it to their homepages and blogs, and some hosting / tool vendors will add it automatically anyway. But what can we do today? Let’s look at Google’s Social Graph API.
The SGAPI allows us to take advantage of Google’s global Web index, and ask questions about people, their profiles, account pages, and connections.
We can ask for example, which accounts are claimed by http://www.errolmorris.com/. Today this gives no results, since the link in that page to the matching twitter page contains no semantic markup.
We can also ask which accounts are claimed by http://twitter.com/ErrolMorris … and this finds some interesting information, since (unlike errolmorris.com) Twitter is a site that is understood by the Google Social Graph API. We find a location (Boston MA), some photo and feed urls, but also a link from ErrolMorris on twitter to his homepage, www.errolmorris.com.
This last point is important: the Google Social Graph API is structured in terms of claims. This gives it a lot more robustness against spammers and mischief. SGAPI notices that the markup on the Twitter page says, in effect, “this is my homepage over here” (in microformat-speak, ‘this is (also) me’). Here it is in full: the link from Errol Morris on Twitter to his homepage:
<a href="http://www.errolmorris.com" class="url" rel="me nofollow" target="_blank">
But the Google SGAPI notices that the homepage doesn’t explicitly reciprocate the claim. There is no machine markup in the homepage to indicate that the owner of the page is saying that the twitter page “is me”. It might just be a link to a friend, for example.
Nevertheless, we can use the SGAPI in less trusting mode (‘show inbound links’), and take advantage of Google’s massive Web index to ask: which pages claim to have the same owner as www.errolmorris.com?
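As a rough sketch, both lookups can be expressed as URL construction against the SGAPI’s lookup endpoint. The endpoint and the q/edo/edi/fme parameter names follow the API’s documentation at the time of writing, but treat them as assumptions; the response fragment is hand-written to show the claim-based shape of the data.

```python
import json
import urllib.parse

# Assumed endpoint and parameter names (q = URI to look up, edo = edges out,
# edi = edges in, fme = follow "me" links), per the SGAPI docs.
SGAPI = "http://socialgraph.apis.google.com/lookup"

def lookup_url(uri, edges_out=True, edges_in=False, follow_me=True):
    """Build a lookup URL; edges_in=True is the less trusting
    'show inbound links' mode discussed above."""
    params = {"q": uri}
    if edges_out:
        params["edo"] = "1"
    if edges_in:
        params["edi"] = "1"
    if follow_me:
        params["fme"] = "1"
    return SGAPI + "?" + urllib.parse.urlencode(params)

outbound = lookup_url("http://twitter.com/ErrolMorris")
inbound = lookup_url("http://www.errolmorris.com/",
                     edges_out=False, edges_in=True)

# Hand-written fragment showing the claim-based shape of a response:
sample = json.dumps({
    "nodes": {
        "http://twitter.com/ErrolMorris": {
            "attributes": {"url": "http://www.errolmorris.com"},
            "nodes_referenced": {
                "http://www.errolmorris.com": {"types": ["me"]}
            }
        }
    }
})

for uri, node in json.loads(sample)["nodes"].items():
    for target, edge in node.get("nodes_referenced", {}).items():
        if "me" in edge.get("types", []):
            print(uri, "claims", target)
```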
This gives us a story about how we can find a lot of useful contextual information, given a basic starting point. The Google service I show here is just one of several that could be exploited in ‘linked tv’ scenarios.
For finding other identifiers, we might use sameas.org. Here’s what I get when I ask it for other URIs for The Fog of War, using a dbpedia uri as entry point:
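A sketch of that lookup, assuming sameas.org’s JSON endpoint and a ‘duplicates’ list in the response (both taken from the service’s documentation, so treat them as assumptions); the sample below is hand-written rather than a live result.

```python
import json
import urllib.parse

# Assumed endpoint and response shape; verify against the live service.
def sameas_url(uri):
    return "http://sameas.org/json?" + urllib.parse.urlencode({"uri": uri})

url = sameas_url("http://dbpedia.org/resource/The_Fog_of_War")
print(url)

# Hand-written fragment illustrating the expected bundle-of-URIs shape:
sample = json.dumps([{
    "uri": "http://dbpedia.org/resource/The_Fog_of_War",
    "duplicates": [
        "http://dbpedia.org/resource/The_Fog_of_War",
        "http://rdf.freebase.com/ns/en.the_fog_of_war"
    ]
}])

aliases = [alias for bundle in json.loads(sample)
           for alias in bundle["duplicates"]]
print(aliases)
```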
For ‘social graph’ data, we might also check sindice.com or foaf.qdos.com.
A quick word on topics and wikipedia: by using content identifiers that link to Wikipedia and DBpedia, we have the potential for extremely rich, fine grained topical annotations.
Wikipedia’s page about the movie notes that it covers the Cuban Missile Crisis. This association is now machine-visible, since I personally improved the relevant Wikipedia entry to explicitly link to the Cuban Missile Crisis page on Wikipedia. This allows us to use wikipedia:Cuban_Missile_Crisis as a topic indicator, not just against the entire film but against particular segments of the movie that are about that topic. It is easy to get a list of such links from the Wikipedia markup, eg. [[Ford Motor Company]], [[Vietnam War]] are also already there. If we have a TV presentation system that has a unique id for the content (and ideally content-version, since content often varies), and we know an offset in seconds, then rich topical tags could be applied to sections of the film, without the need for textual data entry. An iPhone or similar device could allow users to pause, annotate/tag/bookmark and continue their viewing, with no need for a keyboard.
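A minimal sketch of that pipeline: pull the [[…]] topic links out of raw Wikipedia markup with a regular expression, then attach one of them to a timed segment of the content. The content identifier and the annotation record layout here are invented for illustration, not a proposed format.

```python
import re

# Capture [[...]] link titles; stopping at ']', '|' and '#' means piped
# links and section anchors yield just the page title.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def topics(wikitext):
    """Return the Wikipedia page titles linked from a chunk of wiki markup."""
    return [title.strip() for title in WIKILINK.findall(wikitext)]

markup = ("The film covers the [[Cuban Missile Crisis]], McNamara's years at "
          "the [[Ford Motor Company]] and the [[Vietnam War]].")
print(topics(markup))

def tag_segment(content_id, start_secs, end_secs, topic_title):
    """Attach a Wikipedia-backed topic to a timed slice of the content.
    Record layout and content id are hypothetical."""
    return {
        "content": content_id,
        "start": start_secs,
        "end": end_secs,
        "topic": "http://en.wikipedia.org/wiki/" + topic_title.replace(" ", "_"),
    }

tag = tag_segment("fog-of-war-hd-v1", 1500, 1740, "Cuban Missile Crisis")
print(tag["topic"])
```

No keyboard needed: the viewer’s device only has to supply a content id, an offset, and a pick from a list of candidate topics.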
That smart-phone story is worth investigating, but let’s first start closer to the screen: what kinds of set top box, media centre or gadget might be able to index, navigate and play content in a way that makes interesting use of ‘linked tv’ techniques such as these? Which brings me, finally, to APIs for ‘media centre’ systems.
If we want to recommend interesting content to users, show them relevant links, annotations, related materials (not necessarily more video – text, audio, even a spreadsheet with statistics might be appropriate), suggestions from friends, upcoming broadcasts or archived materials, then we need an environment that is scripting friendly and capable of interacting with users in a rich and compelling manner. The simplest path here is to start with what’s out there already, and look at commonalities in API and data model, to see how far existing software can be ‘remote controlled’ from external scripts (and, eventually, from actual hand-held remote controls, eg. smartphones). I’ll go into some detail on that in the followup post.
The main point I want to (re-)emphasise here is that once we get to the stage of having well-known identifiers for content, we facilitate a very rich marketplace for TV-related metadata, with emphasis on the word “-related”. And this is necessarily very open-ended, since TV content can be about absolutely anything. I gave some examples drawing on Wikipedia/DBpedia for content metadata, and on homepages, Twitter and Google’s Social Graph API, to show how additional highly relevant information can be pulled from the Web, once we have a basic starting point. Finding ways of presenting such extra information to users, and giving them appropriate navigation and interaction options, is far from easy. Fortunately it is easier to share, syndicate and merge TV meta-content than TV content. We are already seeing systems such as the XBMC-based Boxee which normalise content identifiers in a way that encourages legal uses over illegal. Intellectual property issues around TV content mean that actual playable content is often not broadly shareable. However the same need not be true of user-supplied metadata, since this can be about a specific media file, but it can also be about the things the content is about. I expect to see TV meta-content shared in a global linked system, even while the underlying video and audio remain relatively hidden away; rich user-supplied TV metadata isn’t just about the TV show, it can equally tell us about the world and the viewer, and it deserves to be widely available through open standards.
To go back to my original example, Errol Morris’s film about Robert McNamara, The Fog of War… there’s a world of metadata options beyond five-star ratings that can enrich such content, beyond worrying about numerical ranking in statistical recommender systems. Existing work there, eg. around the Netflix prize, would be hard to beat. Statistical recommendations work well over a regular dataset where everything has a well known identifier. We can see RDF and linked data techniques serving a pre-processing role for such analysis, by linking together otherwise fragmented pieces of information about content, allowing classic techniques to be applied over a wider dataset.
However, I see the true ‘linked TV’ potential to be primarily in another direction: in creating more meaningful conversations around content, and helping users find other information that gives a complementary perspective on the materials, and on cross-referencing everything with everything so that unexpected new paths can be found. Morris’ film can teach us a lot about McNamara, but also about the wider world and recent history … about the Ford Motor Company, the Vietnam War, the Cuban Missile Crisis, the firebombing of Tokyo under the command of Curtis LeMay. Not all easy or pleasant topics to try to understand, and each with thousands of other relevant sources (video or otherwise) out there, different perspectives to cross-reference, different accounts to reconcile.
If television is an environment in which we can be informed, educated and entertained (by broadcasters, content creators, and increasingly, by everyone else too), we need to think through what this means for better metadata. Today we are lucky if we can find a good way of even identifying a piece of content. But tomorrow, we should have TV that comes out of the box with “Do You Want To Know More?” switched ‘on’ by default.
When Obama gave a talk (see his June 2009 speech in Cairo), he mentioned the following, at around 33 minutes and 30 seconds:
For many years, Iran has defined itself in part by its opposition to my country, and there is in fact a tumultuous history between us. In the middle of the Cold War, the United States played a role in the overthrow of a democratically elected Iranian government.
This last point is well covered by historians, but was likely to be new information for many viewers in the USA and UK, if not for those in Iran. What can we do to improve that, by bringing Web and TV closer together?
While the raw video, audio, and transcript are available on the White House site, we don’t yet have enough for “do you want to know more?”. W3C’s TimedText work might provide a basis for associating the transcript with the video as subtitles. But this is not enough for someone who wants to know more, following up this unusual acknowledgement of superpower interference.
An Internet-literate, laptop-owning Web user might try to find out more in an active manner. Although they are in a minority amongst TV viewers, there are still thousands of such people, and they know how to go to google.com and do searches. Can we harness their energy to improve the TV metadata environment for everyone? If you search Google for ‘US overthrow iranian government’ you find a good starting point for learning more, a Wikipedia page, “1953 Iranian coup d’état”. Although that page has content which Wikipedia flags as controversial, it has a world-visible talk page in which contributors from around the world can debate the detail of the topic. So how can this Web content be bridged with the world of TV?
TV viewers are commonly characterised in terms of a pyramid:
- a passive majority, content to watch without interaction
- a smaller group who will interact and navigate, but who rarely create new content
- an even smaller group of activist users, who will explore, annotate, interact and create using all the tools available
The production and consumption of ‘do you want to know more?’ annotation won’t be equally distributed. All viewers can benefit from, for example, a link from the Obama Cairo speech video to background information on the 1953 Iran coup. Some viewers might benefit indirectly, because their more inquisitive friends and family will have explored the linked materials and will talk to them about it. Others might read the Wikipedia page and linked pages directly; if not on their television, then later on a laptop or PC.
Still others (a minority, but such minorities can be influential) might engage further with the material, become intrigued by the differences of opinion and seek more perspectives from the wider Web. Now 1953 is not ancient history; it falls within the living memory of countless people, in Iran, the UK and the world at large. A challenge for Linked TV is to find ways to integrate those memories and views into the TV environment, such that any viewer could pause the playing of the Obama Cairo speech video because they want to know more about what he just said, regardless of whether they are watching on whitehouse.gov, YouTube, Joost, a live re-broadcast, a radio extract, a player embedded in a social network site (facebook, orkut, hyves, …), or a player running on a mobile phone. The current problems Iranians have with getting full and high-bandwidth Web access are another obstacle, of course, but eventually we can expect relevant video or audio memories from Iran (with subtitles and translations) to be 1 or 2 clicks away from Obama’s video, regardless of the software and environment playing the video. Much of that material is already out there somewhere; the challenge is to find it, link it and present it appropriately.
The core piece of metadata, a link from a section of the video (ie. the offset, 33 mins 30 secs) of his speech to a URL for the topic it describes, could be created by whitehouse.gov staff, or it could be created by activist users. We pretty much know how to do this. A time-offset and a Wikipedia URL do that job well enough for now. But what we have so far failed to achieve is a world-wide data ecosystem in which this kind of TV-enriching metadata is plentiful, widely used, and comprehensive enough that we can grow to expect fact-checkable TV, cross-referencable TV, hypertext TV. Instead of throwing physical things at the TV when we disagree with what we hear, it should become commonplace to press pause and then, or later, publish your perspective (an audio rant, a fact-check crosslink, a blog post) in a way that will be accessible to other viewers of that material in the future.
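That core piece of metadata is small enough to write down directly. A sketch, with an invented content identifier and a record layout that is illustrative rather than a proposed standard:

```python
import urllib.parse

def to_seconds(minutes, seconds):
    """33 minutes 30 seconds into the speech becomes an offset in seconds."""
    return minutes * 60 + seconds

def wikipedia_topic(title):
    """Turn a page title into a Wikipedia URL usable as a topic identifier."""
    return "http://en.wikipedia.org/wiki/" + urllib.parse.quote(
        title.replace(" ", "_"))

annotation = {
    "content": "whitehouse.gov:cairo-speech-2009-06",  # hypothetical id
    "offset": to_seconds(33, 30),
    "topic": wikipedia_topic("1953 Iranian coup d'état"),
}
print(annotation)
```

The hard part is not this record, but the ecosystem in which millions of such records are created, shared and found.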
To achieve this, we can’t afford needless fragmentation:
- of meta-content about different versions of the same basic content
- between geographical regions based on geo-rights to play the material
- between desktop, media-centre, mobile, set top box, broadcast and on-demand scenarios
- between radio and TV
- between files and streams
- between file formats
- between populations (‘net censorship and crude blocking technology)
If these huge obstacles can be overcome, it might become reasonable to expect to find useful information attached to most of what we see on next-generation TV, regardless of delivery system. My hunch is the right place to prototype is around opensource and hackable media centre systems, and the right low-level communications system is XMPP. I’ll go into this more in the next post. Apologies for the length of this one. Comments welcomed by blog email or whatever!