Local Video for Local People

OK it’s all Google stuff, but still good to see. Go to Google Maps, My Maps, to find ‘Videos from YouTube’ listed. Here’s where I used to live (Bristol UK) and where I live now (Amsterdam, The Netherlands). Here’s a promo film of some nearby art installations from ArtZuid, who even have a page in English. I wouldn’t have found the video or the nearby links except through the map overlay. I don’t know exactly how they’re geotagging the videos, I can’t see an option under ‘my videos’ in YouTube, so perhaps it’s automatic or viewer annotations. In YouTube, you can add a map link under ‘My Videos’ / ‘Edit Video’; I didn’t see that initially. I made some investigations into similar issues (videos on maps) while at Joost; see brief mention in my Fundamentos Web slides from a couple of years ago.
Oh, nearly forgot to mention: zooming out to get a Europe or World-wide view is quite striking too.

Posted in General, Geo, Technology, ggg | 1 Comment

Quick clarification on SPARQL extensions and “Lock-in”

It’s clear from discussion bouncing around IRC, Twitter, Skype and elsewhere that “Lock-in” isn’t a phrase to use lightly.

So I post this to make myself absolutely clear. A few days ago I mentioned in IRC a concern that newcomers to SPARQL and RDF databases might not appreciate which SPARQL extensions are widely implemented, and which are the specialist offerings of the system they happen to be using. I mentioned OpenLink’s Virtuoso in particular as a SPARQL implementation that had a rich and powerful set of extensions.

Since it seems there is some risk I might be mis-interpreted as suggesting OpenLink are actively trying to “do a Microsoft” and trap users in some proprietary pseudo-SPARQL, I’ll state what I took to be obvious background knowledge: OpenLink is a company who owe their success to the promotion of cross-vendor database portability, they have been tireless advocates of a standards-based Semantic Web, and they’re active in proposing extensions to W3C for standardisation. So – no criticism of OpenLink intended. None at all.

All I think we need here is a few utilities that help developers understand the nature of the various SPARQL dialects and the potential costs/benefits of using them. Perhaps an online validator, alongside those for RDF/XML, RDFa, Turtle etc. Such a validator might usefully list the extensions used in some query, and give pointers (perhaps into a wiki) where the status of the various extension constructs can be discussed and documented.
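A minimal sketch of how such a validator might list extension constructs, assuming a small hand-maintained table of patterns. The table, the labels and the function name `listExtensions` are hypothetical; the two patterns (Virtuoso's `bif:` function prefix and its `define` pragmas) are only illustrative, not a catalogue of any dialect.

```javascript
// Illustrative table only: the patterns and labels are assumptions made
// for this sketch, not an authoritative list of SPARQL extensions.
const EXTENSION_HINTS = [
  { label: "Virtuoso built-in function (bif: prefix)", pattern: /\bbif:\w+/gi },
  { label: "Virtuoso pragma (define ...)", pattern: /^\s*define\s+\S+/gim },
];

// Return one entry per suspected extension construct found in the query text.
function listExtensions(query) {
  const found = [];
  for (const hint of EXTENSION_HINTS) {
    const matches = query.match(hint.pattern) || [];
    for (const text of matches) {
      found.push({ construct: text.trim(), label: hint.label });
    }
  }
  return found;
}

// A made-up query that uses both kinds of construct.
const sample = [
  "define input:inference 'http://example.org/rules'",
  "SELECT ?s WHERE { ?s ?p ?o . ?o bif:contains 'television' }",
].join("\n");

for (const hit of listExtensions(sample)) {
  console.log(hit.construct + " : " + hit.label);
}
```

A real tool would need a proper SPARQL parser rather than regular expressions, and each label could link to a wiki page describing the status of the construct.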

Since SPARQL is such a young language, it lacks a lot of things that are taken for granted in the SQL world, and so using rich custom extensions when available is for many developers a sensible choice. My only concern is that it must be a choice, and one entered into consciously.

Posted in SPARQL, Semantic Web, Technology, coding | 4 Comments

Linked TV (part 1): Why APIs and identifiers matter

In the NoTube project, we are exploring the use of Semantic Web technology in Television and Web-TV scenarios. By making use of richer and linked descriptions of content and users, we hope to help users better find (and annotate, tag, cross-link etc.) content that is interesting to them. The growing amount of linked RDF data out there in the public Web provides a useful background dataset; for example we can use SKOS thesauri or DBpedia to indicate content topics or user interests.

I have been looking at aspects of the existing mainstream ecosystem, including so-called media centre systems, and at the various classes of app (remote controls, tv guides, media players) available on smart phones such as the iPhone. At the moment, all these applications are quite fragmented, with different pairs of remote control and player, different APIs, different metadata systems.

This post gives some background thinking before jumping into super-technical details. In the next post I’ll give an overview of some of the ‘media centre’ APIs I’ve been looking at; this is the beginnings of a survey to explore the extent to which existing players can be driven by external software through a network interface. My working hypothesis is that the XMPP protocol provides the best candidate protocol environment to do this, since it allows devices to be addressed globally, even when they’re in the home; however the initial survey makes no assumptions about how these APIs might be unified.

Nearby: for some smartphone visuals, see my Flickr’d collection of iPhone screen grabs. I hope to revisit and annotate these later. For ongoing W3C work on an API and ontology for Media Objects, see the Use Cases and Requirements document over at W3C. This work is largely focussed on describing individual media objects; however it does note a requirement for “Being able to apply the ontology / API for collections of metadata”, as well as a requirement to allow references “to multimedia objects on several abstraction levels, in order to separate e.g. a movie, a DVD which contains the movie and a specific copy of the DVD. Especially for collections of multimedia objects, knowledge about such abstraction levels is helpful, as a means for accessing the objects on each level.”

In NoTube, we are also looking at work such as the BBC Programmes Ontology, which makes such distinctions in terms of brands, series, shows, episodes etc. This is not only to group different representations of the same content (eg. a mobile-friendly downcoding of a high-definition news article); it also allows us to cluster metadata about similar items or different versions or manifestations of the same underlying content.

Since metadata is notoriously expensive and hard to manage, having it richly interlinked makes a lot of sense.

For example, if I write a comment while watching a file that I created from an Amazon purchased DVD of “The Fog of War”, it would be a waste (and lead to a very sparse metadata network) if my annotation was only available to other users of that exact same file within this closed home network, although the media file itself must of course stay private. The first time I saw that movie was at a screening in Bristol’s Cube Cinema. If I wrote a blog post about the movie back then, that information should also be linked to common identifiers (eg. IMDB, FreeBase, Wikipedia/DBpedia, Sony’s page, the director’s synopsis etc.). But there’s more annotation in the Web than just my annotations: the director recently wrote a blog post about the main subject of the film, Robert McNamara. The director, Errol Morris, also publishes a transcript of the movie, and his homepage links to his Twitter microblog. If the content is available eg. through a system like Joost or Hulu, or broadcast on TV, or screened at a local cinema, most metadata about it will still be relevant, even if it was created in a different context.

For Web-heads, this online data-linking story is business as usual; for the broadcast TV industry, it’s something of a new world.

The Web can be more than a way of driving content to viewing portals, it can be more than a threat to content owners, it can change the way we think about what all these moving pictures are telling us. Douglas Adams described this very well 20 years ago, and if you’ve not seen his Hyperland documentary on pre-Web hypertext, I do urge you to go find a way to see it.

So, let’s stick with my example movie, and work outwards from an assumption we have one reasonable identifier for the film that connects us to a Web of linked data about it. Errol Morris’s (excellent) film The Fog of War. How could a simple piece of software (running on a smart phone or home media centre) figure out that he is the director? Well, if we believe Wikipedia infoboxes and we find and believe DBpedia’s derived page about the film, the information is there in a machine-friendly form: it says that the ‘director’ property of the film is Errol Morris. And following that link, in turn we can see a claim that Errol Morris has a ‘website’ at www.errolmorris.com.
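The two hops described here (film to director, director to website) can be sketched over a tiny in-memory set of triples. This is only an illustration: the identifiers `dbpedia:The_Fog_of_War` and `dbpedia:Errol_Morris` and the bare property names `director` and `website` are simplified stand-ins, not DBpedia's actual vocabulary, and nothing is fetched from the network.

```javascript
// Toy triples standing in for linked data about the film; the identifiers
// and property names are simplified, hypothetical labels.
const triples = [
  ["dbpedia:The_Fog_of_War", "director", "dbpedia:Errol_Morris"],
  ["dbpedia:Errol_Morris", "website", "http://www.errolmorris.com"],
];

// All objects recorded for a given subject and property.
function objectsOf(graph, subject, property) {
  return graph
    .filter(([s, p]) => s === subject && p === property)
    .map(([, , o]) => o);
}

// Follow a chain of properties outwards from a starting node.
function followPath(graph, start, properties) {
  let frontier = [start];
  for (const property of properties) {
    frontier = frontier.flatMap((node) => objectsOf(graph, node, property));
  }
  return frontier;
}

console.log(followPath(triples, "dbpedia:The_Fog_of_War", ["director", "website"]));
```

The same `followPath` call works for any chain of properties, so a client could ask for writers or presenters in the same way if the data were there.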

So, working outwards from the content, we find the director, and the director’s website url. What can we do with that? Well we could try loading pages from the site directly, and interpreting them. But the chances are currently low that we’ll find any machine-oriented, automation-friendly markup there. And indeed, if we look at the source of his page, there isn’t a lot there to help machines. Here is the markup for the twitter link mentioned earlier:

<a href="http://twitter.com/ErrolMorris" target="_blank">

If the page had included just a few extra characters, the microformat markup “rel=’me’“, or had a FOAF file, we could automatically have discovered that he had a microblog. He also has another blog on the New York Times site, and both of these have feeds (Twitter RSS; NYTimes RSS). Blogs and microblogs provide a way of establishing a more vivid and immediate connection between viewers and content creators; but finding the relevant feed for a given media object is non-trivial. So let’s walk through some of the issues and technologies.
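To make the difference concrete, here is a rough sketch of how a client might look for rel="me" links in a page. The function name `relMeLinks` is made up for this example, and a regular expression is only good enough for a sketch; real pages call for a proper HTML parser.

```javascript
// Find <a> tags whose rel attribute contains the token "me" and
// return their href values.
function relMeLinks(html) {
  const links = [];
  const anchorTags = html.match(/<a\b[^>]*>/gi) || [];
  for (const tag of anchorTags) {
    const href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);
    const rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (!href || !rel) continue;
    // rel holds a space-separated list of tokens, eg. "me nofollow".
    const tokens = rel[1].toLowerCase().split(/\s+/);
    if (tokens.includes("me")) links.push(href[1]);
  }
  return links;
}

// The link as quoted from the homepage: no rel attribute at all.
const plainLink = '<a href="http://twitter.com/ErrolMorris" target="_blank">';
// The same link with the few extra characters added.
const markedLink = '<a href="http://twitter.com/ErrolMorris" rel="me" target="_blank">';

console.log(relMeLinks(plainLink));  // logs an empty array
console.log(relMeLinks(markedLink)); // logs an array holding the Twitter URL
```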

0. Content identification

For any of this ‘Linked TV’ scenario to make any sense, we need to get our hands on a solid widely-known identifier for the content. This is our entry pass into Douglas Adams’ ‘hyperland’, and is harder than it might seem. Typically all we have to identify a piece of content is information about a file, and maybe a textual label or two. There are a variety of automatic and semi-automatic approaches here, and growing support for rich disambiguation in freely available tools (eg. Boxee).

1. Reliability of linked data

Given a wikipedia or imdb url, we can find a lot of linked RDF data. In the example here, we use dbpedia to find the director’s homepage. Is this risky? Could the page be edited mischievously? Yes! Are there scenarios in which such Webby uncertainty is inappropriate in a TV context? Yes. Can we expect to see commercial and Musicbrainz-style collaborative enrichment and QA of linked datasets? I think so.

2. Social Graph discovery

OK, so we’ve found the director’s homepage. Maybe we could have found actors, presenters or writers too, given different content. This is great, but we don’t have much there that makes sense in a TV user interface. How do we find the twitter microblog programmatically? or his New York Times blog? How can we be sure of those links?

This is partially a matter of waiting, and partially a chicken-and-egg situation. If we build TV tools and lightweight standards which work better if content creator sites have a little extra markup (eg. rel=’me’), some content creators will add it to their homepages and blogs, and some hosting / tool vendors will add it automatically anyway. But what can we do today? Let’s look at Google’s Social Graph API.

The SGAPI allows us to take advantage of Google’s global Web index, and ask questions about people, their profiles, account pages, and connections.

We can ask for example, which accounts are claimed by http://www.errolmorris.com/. Today this gives no results, since the link in that page to the matching twitter page contains no semantic markup.

We can also ask, which accounts are claimed by http://twitter.com/ErrolMorris … and this finds some interesting information, since (unlike errolmorris.com) Twitter is a site that is understood by the Google Social Graph API. We find a location (Boston MA), some photo and feed urls, but also a link from ErrolMorris on twitter to his homepage, www.errolmorris.com.

This last point is important: Google Social Graph API is structured in terms of claims. This gives it a lot more robustness against spammers and mischief. SGAPI notices that the markup on the Twitter page says, in effect, “this is my homepage over here” (in microformat-speak, ‘this is (also) me’). Here it is in full, … the link from Errol Morris on Twitter, linking to his homepage:

<a href="http://www.errolmorris.com" class="url" rel="me nofollow" target="_blank">

But the Google SGAPI notices that the homepage doesn’t explicitly reciprocate the claim. There is no machine markup in the homepage to indicate that the owner of the page is saying that the twitter page “is me”. It might just be a link to a friend, for example.

Nevertheless, we can use the SGAPI in less trusting mode (‘show inbound links’), and take advantage of Google’s massive Web index to ask: which pages claim to have the same owner as www.errolmorris.com?
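The distinction between outbound claims, inbound claims and reciprocated claims can be modelled in a few lines. This is a local toy model, not the Social Graph API itself: the function names and the shape of the `claims` records are assumptions, and the single claim simply mirrors the Twitter markup quoted above.

```javascript
// Each claim says: the page at `from` asserts that `to` is the same person.
const claims = [
  { from: "http://twitter.com/ErrolMorris", to: "http://www.errolmorris.com" },
];

// Strict view: pages that `url` itself claims.
function claimedBy(claimList, url) {
  return claimList.filter((c) => c.from === url).map((c) => c.to);
}

// Less trusting view: pages that claim `url`, whether or not it agrees.
function claimsAbout(claimList, url) {
  return claimList.filter((c) => c.to === url).map((c) => c.from);
}

// A claim is reciprocated only when both pages point at each other.
function isReciprocated(claimList, a, b) {
  return claimedBy(claimList, a).includes(b) && claimedBy(claimList, b).includes(a);
}

console.log(claimedBy(claims, "http://www.errolmorris.com"));
console.log(claimsAbout(claims, "http://www.errolmorris.com"));
console.log(isReciprocated(claims, "http://twitter.com/ErrolMorris", "http://www.errolmorris.com"));
```

Here the homepage claims nothing, one page claims the homepage, and the claim is not reciprocated, which is the situation the text describes.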

This gives us a story about how we can find a lot of useful contextual information, given a basic starting point. The Google service I show here is just one of several that could be exploited in ‘linked tv’ scenarios.

For finding other identifiers, we might use sameas.org. Here’s what I get when I ask it for other URIs for the fog of war film, using a dbpedia uri as entry point:

  1. http://dbpedia.org/resource/The_Fog_of_War
  2. http://dbpedia.org/resource/11_Lessons_from_the_Life_of_Robert_S._Mcnamara
  3. http://mpii.de/yago/resource/The_Fog_of_War
  4. http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000004e1a97
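Once a bundle of equivalent URIs like this is in hand, a client can treat any of them as the same film. A minimal sketch, using only the four URIs listed above; the helper names `indexBundles`, `sameAs` and `sameThing` are invented for this example, and no call is made to sameas.org.

```javascript
// One bundle of URIs that are said to identify the same thing.
const fogOfWar = [
  "http://dbpedia.org/resource/The_Fog_of_War",
  "http://dbpedia.org/resource/11_Lessons_from_the_Life_of_Robert_S._Mcnamara",
  "http://mpii.de/yago/resource/The_Fog_of_War",
  "http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000004e1a97",
];

// Build a lookup from every URI in a bundle to the whole bundle.
function indexBundles(bundles) {
  const index = new Map();
  for (const bundle of bundles) {
    for (const uri of bundle) index.set(uri, bundle);
  }
  return index;
}

// All known URIs for the same thing; an unknown URI is only equal to itself.
function sameAs(index, uri) {
  return index.get(uri) || [uri];
}

// Two URIs identify the same thing if they share a bundle.
function sameThing(index, a, b) {
  return sameAs(index, a).includes(b);
}

const sameAsIndex = indexBundles([fogOfWar]);
console.log(sameThing(sameAsIndex, fogOfWar[2], fogOfWar[3]));
```

Annotations filed against any one of these identifiers can then be merged with annotations filed against the others.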

For ‘social graph’ data, we might also check sindice.com or foaf.qdos.com.

3. Topic description

A quick word on topics and wikipedia: by using content identifiers that link to Wikipedia and DBpedia, we have the potential for extremely rich, fine grained topical annotations.

Wikipedia’s page about the movie notes that it covers the Cuban Missile Crisis. This association is now machine-visible, since I personally improved the relevant Wikipedia entry to explicitly link to the Cuban Missile Crisis page on Wikipedia. This allows us to use wikipedia:Cuban_Missile_Crisis as a topic indicator, not just against the entire film but against particular segments of the movie that are about that topic. It is easy to get a list of such links from the Wikipedia markup, eg. [[Ford Motor Company]], [[Vietnam War]] are also already there. If we have a TV presentation system that has a unique id for the content (and ideally content-version, since content often varies), and we know an offset in seconds, then rich topical tags could be applied to sections of the film, without the need for textual data entry. An iphone or similar device could allow users to pause, annotate/tag/bookmark and continue their viewing, with no need for a keyboard.
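As a rough illustration of tagging sections of a film by time offset, the sketch below stores tags as (content id, start, end, topic) records and looks up the topics that apply at a given moment. Everything specific in it is invented for the example: the content id, the offsets in seconds and the function names. Only the `wikipedia:` style of topic indicator comes from the text above.

```javascript
// Record that a stretch of a particular content version is about a topic.
function addTag(store, contentId, startSec, endSec, topic) {
  store.push({ contentId, startSec, endSec, topic });
}

// Topics tagged against the given content at the given playback offset.
function topicsAt(store, contentId, offsetSec) {
  return store
    .filter((t) => t.contentId === contentId)
    .filter((t) => t.startSec <= offsetSec && offsetSec < t.endSec)
    .map((t) => t.topic);
}

// Hypothetical content id (including a version) and made-up offsets.
const FILM = "example:the-fog-of-war/v1";
const tagStore = [];
addTag(tagStore, FILM, 600, 1200, "wikipedia:Cuban_Missile_Crisis");
addTag(tagStore, FILM, 1100, 1500, "wikipedia:Ford_Motor_Company");
addTag(tagStore, FILM, 3000, 4200, "wikipedia:Vietnam_War");

// What is the film about 19 minutes in?
console.log(topicsAt(tagStore, FILM, 19 * 60));
```

A hand-held remote would only need to send the current offset and a chosen topic, which is why no keyboard is required.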

That smart-phone story there is worth investigating, but let’s first start closer to the screen: what kinds of set top box, media centre or gadget might be able to index, navigate and play content in a way that makes interesting use of ‘linked tv’ techniques such as these? Which brings me to APIs for ‘media centre‘ systems, finally.

If we want to recommend interesting content to users, show them relevant links, annotations, related materials (not necessarily more video – text, audio, even a spreadsheet with statistics might be appropriate), suggestions from friends, upcoming broadcasts or archived materials, then we need an environment that is scripting friendly and capable of interacting with users in a rich and compelling manner. The simplest path here is to start with what’s out there already, and look at commonalities in API and data model, to see how far existing software can be ‘remote controlled’ from external scripts (and, eventually, from actual hand-held remote controls, eg. smartphones). I’ll go into some detail on that in the followup post.

The main point I want to (re-)emphasise here is that once we get to the stage of having well known identifiers for content, it facilitates a very rich marketplace for TV-related metadata, with emphasis on the word “-related”. And that this is necessarily very open-ended, since TV content can be about absolutely anything. I gave some examples drawing on Wikipedia/DBpedia for content metadata, and on homepages, twitter and Google’s Social Graph API to show how additional highly relevant information can be pulled from the Web, once we get a basic starting point. Finding ways of presenting such extra information to users, and giving them appropriate navigation and interaction options, is far from easy. Fortunately it is easier to share, syndicate and merge TV meta-content, than TV content. We are already seeing systems such as the XBMC-based Boxee which will normalise content identifiers in a way that encourages legal uses over illegal. Intellectual property issues around TV content mean that actual playable content is often not broadly shareable. However the same need not be true of user-supplied metadata, since this can be about a specific media file, but also it can be about the things the content is about. I expect to see TV meta-content shared in a global linked system, even while the underlying video and audio remain relatively hidden away; rich user-supplied TV metadata isn’t just about the TV show, it equally can tell us about the world and the viewer, and deserves to be widely available through open standards.

To go back to my original example, Errol Morris’s film about Robert McNamara, The Fog of War… there’s a world of metadata options beyond five-star ratings that can enrich such content, beyond worrying about numerical ranking in statistical recommender systems. Existing work there, eg. around the Netflix prize, would be hard to beat. Statistical recommendations work well over a regular dataset where everything has a well known identifier. We can see RDF and linked data techniques serving a pre-processing role for such analysis, by linking together otherwise fragmented pieces of information about content, allowing classic techniques to be applied over a wider dataset.

However, I see the true ‘linked TV’ potential to be primarily in another direction: in creating more meaningful conversations around content, and helping users find other information that gives a complementary perspective on the materials, and on cross-referencing everything with everything so that unexpected new paths can be found. Morris’ film can teach us a lot about McNamara, but also about the wider world and recent history … about the Ford Motor Company, the Vietnam War, the Cuban Missile Crisis, the firebombing of Tokyo under the command of Curtis LeMay. Not all easy or pleasant topics to try to understand, and each with thousands of other relevant sources (video or otherwise) out there, different perspectives to cross-reference, different accounts to reconcile.

If television is an environment in which we can be informed, educated and entertained (by broadcasters, content creators, and increasingly, by everyone else too), we need to think through what this means for better metadata. Today we are lucky if we can find a good way of even identifying a piece of content. But tomorrow, we should have TV that comes out of the box with “Do You Want To Know More?” switched ‘on‘ by default.

When Obama gave a talk in Cairo (see his June 2009 speech), he said the following, at around 33 minutes and 30 seconds:

For many years, Iran has defined itself in part by its opposition to my country, and there is in fact a tumultuous history between us.  In the middle of the Cold War, the United States played a role in the overthrow of a democratically elected Iranian government.

This last point is well covered by historians, but was likely to be new information for many viewers in the USA and UK, if not for those in Iran. What can we do to improve that, by bringing Web and TV closer together?

While the raw video, audio, and transcript are available on the White House site, we don’t yet have enough for “do you want to know more?”. W3C’s TimedText work might provide a basis for associating the transcript with the video as subtitles. This is not enough for someone who wants to know more, in following up this unusual acknowledgement of superpower interference.

An Internet-literate, laptop-owning Web user might try to find out more in an active manner. Although they are in a minority amongst TV viewers, there are still thousands of such people, and they know how to go to google.com and do searches. Can we harness their energy to improve the TV metadata environment for everyone? If you search Google for ‘US overthrow iranian government’ you find a good starting point for learning more, a Wikipedia page “1953 Iranian coup d’état“. Although that page has content which Wikipedia flags as controversial, it has a world-visible talk page in which contributors from around the world can debate the detail of the topic. So how can this Web content be bridged with the world of TV?

TV viewers are commonly characterised in terms of a pyramid:

  • a passive majority, content to watch without interaction
  • a smaller group who will interact and navigate, but who rarely create new content
  • an even smaller group of activist users, who will explore, annotate, interact and create using all the tools available

The production and consumption of ‘do you want to know more?’ annotation won’t be equally distributed. All viewers can benefit from, for example, a link from the Obama Cairo speech video to background information on the 1953 Iran coup. Some viewers might benefit indirectly, because their more inquisitive friends and family will have explored the linked materials and will talk to them about it. Others might read the wikipedia page and linked pages directly; if not on their television, then later on a laptop or PC.

Still others (a minority, but such minorities can be influential) might engage further with the material, become intrigued by the differences of opinion and seek more perspectives from the wider Web. Now 1953 is not ancient history; it falls within the living memory of countless people, in Iran, the UK and the world at large. A challenge for Linked TV is to find ways to integrate those memories and views into the TV environment, such that any viewer could pause the playing of the Obama Cairo speech video because they want to know more about what he just said, regardless of whether they are watching on whitehouse.gov, YouTube, Joost, a live re-broadcast, a radio-extract, a player embedded in a social network site (facebook, orkut, hyves, …), or running on a mobile phone. The current problems Iranians have with getting full and high-bandwidth Web access are another obstacle, of course, but eventually we can expect relevant video or audio memories from Iran (with subtitles and translations) to be 1 or 2 clicks away from Obama’s video, regardless of the software and environment playing the video. Much of that material is already out there somewhere; the challenge is to find it, link it and present it appropriately.

The core piece of metadata, a link from a section of the video (ie. the offset, 33mins, 30 secs) of his speech to a URL for the topic it describes, could be created by whitehouse.gov staff, or it could be created by activist users. We pretty much know how to do this. A time-offset and a wikipedia URL do that job well enough for now. But what we have so far failed to achieve is a world-wide data ecosystem in which this kind of TV-enriching metadata is plentiful, widely used, and comprehensive enough that we can grow to expect to have fact-checkable TV, cross-referencable TV, hypertext TV. Instead of throwing physical things at the TV when we disagree with what we hear, it should become commonplace to be able to press pause and then, or later, publish your perspective (an audio rant, a fact-check crosslink, a blog post) in a way that will become accessible to other viewers of that material in the future.
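A sketch of that core record, assuming a clock-style offset such as "33:30" and a topic URL. The video URL is a placeholder, the Wikipedia URL is written from memory rather than taken from the post, and the `#t=` suffix borrows the time-offset style of W3C's Media Fragments work; whether a given player honours it is an assumption.

```javascript
// Convert "33:30" (mm:ss) or "1:02:03" (hh:mm:ss) into seconds.
function toSeconds(clock) {
  return clock
    .split(":")
    .reduce((total, part) => total * 60 + Number(part), 0);
}

// Build an annotation linking a point in a video to a topic URL.
function linkSegment(videoUrl, clock, topicUrl) {
  const offsetSeconds = toSeconds(clock);
  return {
    target: videoUrl + "#t=" + offsetSeconds, // Media Fragments style offset
    offsetSeconds: offsetSeconds,
    topic: topicUrl,
  };
}

// Placeholder video URL; the topic URL is an assumed form of the page address.
const cairoLink = linkSegment(
  "http://example.org/video/cairo-speech",
  "33:30",
  "http://en.wikipedia.org/wiki/1953_Iranian_coup_d%27%C3%A9tat"
);

console.log(cairoLink.target);
```

Because the record holds nothing but two URLs and a number, it could be published by anyone and merged with other people's records for the same video.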

To achieve this, we can’t afford needless fragmentation:

  • of meta-content about different versions of the same basic content
  • between geographical regions based on geo-rights to play the material
  • between desktop, media-centre, mobile, set top box, broadcast and on-demand scenarios
  • between radio and TV
  • between files and streams
  • between file formats
  • between populations (‘net censorship and crude blocking technology)

If these huge obstacles can be overcome, it might become reasonable to expect to find useful information attached to most of what we see on next-generation TV, regardless of delivery system. My hunch is the right place to prototype is around opensource and hackable media centre systems, and the right low-level communications system is XMPP. I’ll go into this more in the next post. Apologies for the length of this one. Comments welcomed by blog, email or whatever!

Posted in Activism, Essays, FOAF, Jabber/XMPP, Project ideas, RSS/Atom, SocialWeb, Technology, Web Technology, coding, ggg, tv | 1 Comment

Getting started with Mozilla Jetpack for Thunderbird (on OSX)

A few weeks ago, I started to experiment with Mozilla’s new Jetpack extension model when it became available for Thunderbird. Revisiting the idea today, I realise I’d forgotten the basic setup details, so am recording them here for future reference.

I found I had to download the source from the Mercurial Web interface, rather than use pre-prepared XPI installers. This may have improved by the time you read this. I also learned (from Standard9 in #jetpack IRC) that I need asuth’s repository, rather than the main one. Again, things move quickly, don’t assume this is true forever.

Here is what worked for me, on OSX.

1. Grab a .zip from the Jetpack repo, and unpack it locally on a machine that has Thunderbird installed.

2. Edit extensions/install.rdf and make sure the em:maxVersion in the Thunderbird section matches your version of Thunderbird. In mine I updated it to say <em:maxVersion>3.0b4</em:maxVersion> (instead of 3.0b4pre).

3. See the README in the jetpack filetree for installation. With Thunderbird closed, I ran “python manage.py install --app=thunderbird” and I found Jetpack installed fine.

4. Run Thunderbird; you should see an about:jetpack tab, and corresponding options in the Tools menu.

This was enough to get started. See discussion on visophyte.org for some example code.

After installation, you can use the about:jetpack windows to load, reload and delete Jetpacks from URL.

So, why would you bother doing all this? Jetpack provides a simple way of extending an email client using Web technology.

In my current (unfinished!) experiment, for example, I’m looking at making a sidebar that shows information (photo, blog etc.) about the sender of the currently-viewed email. And I figured that if I blogged this HOWTO, someone more familiar with ajax, jquery etc might care to help with wiring this up to the Google Social Graph JSON API, so we can use FOAF and XFN to provide more contextual information around incoming mail…

Assuming you are running Thunderbird 3b4

Posted in FOAF, Technology, coding | 2 Comments

Mirrors and Prisms: robust site-specific browsers

Mozilla (amongst others, see Chris Messina’s writeup of the trend, also Matt’s) have been exploring site-specific browsers through their Prism project. These combine aspects of the Web and Desktop environments, allowing you to have a desktop app tuned for browsing just one specific Web site. Prism is an application which, when run, will generate new per-site desktop applications. Currently it does not yet have a fancy packaging/installer, so users will need to install Prism plus the site files separately.

I have started to look at Prism as a basis for accessing robust, mirrored sites, so that a single point of failure (or censorship) might be avoided. With a lot of help from Matt and others in #prism IRC chat, I have something almost working. The idea is simple: hack Prism so that the running browser code intercepts clicks and (based on some as-yet-undefined logic and preferences) gets the page from a list of mirrors, which might also be fetched dynamically from the ‘net.

I should also mention that one motivation here is for anti-censorship tools, to give users an easy way to access sites which might be blocked by their IP address or URL otherwise. I looked at FoxyProxy as an option but for site-specific robustness, running a full proxy server seems a bit heavy, compared to simply duplicating a set of files. Here’s what the main Prism app looks like:

[Screenshot (prism-gutenberg): Prism config settings for a site-specific browser.]

Once you have Prism installed, you can hack a file named webrunner.js to intervene when links are clicked. In OSX, this can be found as /Applications/Prism.app/Contents/Resources/chrome/webrunner/content/webrunner.js.

Edit this: _domActivate : function(aEvent)

I added the following block to the start of this function:

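var link = aEvent.target;
// Only rewrite clicks on links that stay within the site.
if (link instanceof HTMLAnchorElement && !WebRunner._isLinkExternal(link)) {
  // Cancel the normal navigation and load the mirrored copy instead.
  aEvent.preventDefault();
  WebRunner._getBrowser().loadURI("http://example.org/mirrors/" + link.href, null, null);
}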

The idea here being that we intercept clicks, and rewrite them to point to equivalent http:// URIs elsewhere in the Web. As far as this goes, it works as advertised. But what I have is far from finished… it would need some code in there to find the right mirror URLs to fetch from. Perhaps a list might be fetched on startup or the first time a link is followed. It could also do with some work on packaging, so that this hacked version of Prism plus some actual site-specific browser config can be made into an easy-install Windows .exe or OSX .app. For a Windows installer, I am told that NSIS is a good place to start. You could also imagine a version that hid the mirrored URLs from the user’s view. Since Prism has a built-in option to completely hide the URL navigation bar, I didn’t investigate this idea yet.
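The missing mirror-selection piece might look something like the sketch below. It is plain modern JavaScript and has not been tried inside Prism: the mirror list, the function names and the use of the standard `URL` global are all assumptions. It also swaps the host for a mirror's host, rather than prefixing the whole original URL as the hack above does.

```javascript
// Hypothetical mirror list; in practice it might be fetched on startup.
const MIRRORS = [
  "http://mirror-a.example.org/",
  "http://mirror-b.example.net/",
];

// Rewrite an absolute URL so it is fetched from the given mirror,
// keeping the original path, query string and fragment.
function rewriteToMirror(href, mirrorBase) {
  const original = new URL(href);
  const base = mirrorBase.replace(/\/+$/, ""); // drop trailing slashes
  return base + original.pathname + original.search + original.hash;
}

// Hand out mirrors in rotation, so no single host takes every request.
function makeMirrorPicker(mirrors) {
  let next = 0;
  return function pick() {
    const chosen = mirrors[next % mirrors.length];
    next += 1;
    return chosen;
  };
}

const pickMirror = makeMirrorPicker(MIRRORS);
const clicked = "http://www.example.com/books/1234?x=1#top";
console.log(rewriteToMirror(clicked, pickMirror()));
console.log(rewriteToMirror(clicked, pickMirror()));
```

Rotation is the simplest policy; a picker that skips mirrors which recently failed would fit the anti-censorship motivation better, at the cost of keeping some state.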

OK I think I’ve written up everything I learned from the helpful folks in IRC. I hope this repays some karma. If anyone cares to explore this further, or wants to help target student projects on exploring it, please get in touch.

Posted in Activism, coding | 3 Comments

Wolfram Alpha Interview

An interview with Xiang Wang of Wolfram Research in China, by Han Xu (Collin Hsu) of W3China fame.

Interesting excerpt:

Q: Since Wolfram|Alpha is dubbed ‘smarter’ than traditional search engines, I wonder how much AI techniques are actually employed in the system? How is inference done? What is the provenance of each fact/claim? And what if there is a disagreement? For example, how it would represent information about Israel/Palestine area?

A: It’s much more an engineered artifact than a humanlike artificial intelligence. Some of what it does – especially in language understanding – may be similar to what humans do. But its primary objective is to do directed computations, not to act as a general intelligence. Wolfram|Alpha uses established scientific or other models as the basis for its computations. Whenever it does new computations, it’s effectively deriving new facts. About the controversial data you asked about, we deal in different ways with numerical data and particular issues. For numerical data, Wolfram|Alpha curators typically assign a range of values that are then carried through computations. For issues such as the interpretation of particular names or terms, like Israel/Palestine area issue mentioned in your question, Wolfram|Alpha typically prompts users to choose the assumption they want. We spend considerable effort on automated testing, expert review, and checking external data that we use to ensure the results. But with trillions of pieces of data, it’s inevitable that there are still errors out there. If you ever see a problem, please report it.

Posted in Semantic Web | Leave a comment

What kind of Semantic Web researcher are you?

It’s hard to keep secrets in today’s increasingly interconnected, networked world. Social network megasites, mobile phones, webcams and  inter-site syndication can broadcast and amplify the slightest fragment of information. Data linking and interpretation tools can put these fragments together, to paint a detailed picture of your life, both online and off.

This online richness creates offline risk. For example, if you’re going away on holiday, there are hundreds of ways in which potential thieves could learn that your home is vacant and therefore a target for crime: shared calendars, twittered comments from friends or family, flickr’d photographs. Any of these could reveal that your home and possessions sit unwatched, unguarded, presenting an easy target for criminals.

Q: What research challenge does this present to the Semantic Web community? How can we address the concern that Semantic and Social Web technology have more to offer Burglar Bill than to his victims?

A1: We need better technology for limiting the flow of data, for proving a right to legitimate access to information, for deleting leaked or retracted data that flows between sites (via cross-site protocols), and for calculating trust metrics for parties requesting data access.

A2: We need to find ways to reconnect people with their neighbours and neighbourhoods, so that homes don’t sit unwatched when their occupants are away.

ps. Dear Bill, I have my iphone, laptop, piggy bank and camera with me…

Posted in FOAF, Project ideas, SocialWeb, ggg, privacy | 3 Comments

NoTube scenario: Facebook groups and TV recommendation

Short version: If the Web knows I like a TV show, why can’t my TV be more useful?

So I have just joined a Facebook group, “Spaced Appreciation Society”:

Basic Info
Type: Common Interest – Pets & Animals
Description: If you’ve ever watched (and therefore loved) the TV series Spaced, then come and pay homage to the great Simon Pegg and Jess Stevenson. “You f’ing plum”
Contact Details
Website: http://www.spaced-out.org.uk/
Location: Meteor Street

That URL is (as with many of these groups) from a site whose primary topic is the thing the group’s about. In this case, about a TV show. It’s even in the public page for that group:

<tr><td class="label">Website:</td>
<td class="data"><div class="datawrap"><a href="http://www.spaced-out.org.uk/" onmousedown="return wait_for_load(this, event, function() { UntrustedLink.bootstrap($(this), &quot;&quot;, event) });" target="_blank" rel="nofollow">http://www.spaced-out.org.uk/</a></div></td></tr>
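Pulling that URL out of the public page is easy to automate. Here is a minimal sketch using only Python's standard library, assuming the page keeps the shape shown above (a cell with class "label" containing "Website:", followed by a link) and that attributes are written with ordinary straight quotes. The class and function names are my own, not anything Facebook provides.

```python
from html.parser import HTMLParser


class WebsiteLinkParser(HTMLParser):
    """Remember the href of the first link that follows a 'Website:' label cell."""

    def __init__(self):
        super().__init__()
        self.in_label = False            # currently inside <td class="label">
        self.seen_website_label = False  # a label cell said "Website:"
        self.website = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td" and attrs.get("class") == "label":
            self.in_label = True
        elif tag == "a" and self.seen_website_label and self.website is None:
            self.website = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_label = False

    def handle_data(self, data):
        if self.in_label and data.strip() == "Website:":
            self.seen_website_label = True


def extract_group_website(html):
    """Return the group's website URL, or None if no 'Website:' row is found."""
    parser = WebsiteLinkParser()
    parser.feed(html)
    return parser.website


# Cut-down copy of the row above (onmousedown handler omitted for brevity).
sample = (
    '<tr><td class="label">Website:</td>'
    '<td class="data"><div class="datawrap">'
    '<a href="http://www.spaced-out.org.uk/" target="_blank" rel="nofollow">'
    'http://www.spaced-out.org.uk/</a></div></td></tr>'
)
print(extract_group_website(sample))
```

Scraping like this is fragile: any change to the page markup breaks it, so an API would be preferable if one exposes the same field.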

If I search Google (Yahoo BOSS might be wiser, they have APIs) with:

link:http://www.spaced-out.org.uk/ site:wikipedia.org

It finds me:

http://en.wikipedia.org/wiki/Spaced

Although “link:http://www.spaced-out.org.uk/ site:dbpedia.org” doesn’t find anything, some URL rewriting gets me to:

http://dbpedia.org/page/Spaced

“Spaced is a British television situation comedy written by and starring Simon Pegg and Jessica Stevenson, and directed by Edgar Wright. It is noted for its rapid-fire editing, frequent dropping of pop-culture references, and occasional displays of surrealism. Two series of seven episodes were broadcast in 1999 and 2001 on Channel 4.”

dbpedia-owl:author
* dbpedia:Jessica_Hynes
* dbpedia:Simon_Pegg

dbpedia-owl:completionDate
* 2001-04-13 (xsd:date)

dbpedia-owl:director
* dbpedia:Edgar_Wright

dbpedia-owl:episodenumber
* 14

dbpedia-owl:executiveproducer
* dbpedia:Humphrey_Barclay

dbpedia-owl:genre
* dbpedia:Situation_comedy

dbpedia-owl:language
* dbpedia:English_language

dbpedia-owl:network
* dbpedia:Channel_4

dbpedia-owl:producer
* dbpedia:Gareth_Edwards
* dbpedia:Nira_Park

dbpedia-owl:releaseDate
* 1999-09-24 (xsd:date)

dbpedia-owl:runtime
* 24

dbpedia-owl:starring
* dbpedia:Jessica_Hynes
* dbpedia:Simon_Pegg

There are also links from here to Cyc (but an incorrect match) and to Freebase (to http://www.freebase.com/view/en/spaced).
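The URL rewriting step from Wikipedia to DBpedia is mechanical enough to sketch in a few lines. This assumes, as in the Spaced example, that the article title carries over unchanged into the DBpedia URL; redirects, disambiguation pages and non-English Wikipedias are not handled, and the function name is just an illustration.

```python
from urllib.parse import urlparse


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the matching DBpedia URLs.

    Returns the human-readable /page/ URL and the /resource/ URI
    (the form abbreviated as dbpedia:Name in the listing above).
    """
    parts = urlparse(url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError("not an English Wikipedia article URL: " + url)
    title = parts.path[len("/wiki/"):]
    return {
        "page": "http://dbpedia.org/page/" + title,
        "resource": "http://dbpedia.org/resource/" + title,
    }


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Spaced")["page"])
```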

Unfortunately, the Wikipedia “external links” section, with the URL for http://www.spaced-out.org.uk/ (marked “official, fan-operated site”), is not part of the DBpedia RDF export, I guess because it is not in an infobox. Extracting these external-link URLs, at least for the TV, Actor and Movie related sections of Wikipedia, might be worthwhile. And DBpedia would be useful for identifying the relevant subset to re-extract.

This idea of using such URLs as keys into Wikipedia/dbpedia data would also work with Identi.ca groups and others. In fact the matching might be easier in Identi.ca – I’m not sure how the Facebook APIs expose this stuff.

Anyway, if a show is about to be broadcast that includes eg. an interview with dbpedia:Jessica_Hynes or dbpedia:Simon_Pegg I’d like to hear about it.

So… is there any way I can use BBC’s /programmes to get upcoming information about who will be on the radio or telly, in a way that could be matched against dbpedia URIs?
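Whatever the data source turns out to be, the matching step itself is simple. A minimal sketch, assuming upcoming broadcasts can somehow be obtained as a list of entries that each carry the DBpedia URIs of their contributors. The schedule entries, their titles and the field names below are invented for illustration and do not reflect any real BBC data format.

```python
# People of interest: the dbpedia-owl:starring values listed for Spaced above.
INTERESTING_PEOPLE = {
    "http://dbpedia.org/resource/Jessica_Hynes",
    "http://dbpedia.org/resource/Simon_Pegg",
}

# Hypothetical schedule entries (made-up titles and field names).
UPCOMING = [
    {
        "title": "Example chat show",
        "contributors": ["http://dbpedia.org/resource/Simon_Pegg"],
    },
    {
        "title": "Example news bulletin",
        "contributors": [],
    },
]


def broadcasts_of_interest(schedule, people):
    """Return (title, matched URIs) for every entry featuring someone in `people`."""
    hits = []
    for entry in schedule:
        matched = sorted(set(entry["contributors"]) & people)
        if matched:
            hits.append((entry["title"], matched))
    return hits


for title, who in broadcasts_of_interest(UPCOMING, INTERESTING_PEOPLE):
    print(title, who)
```

The hard part is not this set intersection but getting schedule data in which contributors are identified by URIs that can be aligned with DBpedia in the first place.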

Edit: I should’ve mentioned that Facebook in particular also has a more explicit “is a fan of” construct, with Products, Celebs, TV shows and Stores as types of thing you can be a fan of. Furthermore these show up on your public page, eg. here’s mine. I’m certainly interested in using that data, but also in a model that uses general groups, since it is applicable to other sites that allow a group to indicate itself with a topical URL.

Posted in General | Leave a comment

Twitter Iran RT chaos

From Twitter in the last few minutes, a chaos of echo’d posts about army moves. Just a few excerpts here by copy/paste, mostly without the all-important timestamps. Without tools to trace reports to their source, to claims about their source from credible intermediaries, or to evidence, this isn’t directly useful. Even grassroots journalists need evidence. I wonder how Witness and Identi.ca fit into all this. I was thinking today about an “(person) X claims (person) Y knows about (topic) Z” notation, perhaps built from FOAF+SKOS. But looking at this “Army moving in…” claim, I think something couched in terms of positive claims (along lines of the old OpenID showcase site Jyte) might be more appropriate.

The following is from my copy/paste from Twitter a few minutes ago. It gives a flavour of the chaos. Note also that observations from very popular users (such as stephenfry) can echo around for hours, often chased by attempts at clarification from others.

(“RT” is Twitter notation for re-tweet, meaning that the following content is redistributed, often in abbreviated or summarised form)

plotbunnytiff: RT @suffolkinace: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
r0ckH0pp3r: RT .@AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
jax3417: RT @ktyladie: RT @GennX: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection #iran
ktladie: RT @GennX: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection #iran
MellissaTweets: RT @AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
GennX: RT @MelissaTweets: RT @AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection

The above all arrived at around the same time, and cite two prior “sources”:

suffolkinace: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
18 minutes ago from web

Who is this? Nobody knows of course, but there’s a twitter bio:

http://twitter.com/suffolkinace # Bio Some-to-be Royal Military Policeman in the British Army. Also a massive Xbox geek and part-time comedian

The other “source” seems to be http://twitter.com/AliAkbar
AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
about 1 hour ago from web
url http://republicmodern.com

This leads us to http://republicmodern.com/about where we’re told
“Ali Akbar is the founder and president of Republic Modern Media. A conservative blogger, he is a contributor to Right Wing News, Hip Hop Republican, and co-host of The American Resolve online radio show. He was also the editor-in-chief of Blogs for McCain.”

I should also mention that a convention emerged in the last day or two to replace the names of specific local Twitter users in Tehran with a generic “from Iran”, to avoid getting anyone into trouble. That makes plenty of sense, but without anyone in the middle vouching for sources, it becomes even harder to know which reports to take seriously.
More… back to twitter search, what’s happened since I started this post?

http://twitter.com/#search?q=iranelection%20army

badmsm: RT @dpbkmb @judyrey: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLZ RT! URGENT! #IranElection #gr88
SimaoC: RT @parizot: CONFIRMÉ! L’armée se dirige vers Téhéran contre les manifestants! #IranElection #gr88
SpanishClash: RT @mytweetnickname: RT From Iran:ARMY movement NOT confirmed in last 2:15, plz RT this until confrmed #IranElection #gr88
artzoom: RT @matyasgabor @humberto2210: RT CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! #IranElection #iranrevolution
sjohnson301: RT @RonnyPohl From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection #iran9
dauni: RT @withoutfield: RT: @tspe: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
interdigi: RT @ivanpinozas From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
PersianJustice: Once again, stop RT army movements until source INSIDE Iran verifies! Paramilitary is the threat anyway. #iranelection #gr88
Klungtveit Anyone: What’s the origin of reports of “army moving in” on protesters? #iranelection
Eruethemar: RT @brianlltdhq: RT @lumpuckaroo: Only IRG moving, not national ARMY… this is confirmed for real #IranElection #gr88
SAbbasRaza: RT @bymelissa: RT @alexlobov: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
timnilsson: RT @Iridium24: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
edmontalvo: RT @jasona: RT @Marble68: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
stevelabate: RT army moving into Tehran against protesters. Please RT. #iranelection
ivanpinozas: From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
bschh: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection (via @dlayphoto)
dlayphoto: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection

In short … chaos!

Is this just a social / information problem, or can different tooling and technology help filter out what on earth is happening?
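As a very small example of what tooling could do, here is a sketch that follows the "RT @user" attributions inside each pasted line and counts who sits at the end of each chain. It assumes lines shaped like the copy/paste above ("poster: text"). It ignores "(via @user)" credits and timestamps, and it does not join chains across tweets, so it only shows who is being cited, not who actually said it first or whether it is true.

```python
import re
from collections import Counter

# "RT @user", "RT .@user" and "RT: @user" all occur in the pasted tweets.
RT_MENTION = re.compile(r"\bRT[ :.]*@(\w+)")


def cited_chain(line):
    """Split 'poster: text' and return (poster, users cited via RT, in order)."""
    poster, _, text = line.partition(": ")
    return poster.strip(), RT_MENTION.findall(text)


def apparent_sources(lines):
    """Count the last user cited in each chain (or the poster, if none is cited)."""
    counts = Counter()
    for line in lines:
        poster, chain = cited_chain(line)
        counts[chain[-1] if chain else poster] += 1
    return counts


SAMPLE = [
    "plotbunnytiff: RT @suffolkinace: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection",
    "r0ckH0pp3r: RT .@AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection",
    "jax3417: RT @ktyladie: RT @GennX: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection #iran",
    "GennX: RT @MelissaTweets: RT @AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection",
]

for user, count in apparent_sources(SAMPLE).most_common():
    print(user, count)
```

Even this crude count makes the shape of the echo visible: many posters, very few distinct names at the far end of the chains.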

Posted in Activism, Conspiracy Theory, FOAF, Politics, Project ideas, SKOS, The Web at War, ggg, openid, privacy | 3 Comments