‘Republic of Letters’ in R / Custom Widgets for Second Screen TV navigation trails

As ever, I write one post that perhaps should’ve been two. This is about the use and linking of datasets that aid ‘second screen’ (smartphone, tablet) TV remotes, and it takes as a quick example a navigation widget and underlying dataset that show us how we might expect to navigate TV archives, in some future age when TV lives more fully in the World Wide Web. I argue that access to the ‘raw data‘ and frameworks for embedding visualisation apps are of equal importance when thinking about innovative ways of exploring the ever-growing archives. All of this comes from many discussions with my NoTube colleagues and other collaborators; rambling scribblyness is all my own.

Ben Hammersley points us at a lovely Flash visualization http://www.stanford.edu/group/toolingup/rplviz/”>Mapping the Republic of Letters”.

From the YouTube overview, “Researchers map thousands of letters exchanged in the 18th century’s “Republic of Letters” and learn at a glance what it once took a lifetime of study to comprehend.”

Mapping the Republic of Letters has at its center a multidimensional data set which spans 300 years and nearly 100,000 letters. We use computing tools that help us to measure and analyze data quantitatively, though that will not take us to our goal. While we use software and computing techniques that were designed for scientific and statistical methods, we are seeking to develop computing tools to enhance humanistic methods, to help us to explore qualitative aspects of the Republic of Letters. The subject of our study and the nature of the material require it. The collections of correspondence and records of travel from this period are incomplete. Of that incomplete material only a fraction has been digitized and is available to us. Making connections and resolving ambiguities in the data is something that can only be done with the help of computing, but cannot be done by computing alone. (from ‘methods and philosophy‘)

screenshot of Republic of Letters app, showing social network links superimposed on map of historical western Europe

See their detailed writeup for more on this fascinating and quite beautiful work. As I’m working lately on linking TV content more deeply into the Web, and on ‘second screen’ navigation, this struck me as just the kind of interface which it ought to be possible to re-use on a tablet PC to explore TV archives. Forgetting for the moment difficulties with Flash on iPads and so on, the idea roughly is that it would be great to embed such a visualization within a TV watching environment, such that when the ‘republic of letters’ widget is focussed on some person, place, or topic, we should have the opportunity to scan the available TV archives for related materials to show.

So a glance at Chrome’s ‘developer tools’ panel gave me a link to the underlying data used by the visualisation. I don’t know exactly whose it is, nor how they want it used, so please treat it with respect. Still, there it is, sat in the Web, in tab-separated format, begging to be used. There’s a lot you can do with the Flash application that I’ve barely touched, but I’m intrigued by the underlying dataset. In particular, where they have the string “Tonson, Jacob”, the data linker in me wants to see a Wikipedia or DBpedia link, since they provide explanation, context, related people, places and themes; all precious assets when trying to scrape together related TV materials to inform, educate or entertain someone with. From a few test searches, it turns out that (many? most?) the correspondents are quite easily matched to Wikipedia: William Congreve, Montagu, 1st earl of Halifax, CharlesHough, bishop of Worcester, John; Stanyan, Abraham;  … Voltaire and others. But what about the data?

Lately I’ve been learning just a little about R, a language used mainly for statistics and related analysis. Here’s what it’ll do ‘out of the box’, in untrained hands:

letters<-read.csv('data.txt',sep='\t', header=TRUE)
v_author = letters$Author=="Voltaire"
v_letters = letters[v_author, ]
Where were Voltaire’s letters sent?
> cbind(summary(v_letters$dest_country))
Austria            2
Belgium            6
Canada             0
Denmark            0
England           26
France          1312
Germany           97
India              0
Ireland            0
Italy             68
Netherlands       22
Portugal           0
Russia             5
Scotland           0
Spain              1
Sweden             0
Switzerland      342
The Netherlands    1
Turkey             0
United States      0
Wales              0
As the overview and video in the ‘Republic of Letters‘ site points out (“Tracking 18th-century “social network” through letters”), the patterns of correspondence eg. between Voltaire and e.g. England, Scotland and Ireland jumps out of the data (and more so its visualisation). There are countless ways this information could be explored, presented, sliced-and-diced. Only a custom app can really make the most of it, and the Republic of Letters work goes a long way in that direction. They also note that
The requirements of our project are very much in sync with current work being done in the linked-data/ semantic web community and in the data visualization community, which is why collaboration with computer science has been critical to our project from the start.
So the raw data in the Web here is a simple table; while we could spend time arguing about whether it would better be expressed in JSON, XML or an RDF notation, I’d rather see some discussion around what we can do with this information. In particular, I’m intrigued by the possibilities of R alongside the data-linking habits that come with RDF. If anyone manages to tease anything interesting from this dataset, perhaps mixed in with DBpedia, do post your results.
And of course there are always other datasets to examine; for example see the Darwin correspondence archives, or the Open Knowledge Foundation’s Open Correspondence project which has a Dickens-based pilot. While it is wonderful having UI that is tuned to the particulars of some dataset, it is also great when we can re-use UI code to explore similarly structured data from elsewhere. On both the data side and the UI side, this is expensive, tough work to do well. My current concern is to maximise re-use of both UI and data for the particular circumstances of second-screen TV navigation, a scenario rarely a first priority for anyone!
My hope is that custom navigation widgets for this sort of data will be natural components of next-generation TV remote controls, and that TV archives (and other collections) will open up enough of their metadata to draw in (possibly paying) viewers. To achieve this, we need the raw data on both sides to be as connectable as possible, so that application authors can spend their time thinking about what their users really need and can use, rather than on whether they’ve got the ‘right’ Henry Newton.
If we get it right, there’s a central role for librarianship and archivists in curating the public, linked datasets that tell us about the people, places and topics that will allow us to make new navigation trails through Web-connected television, literature and encyclopedia content. And we’ll also see new roles for custom visualizations, once we figure out an embedding framework for TV widgets that lets them communicate with a display system, with other users in the same room or community, and that is designed for cross-referencing datasets that talk about the same entities, topics, places etc.
As I mentioned regarding Lonclass and UDC, collaboration around open shared data often takes place in a furtive atmosphere of guilt and uncertainty. Is it OK to point to the underlying data behind a fantastic visualisation? How can we make sure the hard work that goes into that data curation is acknowledged and rewarded, even while its results flow more freely around the Web, and end up in places (your TV remote!) that may never have been anticipated?

“Stuff I’ve been thinking about” (SocialNetworkPortability WebCamp) – my slides

I’m in Cork, mainly for the excellent Social Network Portability event on Sunday, but am also staying through Blogtalk’08 which has been great. I’ve uploaded my slides from my talk (slideshare in Flash, included inline here, or a pdf). I have some rough speaking notes too,  maybe I’ll get those online. I have no idea how they relate to whatever actually came out of my mouth during the talk :) Apologies to those without PDF or Flash. I haven’t tried Keynote’s HTML output yet.

Basically much of what I was getting at in the talk, and my thoughts are only just congealing on this … is that the idea of a ‘claim’ is a useful bridge between Semantic Web and Social Networking concerns. Also that it helps us understand how technologies fit together. FOAF defines a dictionary of terms for making claims, as does xfn, hCard. RDF/XML, Microformats, RDFa, GRDDL define textual notations for publishing documents that encode claims, and SPARQL gives us a way of asking questions about the claims made in different documents.

IM/RSS bot – BBC Persian News Flash

OK this is old news, but pretty cool so I’m happy to write it up belatedly.

I just logged into MSN chat, and was greeted by Mario Menti’s IM bot, which provides a text-chat UI for navigating the BBC’s news feeds from their Persian service. I’m pasting the output here, hoping it’ll display reasonably. I can’t read a word of it of course, but remember Ian Forrester’s XTech talk a few years back about the headaches for getting I18N right for such feeds (and the varying performance of newsreader clients with right-to-left and mixed direction text). This hack came out of a conversation with Mario and Ian around the BBC Backstage scene, and from comments from a couple of friends in Tehran, this sort of technology direction is much appreciated by those whose news access is restricted. The bot is called bbcpersian at hotmail.co.uk, and seems to still be running 18 months later. See also some more recent hacks from Mario that wire up BBC feeds to twitter.

BBC Persian News Flash says: (23:01:02)

Hi, this is your hourly BBCPersian.com news flash with the 10 most recent new items
1 افزایش نیروها در عراق ‘درحال نتیجه دادن است’
2 انتقاد شدید کروبی از ‘مخالفان احزاب’
3 نواز شریف از پاکستان اخراج شد
4 بازداشت یکی از ‘قاچاقچیان بزرگ’ کلمبیا
5 ترکیه: کشورهای منطقه از اقدامات تنش زا دوری کنند
6 ‘عاشقان قلندر’ جشنواره ای دیگر برپا کردند
7 کاهش ساعت کار ادارات دولتی ایران در ماه رمضان
8 ‘عراقیها احساس امنیت بیشتری نمی کنند’
9 نواز شریف از پاکستان اخراج شد
10 شرکت مردم گواتمالا در انتخابات این کشور

Reply with number 1 to 10 to see more information, or any other message if you want to stop receiving these news flashes

Anyone know what the state of the art is with IM-based feed readers? or have a wishlist?

OpenID plugin for WordPress

I’ve just installed Alan J Castonguay’s WordPress OpenID plugin on my blog, part of a cleanup that included nuking 11000+ comments in the moderation queue using the Spam Karma 2 plugin. Apologies if I zapped any real comments too. There are a few left, at least!

The OpenID thing appears to “just work”. By which I mean, I could log in via it and leave a comment. I’d be super-grateful if those of you with OpenIDs could take a minute to leave a comment on this post, to see if it works as well as it seems to. If it doesn’t, a bug report (to danbrickley@gmail.com) would be much appreciated. Those of you with LiveJournals or AOL/AIM accounts already have OpenID, even if you didn’t notice. See the HTML source for my homepage to see how I use “danbri.org” as an OpenID while delegating the hard work to LiveJournal. For more on OpenID, check out these tutorial slides (flash/pdf) from Simon Willison and David Recordon.

Thinking about OpenID-mediated blog comments, the tempting thing then would be to do something with the accumulated URIs. The plugin keeps its data in nice SQL tables and presumably accessible by other WordPress plugins. It’s been a while since I made a WordPress plugin, but they seem to have a pretty good framework accessible to them now.

mysql> select user_id, url from wp_openid_identities;
| user_id | url                |
|      46 | http://danbri.org/ |
1 row in set (0.28 sec)

At the moment, it’s just me. It’d be fun to try scooping up RDF (FOAF, SKOS, SIOC, feeds…) from any OpenID URIs that accumulate there. Hmm I even wrote up that project idea a while back – SparqlPress. At the time I tried prototyping it in Redland + PHP, but nowadays I’d probably use Benjamin Nowack’s ARC library, which provides SPARQL query of a MySQL-backed RDF store, and is written in PHP. This gives it the same dependencies as WordPress, making it ideal for pluginization. If anyone’s looking for a modest-sized practical SemWeb project to hack on, that one could be a lot of fun.

There’s a lot of interesting and creative fuss about “social networking” site interop around lately, largely thanks to the social graph paper from Brad Fitzpatrick and David Recordon. I lean towards the “show me, don’t tell me” approach regarding buddylists and suchlike (as does Julian Bond with Ecademy), which is why FOAF has only ever had the mild-mannered “knows” relationship in the core vocabulary, rather than trying to over-formalise “bestest friend EVER” and other teenisms. So what I like about this WordPress plugin is that it gives some evidence-based raw material for decentralised social networking apps. Blog comments don’t tell the whole story; nothing tells the whole story. But rather than maintain a FOAF “knows” list (or blogroll, or blog-reader config) by hand, I’d prefer to be able to partially automate it by querying information about whose blogs I’ve commented on, and vice-versa. There’s a lot that could be built, intimidatingly much, that it’s hard to know where to start. I suggest that everyone in the SemWeb scene having an OpenID with a FOAF file linked from it would be an interesting platform from which to start exploring…

Meanwhile, I’ll try generating an RDF blogroll from any URIs that show up in my OpenID WordPress table, so I can generate a planetplanet or chumpologica configuration automatically…

Open Source Flash Development and WorldKit

Handy article, “Towards Open Source Flash Development” by Carlos Rovira.

Background to looking at this is some great news: Mikel Maron is open-sourcing the WorldKit system, a lightweight Flash/SWF-based Web mapping application. So I’m interested to find some open source tools that would allow me to rebuild it from source.

I also wonder whether SVG hackers might be interested to port some of it to SVG/Javascript. WorldKit supports geo/rss location tagging, so I’m also curious about what it’d take to get full RDF support in there. Has anybody made an RDF parser for SWF/Flash yet?

A special day for technologists

As Mrs. Nakamura stood watching her neighbour, everything flashed whiter than any white she had ever seen. She did not notice what happened to the man next door; the reflex of a mother set her in motion toward her children. She had taken a single step (the house was 1,350 yards, or three quarters of a mile, from the centre of the explosion) when something picked her up and she seemed to fly into the next room over the raised sleeping platform, pursued by parts of her house.

Timbers fell around her as she landed, and a shower of tiles pommelled her; everything became dark, for she was buried. The debris did not cover her deeply. She rose up and freed herself. She heard a child cry, “Mother, help me!” and saw her youngest – Myeko, the five-year-old – buried up to her breast and unable to move. As Mrs. Nakamura started frantically to claw her way towards the baby, she could see or hear nothing of her other children.

John Hersey, Hiroshima (Part 1, A Noiseless Flash), first published in the New Yorker, August 1946.

This August 6, the 60th anniversary of the atomic bombing, is a moment of shared lamentation in which more than 300 thousand souls of A-bomb victims and those who remain behind transcend the boundary between life and death to remember that day. It is also a time of inheritance, of awakening, and of commitment, in which we inherit the commitment of the hibakusha to the abolition of nuclear weapons and realization of genuine world peace, awaken to our individual responsibilities, and recommit ourselves to take action. This new commitment, building on the desires of all war victims and the millions around the world who are sharing this moment, is creating a harmony that is enveloping our planet.

The keynote of this harmony is the hibakusha warning, “No one else should ever suffer as we did,” along with the cornerstone of all religions and bodies of law, “Thou shalt not kill.” [...]

On this, the sixtieth anniversary of the atomic bombing, we seek to comfort the souls of all its victims by declaring that we humbly reaffirm our responsibility never to “repeat the evil.” “Please rest peacefully; for we will not repeat the evil.”

Peace Declaration, Tadatoshi Akiba, Mayor, The City of Hiroshima, 6 August 2005.

Data syndication

There have been various developments in the last week, via Planet RDF, on the topic of data syndication using RSS/Atom.

Edd Dumbill on iTunes RSS extensions; a handy review of the extensions they’ve added to support a “podcasting” directory. See also comments from Danny.

Nearby in the Web, Yahoo! and friends are still busy with their their media RSS spec, which lives on the rss-media yahoogroups list. Yahoo are also looking creative on other fronts. Today’s Yahoo! Search blog has an entry on Yahoo! Maps, which again uses RSS extensions to syndicate map-related data:

The Yahoo! Maps open API is based on geoRSS, a RSS 2.0 with w3c geo extension. For more information check out developer.yahoo.net/maps. We also offer API support via a group forum at yws-maps.

This is particularly interesting to me, as they’re picking up the little geo: namespace I made with collaborators from the W3C Semantic Web (formerly RDF) Interest Group.

Although the namespace was designed to be used in RDF, they’re using it in non-RDF RSS2 documents. This is a little dissapointing, since it makes the data less available to RDF processors. RSS1, of course, was designed as an RDF application specifically to support such data-centric extensions. Yahoo! have some developer pages with more detail, but they seem to have picked up where the Worldkit Flash RSS geocoding project left off. Worldkit attaches geo:lat and geo:long information to RSS2 items, and can display these in a Flash-based UI.

The WGS 84 geo vocabulary used here by Worldkit and Yahoo! was a collaborative experiment in minimalism. The GIS world has some very rich, sophisticated standards. The idea with the geo: namespace was to take the tiniest step towards reflecting that world into the RDF data-merging environment. RDF is interesting precisely because it allows for highly mixed, yet predictably structured, data mixing.

So that little geo namespace experiment (thanks to the efforts of various geo/mapping hackers, most of whom aren’t very far in FOAF space from Jo Walsh) seems to be proving its worth. A little bit of GIS can go a very long way.

I should at this point stress that the “w3c geo extension” (as Yahoo!’s search blog calls it” is an informal, pre-standardisation piece of work. This is important to stress, particularly given that the work is associated with W3C, both through hosting of the namespace document and because it came about as a collaboration of theW3C RDF/SW Interest Group. If it were the product of a real W3C Working Group, it would have received much more careful review. Someone might have noticed the inadquate (or non-) definition of geo:alt, for example!

I’m beginning to think that a small but more disciplined effort at W3C around RDF vocabulary for geo/mapping (with appropriate liasion to GIS standards) would be timely. But I digress. I was talking about data syndication. Going back to the Yahoo! maps example… they have taken RSS2, and the SWIG Geo vocab, and added a number of extensions that relate to their mapping interface (image, zoom etc), as well as Address, CityState, Zip, Country code, etc. Useful entities to have
markup for. In an RDF environment I’d probably have used vCard for those.

Yahoo aren’t the only folk getting creative with RSS this week. Microsoft have published some information on their work, including some draft proposals for extending RSS with “lists”. This, again, brings an emphasis back to RSS for data syndication. In other words, RSS documents as a carrier for arbitrary other information whose dissemination fits the syndication/publication model of RSS. Some links on the Microsoft proposal are on scoble’s blog. See Microsoft’s RSS in Longhorn page for details and a link to their
Simple List Extensions specification, which seems to focus on allowing RSS feeds to be presented as lists of items, including use of datatyped (and hence sortable) extensions.