‘Republic of Letters’ in R / Custom Widgets for Second Screen TV navigation trails

As ever, I write one post that perhaps should’ve been two. This is about the use and linking of datasets that aid ‘second screen’ (smartphone, tablet) TV remotes, and it takes as a quick example a navigation widget and underlying dataset that show us how we might expect to navigate TV archives, in some future age when TV lives more fully in the World Wide Web. I argue that access to the ‘raw data‘ and frameworks for embedding visualisation apps are of equal importance when thinking about innovative ways of exploring the ever-growing archives. All of this comes from many discussions with my NoTube colleagues and other collaborators; rambling scribblyness is all my own.

Ben Hammersley points us at a lovely Flash visualization http://www.stanford.edu/group/toolingup/rplviz/”>Mapping the Republic of Letters”.

From the YouTube overview, “Researchers map thousands of letters exchanged in the 18th century’s “Republic of Letters” and learn at a glance what it once took a lifetime of study to comprehend.”


Mapping the Republic of Letters has at its center a multidimensional data set which spans 300 years and nearly 100,000 letters. We use computing tools that help us to measure and analyze data quantitatively, though that will not take us to our goal. While we use software and computing techniques that were designed for scientific and statistical methods, we are seeking to develop computing tools to enhance humanistic methods, to help us to explore qualitative aspects of the Republic of Letters. The subject of our study and the nature of the material require it. The collections of correspondence and records of travel from this period are incomplete. Of that incomplete material only a fraction has been digitized and is available to us. Making connections and resolving ambiguities in the data is something that can only be done with the help of computing, but cannot be done by computing alone. (from ‘methods and philosophy‘)


screenshot of Republic of Letters app, showing social network links superimposed on map of historical western Europe


See their detailed writeup for more on this fascinating and quite beautiful work. As I’m working lately on linking TV content more deeply into the Web, and on ‘second screen’ navigation, this struck me as just the kind of interface which it ought to be possible to re-use on a tablet PC to explore TV archives. Forgetting for the moment difficulties with Flash on iPads and so on, the idea roughly is that it would be great to embed such a visualization within a TV watching environment, such that when the ‘republic of letters’ widget is focussed on some person, place, or topic, we should have the opportunity to scan the available TV archives for related materials to show.

So a glance at Chrome’s ‘developer tools’ panel gave me a link to the underlying data used by the visualisation. I don’t know exactly whose it is, nor how they want it used, so please treat it with respect. Still, there it is, sat in the Web, in tab-separated format, begging to be used. There’s a lot you can do with the Flash application that I’ve barely touched, but I’m intrigued by the underlying dataset. In particular, where they have the string “Tonson, Jacob”, the data linker in me wants to see a Wikipedia or DBpedia link, since they provide explanation, context, related people, places and themes; all precious assets when trying to scrape together related TV materials to inform, educate or entertain someone with. From a few test searches, it turns out that (many? most?) the correspondents are quite easily matched to Wikipedia: William Congreve, Montagu, 1st earl of Halifax, CharlesHough, bishop of Worcester, John; Stanyan, Abraham;  … Voltaire and others. But what about the data?

Lately I’ve been learning just a little about R, a language used mainly for statistics and related analysis. Here’s what it’ll do ‘out of the box’, in untrained hands:

letters<-read.csv('data.txt',sep='\t', header=TRUE)
v_author = letters$Author=="Voltaire"
v_letters = letters[v_author, ]
Where were Voltaire’s letters sent?
> cbind(summary(v_letters$dest_country))
[,1]
Austria            2
Belgium            6
Canada             0
Denmark            0
England           26
France          1312
Germany           97
India              0
Ireland            0
Italy             68
Netherlands       22
Portugal           0
Russia             5
Scotland           0
Spain              1
Sweden             0
Switzerland      342
The Netherlands    1
Turkey             0
United States      0
Wales              0
As the overview and video in the ‘Republic of Letters‘ site points out (“Tracking 18th-century “social network” through letters”), the patterns of correspondence eg. between Voltaire and e.g. England, Scotland and Ireland jumps out of the data (and more so its visualisation). There are countless ways this information could be explored, presented, sliced-and-diced. Only a custom app can really make the most of it, and the Republic of Letters work goes a long way in that direction. They also note that
The requirements of our project are very much in sync with current work being done in the linked-data/ semantic web community and in the data visualization community, which is why collaboration with computer science has been critical to our project from the start.
So the raw data in the Web here is a simple table; while we could spend time arguing about whether it would better be expressed in JSON, XML or an RDF notation, I’d rather see some discussion around what we can do with this information. In particular, I’m intrigued by the possibilities of R alongside the data-linking habits that come with RDF. If anyone manages to tease anything interesting from this dataset, perhaps mixed in with DBpedia, do post your results.
And of course there are always other datasets to examine; for example see the Darwin correspondence archives, or the Open Knowledge Foundation’s Open Correspondence project which has a Dickens-based pilot. While it is wonderful having UI that is tuned to the particulars of some dataset, it is also great when we can re-use UI code to explore similarly structured data from elsewhere. On both the data side and the UI side, this is expensive, tough work to do well. My current concern is to maximise re-use of both UI and data for the particular circumstances of second-screen TV navigation, a scenario rarely a first priority for anyone!
My hope is that custom navigation widgets for this sort of data will be natural components of next-generation TV remote controls, and that TV archives (and other collections) will open up enough of their metadata to draw in (possibly paying) viewers. To achieve this, we need the raw data on both sides to be as connectable as possible, so that application authors can spend their time thinking about what their users really need and can use, rather than on whether they’ve got the ‘right’ Henry Newton.
If we get it right, there’s a central role for librarianship and archivists in curating the public, linked datasets that tell us about the people, places and topics that will allow us to make new navigation trails through Web-connected television, literature and encyclopedia content. And we’ll also see new roles for custom visualizations, once we figure out an embedding framework for TV widgets that lets them communicate with a display system, with other users in the same room or community, and that is designed for cross-referencing datasets that talk about the same entities, topics, places etc.
As I mentioned regarding Lonclass and UDC, collaboration around open shared data often takes place in a furtive atmosphere of guilt and uncertainty. Is it OK to point to the underlying data behind a fantastic visualisation? How can we make sure the hard work that goes into that data curation is acknowledged and rewarded, even while its results flow more freely around the Web, and end up in places (your TV remote!) that may never have been anticipated?

Archive.org TV metadata howto

The  following is composed from answers kindly supplied by Hank Bromley, Karen Coyle, George Oates, and Alexis Rossi from the archive.org team. I have mixed together various helpful replies and retro-fitted them to a howto/faq style summary.

I asked about APIs and data access for descriptions of the many and varied videos in Archive.org. This guide should help you get started with building things that use archive.org videos. Since the content up there is pretty much unencumbered, it is perfect for researchers looking for content to use in demos. Or something to watch in the evening.

To paraphrase their answer, it was roughly along these  lines:

  • you can do automated lookups of the search engine using a simple HTTP/JSON API
  • downloading a lot or everything is ok if you need or prefer to work locally, but please write careful scripts
  • hopefully the search interface is useful and can avoid you needing to do this

Short API overview: each archive entry that is a movie, video or tv file should have a type ‘movie’. Everything in the archive has a short textual ID, and an XML description at a predictable URL. You can find those by using the JSON flavour of the archive’s search engine, then download the XML (and content itself) at your leisure. Please cache where possible!

I was also pointed to http://deweymusic.org/ which is an example of a site that provides a new front-end for archive.org audio content – their live music collection. My hope in posting these notes here is to help people working on new interfaces to Web-connected TV explore archive.org materials in their work.

JSON API to archive.org services

See online documentation for JSON interface; if you’re happy working with the remote search engine and are building a Javascript-based app, this is perfect.

We have been moving the majority of our services from formats like XML, OAI and other to the more modern JSON format and method of client/server interaction.

How to … play well with others

As we do not have unlimited resources behind our services, we request that users try to cache results where they can for the more high traffic and popular installations/uses. 8-)

TV content in the archive

The archive contains a lot of video files; old movies, educational clips, all sorts of fun stuff. There is also some work on reflecting broadcast TV into the system:

First off, we do have some television content available on the site right now:
http://www.archive.org/details/tvarchive – It’s just a couple of SF gov channels, so the content itself is not terribly exciting.  But what IS cool is that this being recorded directly off air and then thrown into publicly available items on archive.org automatically.  We’re recording other channels as well, but we currently aren’t sure what we can make public and how.

See also televisionarchive.orghttp://www.archive.org/details/sept_11_tv_archive

How to… get all metadata

If you really would rather download all the metadata and put it in their own search engine or database, it’s simple to do:  get a list of the identifiers of all video items from the search engine (mediatype:movies), and for each one, fetch this file:

http://www.archive.org/download/{itemID}/{itemID}_meta.xml

So it’s a bit of work since you have to retrieve each metadata record separately, but perhaps it is easily programmable.

However, once you have the identifier for an item, you can automatically find the meta.xml for it (or the files.xml if that’s what you want).  So if the item is at:
http://www.archive.org/details/Sita_Sings_the_Blues
the meta.xml is at
http://www.archive.org/download/Sita_Sings_the_Blues/Sita_Sings_the_Blues_meta.xml
and the files.xml is at
http://www.archive.org/download/Sita_Sings_the_Blues/Sita_Sings_the_Blues_files.xml

This is true for every single item in the archive.

How to… get a list of all IDs

Use http://www.archive.org/advancedsearch.php

Basically, you put in a query, choose the metadata you want returned, then choose the format you’d like it delivered in (rss, csv, json, etc.).

Downsides to this method – you can only get about 10,000 items at once (you might be able to push it to 20,000) before it crashes on you, and you can only get the metadata fields listed.

How to… monitor updates with RSS?

Once you have a full dump, you can monitor incoming items via the RSS feed on this page:

http://www.archive.org/details/movies

Subtitles / closed captions

For the live TV collection, there should be extracted subtitles. Maybe I just found bad examples. (e.g

http://www.archive.org/details/SFGTV2_20100909_003000).

Todo: more info here!

What does the Archive search engine index?

In general *everything* in the meta.xml files is indexed in the IA search engine, and accessible for scripted queries at http://www.archive.org/advancedsearch.php.

But it may be that the search engine will support whatever queries you want to make, without your having to copy all the metadata to your own site.

How many “movies” are in the database?

Currently 314,624 “movies” items in the search engine. All tv and video items are supposed to be have “movies” for their mediatype, although there has been some leakage now and then.

Should I expect a valid XML file for each id?

eg.  “identifier”:”mosaic20031001″ seemed problematic.
There are definitely items on the archive that have extremely minimally filled outmeta.xml files.

Response from a trouble report:

“I looked at a couple of your examples, i.e. http://www.archive.org/details/HomeElec,  and they do have a meta.xml file in our system… but it ONLY contains a mediatype (movies) and identifier and nothing else.  That seems to be making our site freak out.  There are at least 800 items in movies that do not have a title.  There might be other minimal metadata that is required for us to think it’s a real item, but my guess is that if you did a search like this one you’d see fewer of those errors:
http://www.archive.org/search.php?query=mediatype%3Amovies%20AND%20title%3A[*%20TO%20*]

The other error you might see is “The item is not available due to issues with the item’s content.”  This is an item that has been taken down but for some reason it did not get taken out of the SE – it’s not super common, but it does happen.
I don’t think we’ve done anything with autocomplete on the Archive search engine, although one can use wildcards to find all possible completions by doing a query.  For example, the query:

http://www.archive.org/advancedsearch.php?q=mediatype%3Avideo+AND+title%3Aopen*&fl[]=identifier&fl[]=title&rows=10&page=1&output=json&save=yes

will match all items whose titles contain any words that start with “open” – that sample result of ten items shows titles containing “open,” “opening,” and “opener.”

How can I autocomplete against archive.org metadata?

Not at the moment.

“I believe autocomplete *has* been explored with the search engine on our “Open Library” sister site, openlibrary.org.”

How can I find interesting and well organized areas of the video archive?

I assume you’re looking for collections with pretty regular metadata to work on?  These collections tend to be fairly filled out:
http://www.archive.org/details/prelinger
http://www.archive.org/details/academic_films
http://www.archive.org/details/computerchronicles


Remote remotes

I’ve just closed the loop on last weekend’s XMPP / Apple Remote hack, using Strophe.js, a library that extends XMPP into normal Web pages. I hope I’ll find some way to use this in the NoTube project (eg. wired up to Web-based video playing in OpenSocial apps), but even if not it has been a useful learning experience. See this screenshot of a live HTML page, receiving and displaying remotely streamed events (green blob: button clicked; grey blob: button released). It doesn’t control any video yet, but you get the idea I hope.

Remote apple remote HTML demo

Remote apple remote HTML demo, screenshot showing a picture of handheld apple remote with a grey blob over the play/pause button, indicating a mouse up event. Also shows debug text in html indicating ButtonUpEvent: PLPZ.

This webclient needs the JID and password details for an XMPP account, and I think these need to be from the same HTTP server the HTML is published on. It works using BOSH or other tricks, but for now I’ve not delved into those details and options. Source is in the Buttons area of the FOAF svn: webclient. I made a set of images, for each button in combination with button-press (‘down’), button-release (‘up’). I’m running my own ejabberd and using an account ‘buttons@foaf.tv’ on the foaf.tv domain. I also use generic XMPP IM accounts on Google Talk, which work fine although I read recently that very chatty use of such services can result in data rates being reduced.

To send local Apple Remote events to such a client, you need a bit of code running on an OSX machine. I’ve done this in a mix of C and Ruby: imremoted.c (binary) to talk to the remote, and the script buttonhole_surfer.rb to re-broadcast the events. The ruby code uses Switchboard and by default loads account credentials from ~/.switchboardrc.

I’ve done a few tests with this setup. It is pretty responsive considering how much indirection is involved: but the demo UI I made could be prettier. The + and – buttons behave differently to the left and right (and menu and play/pause); only + and – send an event immediately. The others wait until the key is released, then send a pair of events. The other keys except for play/pause will also forget what’s happening unless you act quickly. This seems to be a hardware limitation. Apparently Apple are about to ship an updated $20 remote; I hope this aspect of the design is reconsidered, as it limits the UI options for code using these remotes.

I also tried it using two browsers side by side on the same laptop; and two laptops side by side. The events get broadcasted just fine. There is a lot more thinking to do re serious architecture, where passwords and credentials are stored, etc. But XMPP continues to look like a very interesting route.

Finally, why would anyone bother installing compiled C code, Ruby (plus XMPP libraries), their own Jabber server, and so on? Well hopefully, the work can be divided up. Not everyone installs a Jabber server. My thinking is that we can bundle a collection of TV and SPARQL XMPP functionality in a single install, such that local remotes can be used on the network, but also local software (eg. XBMC/Plex/Boxee) can also be exposed to the wider network – whether it’s XMPP .js running inside a Web page as shown here, or an iPhone or a multi-touch table. Each will offer different interaction possibilities, but they can chat away to each other using a common link, and common RDF vocabularies (an area we’re working on in NoTube). If some common micro-protocols over XMPP (sending clicks or sending commands or doing RDF queries) can support compelling functionality, then installing a ‘buttons adaptor’ is something you might do once, with multiple benefits. But for now, apart from the JQbus piece, the protocols are still vapourware. First I wanted to get the basic plumbing in place.

Update: I re-discovered a useful ‘which bosh server do you need?’ which reminds me that there are basically two kinds of BOSH software offering; those that are built into some existing Jabber/XMPP server (like the ejabberd installation I’m using on foaf.tv) and those that are stand-alone connection managers that proxy traffic into the wider XMPP network. In terms of the current experiment, it means the event stream can (within the XMPP universe) come from addresses other than the host running the Web app. So I should also try installing Punjab. Perhaps it will also let webapps served from other hosts (such as opensocial containers) talk to it via JSON tricks? So far I have only managed to serve working Buttons/Strophe HTML from the same host as my Jabber server, ie. foaf.tv. I’m not sure how feasible the cross-domain option is.

Update x2: Three different people have now mentioned opensoundcontrol to me as something similar, at least on a LAN; it clearly deserves some investigation

Getting started with Mozilla Jetpack for Thunderbird (on OSX)

A few weeks ago, I started to experiment with Mozilla’s new Jetpack extension model when it became available for Thunderbird. Revisiting the idea today, I realise I’d forgotten the basic setup details, so am recording them here for future reference.

I found I had to download the source from the Mercurial Web interface, rather than use pre-prepared XPI installers. This may have improved by the time you read this. I also learned (from Standard9 in #jetpack IRC) that I need asuth’s repository, rather than the main one. Again, things move quickly, don’t assume this is true forever.

Here is what worked for me, on OSX.

1. Grab a .zip from the Jetpack repo, and unpack it locally on a machine that has Thunderbird installed.

2. Edit extensions/install.rdf and make sure the em:maxVersion in the Thunderbird section matches your version of Thunderbird. In mine I updated it to say <em:maxVersion>3.0b4</em:maxVersion> (instead of 3.0b4pre).

3.  See the README in the jetpack filetree for installation. With Thunderbird closed, I ran “python manage.py install –app=thunderbird” and I found Jetpack installed fine.

4. Run Thunderbird, you should see an about:jetpack tab, and corresponding options in the Tools menu.

This was enough to get started. See discussion on visophyte.org for some example code.

After installation, you can use the about:jetpack windows to load, reload and delete Jetpacks from URL.

So, why would you bother doing all this? Jetpack provides a simple way of extending an email client using Web technology.

In my current (unfinished!) experiment, for example, I’m looking at making a sidebar the shows information (photo, blog etc.) about the sender of the currently-viewed email. And I figured that if I blogged this HOWTO, someone more familiar with ajax, jquery etc might care to help with wiring this up to the Google Social Graph JSON API, so we can use FOAF and XFN to provide more contextual information around incoming mail…

Assuming you are running Thunderbird 3b4

Visual SPARQL query tools

Quick links – thinking about tools that allow graphical SPARQL query authoring…

OpenLink Virtuoso: InteractiveSparqlQueryBuilder (in HTML/CSS/.js). Pictured below; extensive documentation and screenshots linked from their main page.

…an ancestor of which was Damian Steer’s RDFAuthor tool for MacOSX, which could generate Squish (a SPARQL precursor) and query services over the ‘array of hashtables’ SOAP-for-rdf-query non spec that Libby Miller and I had implementations of. From the RDFAuthor tutorial:

The old Maryland BINPIQ SHOE knowledgebase query applet is the grandaddy of them all. Sadly I don’t have any screenshots and the applet itself seems to be coderotted. [...] Ah, but here I find an email I wrote about it 8 years ago(!), which has screenshots:

SemanticSoft from Moldova also have some visual SPARQL UI:

No real conclusion here. I just found myself looking around some of these links, and thought I’d share them. I’m sure there’s a lot more related work out there (eg. NIGHTLIGHT from folk at Southampton Uni), and that the rise of fancy HTML-based UIs and JSON for data access makes for an ever-more interesting environment for zero-install graphical query tools.

One thing I remember about the old Maryland applets: as their representational language became more expressive (moving from binary to n-ary), the graphical query UI became somewhat less intuitive. Now since SPARQL itself adds some concepts not in the underlying target language (ie. RDF doesn’t have named graphs, optionals etc), the ability to make a graphical query UI that exploits the “it’s just an RDF graph with bits labelled as missing” (per Guha’s original proposal) perhaps gets a bit strained. In particular, how might named graphs best be represented in visual editors?

JQbus: social graph query with XMPP/SPARQL

Righto, it’s about time I wrote this one up. One of my last deeds at W3C before leaving at the end of 2005, was to begin the specification of an XMPP binding of the SPARQL querying protocol. For the acronym averse, a quick recap. XMPP is the name the IETF give to the Jabber messaging technology. And SPARQL is W3C’s RDF-based approach to querying mixed-up Web data. SPARQL defines a textual query language, an XML result-set format, and a JSON version for good measure. There is also a protocol for interacting with SPARQL databases; this defines an abstract interface, and a binding to HTTP. There is as-yet no official binding to XMPP/Jabber, and existing explorations are flawed. But I’ll argue here, the work is well worth completing.

jqbus diagram

So what do we have so far? Back in 2005, I was working in Java, Chris Schmidt in Python, and Steve Harris in Perl. Chris has a nice writeup of one of the original variants I’d proposed, which came out of my discussions with Peter St Andre. Chris also beat me in the race to have a working implementation, though I’ll attribute this to Python’s advantages over Java ;)

I won’t get bogged down in the protocol details here, except to note that Peter advised us to use IQ stanzas. That existing work has a few slight variants on the idea of sending a SPARQL query in one IQ packet, and returning all the results within another, and that this isn’t quite deployable as-is. When the result set is too big, we can run into practical (rather than spec-mandated) limits at the server-to-server layers. For example, Peter mentioned that jabber.org had a 65k packet limit. At SGFoo last week, someone suggested sending the results as an attachment instead; apparently this one of the uncountably many extension specs produced by the energetic Jabber community. The 2005 work was also somewhat partial, and didn’t work out the detail of having a full binding (eg. dealing with default graphs, named graphs etc).

That said, I think we’re onto something good. I’ll talk through the Java stuff I worked on, since I know it best. The code uses Ignite Online’s Smack API. I have published rough Java code that can communicate with instances of itself across Jabber. This was last updated July 2007, when I fixed it up to use more recent versions of Smack and Jena. I forget if the code to parse out query results from the responses was completed, but it does at least send SPARQL XML results back through the XMPP network.

sparql jabber interaction

sparql jabber interaction

So why is this interesting?

  • SPARQLing over XMPP can cut through firewalls/NAT and talk to data on the desktop
  • SPARQLing over XMPP happens in a social environment; queries are sent from and to Jabber addresses, while roster information is available which could be used for access control at various levels of granularity
  • XMPP is well suited for async interaction; stored queries could return results days or weeks later (eg. job search)
  • The DISO project is integrating some PHP XMPP code with WordPress; SparqlPress is doing same with SPARQL

Both SPARQL and XMPP have mechanisms for batched results, it isn’t clear which, if either, to use here.

XMPP also has some service discovery mechanisms; I hope we’ll wire up a way to inspect each party on a buddylist roster, to see who has SPARQL data available. I made a diagram of this last summer, but no code to go with it yet. There is also much work yet to do on access control systems for SPARQL data, eg. using oauth. It is far from clear how to integrate SPARQL-wide ideas on that with the specific possibilities offered within an XMPP binding. One idea is for SPARQL-defined FOAF groups to be used to manage buddylist rosters, “friend groups”.

Where are we with code? I have a longer page in FOAF SVN for the Jqbus Java stuff, and a variant on this writeup (includes better alt text for the images and more detail). The Java code is available for download. Chris’s Python code is still up on his site. I doubt any of these can currently talk to each other properly, but they show how to deal with XMPP in different languages, which is useful in itself. For Perl people, I’ve uploaded a copy of Steve’s code.

The Java stuff has a nice GUI for debugging, thanks to Smack. I just tried a copy. Basically I can run a client and a server instance from the same filetree, passing it my LiveJournal and Google Talk jabber account details. The screenshot here shows the client on the left having the XML-encoded SPARQL results, and the server on the right displaying the query that arrived. That’s about it really. Nothing else ought to be that different from normal SPARQL querying, except that it is being done in an infrastructure that is more socially-grounded and decentralised than the classic HTTP Web service model.

JQbus debug

Ruby client for querying SPARQL REST services

I’ve started a Ruby conversion of Ivan Herman’s Python SPARQL client, itself inspired by Lee Feigenbaum’s Javascript library. These are tools which simply transmit a SPARQL query across the ‘net to a SPARQL-protocol database endpoint, and handle the unpacking of the results. These queries can result in yes/no responses, variable-to-value bindings (rather like in SQL), or in chunks of RDF data. The default resultset notation is a simple XML format; JSON results are also widely available.

All I’ve done so far, is to sit down with the core Python file from Ivan’s package, and slog through converting it brainlessly into rather similar Ruby. Take out “self” and “:” from method definitions, write “nil” instead of “None”, write “end” at the end of each method and class, use Ruby’s iteration idioms, and you’re most of the way there. Of course the fiddly detail is where we use external libraries: for URIs, network access, JSON and XML parsing. I’ve cobbled something together which could be the basis for reconstructing all the original functionality in Ivan’s code. Currently, it’s more proof of concept, but enough to be worth posting.

Files are in SVN and browsable online; there’s also a bundled up download for the curious, brave or helpful. The test script shows all we have at the moment: ability to choose XML or JSON results, and access result set binding rows. Other result formats (notably RDF; there’s no RDF/XML parser currently) aren’t handled, and various bits of the code conversion are incomplete. I’d be super happy if someone came along and helped finish this!

Update:  Ruby’s REXML parser is now clumsily wired in; you get get a REXML Document object (see xml.com writeup) as a way of navigating the resultset.

Commandline PHP for loading RDF URLs into ARC (and Twinkle for query UI)


#!/usr/bin/php
<?php
if ($argc != 2 || in_array($argv[1], array('--help', '-help', '-h', '-?'))) {
?>
This is a command line PHP script with one option: URL of RDF document to load
<?php
} else {

$supersecret = "123rememberme"; #Security analysts recommend using data of birth + social security ID here
# *** be careful with real msql passwords ***

include_once("../arc/ARC2.php");
$config = array( 'db_host' => 'localhost', 'db_name' => 'sg1', 'db_user' => 'sparql',
'db_pwd' => $supersecret, 'store_name' => 'crawl', );
$store = ARC2::getStore($config);
if (!$store->isSetUp()) { $store->setUp(); }
$profile = $argv[1];
echo "Loading data from " . $profile ;
$store->query('DELETE FROM <'.$profile.'>');
$store->query('LOAD <'.$profile.'>');
}
?>

FWIW, this is what I’m using to (re)load data into an ARC store from the commandline. I’ll try wiring up my old RDF crawler to this when I get time. Each loaded source is stored as a named graph, with the URI it is loaded from being the named graph URI. ARC just needs the path to the unpacked PHP libraries, and connection details for a MySQL database, and comes with a handy SPARQL endpoint script too, which I’ve been testing with Twinkle.

My public sandbox data is currently loaded up as follows. No promises it’ll stay there, but anyway, adding the following to Twinkle 2.0’s config file section for SPARQL endpoints works for me. The endpoint also directly offers a basic Web interface too, with HTML, XML, JSON etc.


<http://sandbox.foaf-project.org/2008/foaf/ggg.php>
a sources:Endpoint; rdfs:label "FOAF Social Graph Agggregator sandbox".