Remembering Aaron Swartz

“One of the things the Web teaches us is that everything is connected (hyperlinks) and we all should work together (standards). Too often school teaches us that everything is separate (many different ‘subjects’) and that we should all work alone.” –Aaron Swartz, April 2001.

So Aaron is gone. We were friends a decade ago, and drifted out of touch; I thought we’d cross paths again, but, well, no.

Update: MIT’s report is published.

 I’ll remember him always as the bright kid who showed up in the early data sharing Web communities around RSS, FOAF and W3C’s RDF, a dozen years ago:

"Hello everyone, I'm Aaron. I'm not _that_ much of a coder, (and I don't know
much Perl) but I do think what you're doing is pretty cool, so I thought I'd
hang out here and follow along (and probably pester a bit)."

Aaron was from the beginning a powerful combination of smart, creative, collaborative and idealistic, and was drawn to groups of developers and activists who shared his passion for what the Web could become. He joined and helped the RSS 1.0 and W3C RDF groups, and more often than not the difference in years didn’t make a difference. I’ve seen far more childishness from adults in the standards scene, than I ever saw from young Aaron. TimBL has it right; “we have lost one of our own”. He was something special that ‘child genius’ doesn’t come close to capturing. Aaron was a regular in the early ’24×7 hack-and-chat’ RDF IRC scene, and it’s fitting that the first lines logged in that group’s archives are from him.

I can’t help but picture an alternate and fairer universe in which Aaron made it through and got to be the cranky old geezer at conferences in the distant shiny future. He’d have made a great William Loughborough; a mutual friend and collaborator with whom he shared a tireless impatience at the pace of progress, the need to ask ‘when?’, to always Demand Progress.

I’ve been reading old IRC chat logs from 2001. Within months of his ‘I’m not _that_ much of a coder’ Aaron was writing Python code for accessing experimental RDF query services (and teaching me how to do it, disclaiming credit, ‘However you like is fine… I don’t really care.’). He was writing rules in TimBL’s experimental logic language N3, applying this to modelling corporate ownership structures rather than as an academic exercise, and as ever sharing what he knew by writing about his work in the Web. Reading some old chats, we talked about the difficulties of distributed collaboration, debate and disagreement, personalities and their clashes, working groups, and the Web.

I thought about sharing some of that, but I’d rather just share him as I choose to remember him:

22:16:58 <AaronSw> LOL

Skosdex: SKOS utilities via jruby

I just announced this on the public-esw-thes and public-rdf-ruby lists. I started to make a Ruby API for SKOS.

Example code snippet from the readme.txt (see that link for the corresponding output):

require "src/jena_skos"
s1 = SKOS.new("http://norman.walsh.name/knows/taxonomy")
s1.read("http://www.wasab.dk/morten/blog/archives/author/mortenf/skos.rdf" )
s1.read("file:samples/archives.rdf")
s1.concepts.each_pair do |url,c|
  puts "SKOS: #{url} label: #{c.prefLabel}"
end

c1 = s1.concepts["http://www.ukat.org.uk/thesaurus/concept/1366"] # Agronomy
puts "test concept is "+ c1 + " " + c1.prefLabel
c1.narrower do |uri|
  c2 = s1.concepts[uri]
  puts "\tnarrower: "+ c2 + " " + c2.prefLabel
  c2.narrower do |uri|
    c3 = s1.concepts[uri]
    puts "\t\tnarrower: "+ c3 + " " + c3.prefLabel
  end
end

The idea here is to have a lightweight OO API for SKOS, couched in terms of a network of linked “Concepts”, with broader and narrower relations. But this is backed by a full RDF API (in our case Jena, via Java jruby magic). Eventually, entire apps could be built at the SKOS API level. For now, anything beyond broader/narrower and prefLabel is hidden away in the RDF (and so you’d need to dip into the Jena API to get to this data).

The distinguishing feature is that it uses jruby (a Ruby implementation in pure Java). As such it can call on the full powers of the Jena toolkit, which go far beyond anything available currently in Ruby. At the moment it doesn’t do much, I just parse SKOS and make a tiny object model which exposes little more than prefLabel and broader/narrower.

I think it’s worth exploring because Ruby is rather nice for scripting, but lacks things like OWL reasoners and the general maturity of Java RDF/OWL tools (parsers, databases, etc.).

If you’re interested just to see how Jena APIs look when called from jruby Ruby, see jena_skos.rb in svn. Excuse the mess.

I’m interested to hear if anyone else has explored this topic. Obviously there is a lot more to SKOS than broader/narrower, so I’m very interested to find collaborators or at least a sanity check before taking this beyond a rough demo.

Plans – well my main concern is nothing to do with java or ruby, … but to explore Lucene indexing of SKOS data. I am also very interested in the pragmatic question of where SKOS stops and RDFS/OWL starts, … and how exactly we bridge that gap. See flickr for my most recent sketch of this landscape, where I revisit the idea of an “it” property (skos:it, foaf:it, …) that links things described in SKOS to “the thing itself”. I hope to load up enough overlapping SKOS data to get some practical experience with the tradeoffs.

For query expansion, smarter tagging assistants, etc. So the next step is probably to try building a Lucene index similar to the contrib/wordnet utility that ships with Java lucene. This creates a Lucene index in which every “document” is really a word from Wordnet, with text labels for its synonyms as indexed properties. I also hope to look at the use of SKOS + Lucene for “did you mean?” and auto-completion utilities. It’s also worth noting that Jena ships with LARQ, a Lucene-aware extension to ARQ, Jena’s SPARQL engine.

Cross-browsing and RDF

Cross-browsing and RDF

While cross-searching has been described and demonstrated through this paper and associated work, the problem of cross-browsing a selection of subject gateways has not been addressed. Many gateway users prefer to browse, rather than search. Though browsing usually takes longer than searching, it can be more thorough, as it is not dependent on the users terms matching keywords in resource descriptions (even when a thesaurus is used, it is possible for resources to be “missed” if they are not described in great detail).

As a “quick fix”, a group of gateways may create a higher level menu that points to the various browsable menus amongst the gateways. However, this would not be a truly hierarchical menu system, as some gateways maintain browsable resource menus in the same atomic (or lowest level) subject area. One method of enabling cross-browsing is by the use of RDF.

The World Wide Web Consortium has recently published a preliminary draft specification for the Resource Description Framework (RDF). RDF is intended to provide a common framework for the exchange of machine-understandable information on the Web. The specification provides an abstract model for representing arbitrarily complex statements about networked resources, as well as a concrete XML-based syntax for representing these statements in textual form. RDF relies heavily on the notion of standard vocabularies, and work is in progress on a ‘schema’ mechanism that will allow user communities to express their own vocabularies and classification schemes within the RDF model.

RDF’s main contribution may be in the area of cross-browsing rather than cross-searching, which is the focus of the CIP. RDF promises to deliver a much-needed standard mechanism that will support cross-service browsing of highly-organised resources. There are many networked services available which have classified their resources using formal systems like MeSH or UDC. If these services were to each make an RDF description of their collection available, it would be possible to build hierarchical ‘views’ of the distributed services offering a user interface organised by subject-classification rather than by physical location of the resource.

From Cross-Searching Subject Gateways, The Query Routing and Forward Knowledge Approach, Kirriemuir et. al., D-Lib Magazine, January 1998.

I wrote this over 11 (eleven) years ago, as something of an aside during a larger paper on metadata for distributed search. While we are making progress towards such goals, especially with regard to cross-referenced descriptions of identifiable things (ie. the advances made through linked data techniques lately), the pace of progress can be quite frustrating. Just as it seems like we’re making progress, things take a step backwards. For example, the wonderful lcsh.info site is currently offline while the relevant teams at the Library of Congress figure out how best to proceed. It’s also ten years since Charlotte Jenkins published some great work on auto-classification that used OCLC’s Dewey Decimal Classification. That work also ran into problems, since DDC wasn’t freely available for use in such applications. In the current climate, with Creative Commons, Open source, Web 2.0 and suchlike the rage, I hope we’ll finally see more thesaurus and classification systems opened up (eg. with SKOS) and fully linked into the Web. Maybe by 2019 the Web really will be properly cross-referenced…

Visual SPARQL query tools

Quick links – thinking about tools that allow graphical SPARQL query authoring…

OpenLink Virtuoso: InteractiveSparqlQueryBuilder (in HTML/CSS/.js). Pictured below; extensive documentation and screenshots linked from their main page.

…an ancestor of which was Damian Steer’s RDFAuthor tool for MacOSX, which could generate Squish (a SPARQL precursor) and query services over the ‘array of hashtables’ SOAP-for-rdf-query non spec that Libby Miller and I had implementations of. From the RDFAuthor tutorial:

The old Maryland BINPIQ SHOE knowledgebase query applet is the grandaddy of them all. Sadly I don’t have any screenshots and the applet itself seems to be coderotted. [...] Ah, but here I find an email I wrote about it 8 years ago(!), which has screenshots:

SemanticSoft from Moldova also have some visual SPARQL UI:

No real conclusion here. I just found myself looking around some of these links, and thought I’d share them. I’m sure there’s a lot more related work out there (eg. NIGHTLIGHT from folk at Southampton Uni), and that the rise of fancy HTML-based UIs and JSON for data access makes for an ever-more interesting environment for zero-install graphical query tools.

One thing I remember about the old Maryland applets: as their representational language became more expressive (moving from binary to n-ary), the graphical query UI became somewhat less intuitive. Now since SPARQL itself adds some concepts not in the underlying target language (ie. RDF doesn’t have named graphs, optionals etc), the ability to make a graphical query UI that exploits the “it’s just an RDF graph with bits labelled as missing” (per Guha’s original proposal) perhaps gets a bit strained. In particular, how might named graphs best be represented in visual editors?