Remembering Aaron Swartz

“One of the things the Web teaches us is that everything is connected (hyperlinks) and we all should work together (standards). Too often school teaches us that everything is separate (many different ‘subjects’) and that we should all work alone.” –Aaron Swartz, April 2001.

So Aaron is gone. We were friends a decade ago, and drifted out of touch; I thought we’d cross paths again, but, well, no.

Update: MIT’s report is published.

I’ll remember him always as the bright kid who showed up in the early data-sharing Web communities around RSS, FOAF and W3C’s RDF, a dozen years ago:

"Hello everyone, I'm Aaron. I'm not _that_ much of a coder, (and I don't know
much Perl) but I do think what you're doing is pretty cool, so I thought I'd
hang out here and follow along (and probably pester a bit)."

Aaron was from the beginning a powerful combination of smart, creative, collaborative and idealistic, and was drawn to groups of developers and activists who shared his passion for what the Web could become. He joined and helped the RSS 1.0 and W3C RDF groups, and more often than not the difference in years didn’t make a difference. I’ve seen far more childishness from adults in the standards scene than I ever saw from young Aaron. TimBL has it right: “we have lost one of our own”. He was something special that ‘child genius’ doesn’t come close to capturing. Aaron was a regular in the early ‘24×7 hack-and-chat’ RDF IRC scene, and it’s fitting that the first lines logged in that group’s archives are from him.

I can’t help but picture an alternate and fairer universe in which Aaron made it through and got to be the cranky old geezer at conferences in the distant shiny future. He’d have made a great William Loughborough: a mutual friend and collaborator with whom he shared a tireless impatience at the pace of progress, the need to ask ‘when?’, to always Demand Progress.

I’ve been reading old IRC chat logs from 2001. Within months of his ‘I’m not _that_ much of a coder’ Aaron was writing Python code for accessing experimental RDF query services (and teaching me how to do it, disclaiming credit, ‘However you like is fine… I don’t really care.’). He was writing rules in TimBL’s experimental logic language N3, applying this to modelling corporate ownership structures rather than as an academic exercise, and as ever sharing what he knew by writing about his work in the Web. Reading some old chats, we talked about the difficulties of distributed collaboration, debate and disagreement, personalities and their clashes, working groups, and the Web.

I thought about sharing some of that, but I’d rather just share him as I choose to remember him:

22:16:58 <AaronSw> LOL

Schema.org and One Hundred Years of Search

A talk from the London SemWeb meetup hosted by the BBC Academy in London, Mar 30 2012…

Slides and video are already in the Web, but I wanted to post this as an excuse to plug the new Web History Community Group that Max and I have just started at W3C. The talk was part of the Libraries, Media and the Semantic Web meetup hosted by the BBC in March. It gave an opportunity to run through some forgotten history, linking Paul Otlet, the Universal Decimal Classification, schema.org and some 100 year old search logs from Otlet’s Mundaneum. Having worked with the BBC Lonclass system (a descendant of Otlet’s UDC), and collaborated with Aida Slavic of the UDC on their publication of Linked Data, I was happy to be given the chance to try to spell out these hidden connections. It also turned out that Google colleagues have been working to support the Mundaneum and the memory of this early work, and I’m happy that the talk led to discussions with both the Mundaneum and the Computer History Museum about the new Web History group at W3C.

So, everything’s connected. Many thanks to W. Boyd Rayward (Otlet’s biographer) for sharing the ancient logs that inspired the talk (see slides/video for a few more details). I hope we can find more such things to share in the Web History group, because the history of the Web didn’t begin with the Web…

My ’70s Schoolin’ (in RDFa)

I went to Hamsey Green school in the 1970s.

Looking in the UK Govt datasets, I see it is listed there with a homepage of ‘http://www.hamsey-green-infant.surrey.sch.uk’ (which doesn’t seem to work).

Some queries I’m trying via the SPARQL endpoint (I’ll update this post if I make them work…)

First, a general query, from which I found the URL manually:

SELECT DISTINCT ?x ?y WHERE { ?x <http://education.data.gov.uk/def/school/websiteAddress> ?y . }

Then I can go back into the data, and find other properties of the school:

PREFIX sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT DISTINCT ?x ?p ?z WHERE
{
?x sch-ont:websiteAddress "http://www.hamsey-green-infant.surrey.sch.uk" .
?x ?p ?z .
}

Results in json

How to make this presentable? I can’t get output=html to work, but if I run this ‘construct’ query it creates a simple flat RDF document:

PREFIX sch-ont:  <http://education.data.gov.uk/def/school/>
CONSTRUCT {
 ?x ?p ?z .
}
WHERE
{
?x sch-ont:websiteAddress "http://www.hamsey-green-infant.surrey.sch.uk" .
?x ?p ?z .
}

So, where are we here? We see two RDF datasets about the same school. One is the simple claim that I attended the school at some time in the past (1976–1978, in fact). The other describes many of its current attributes, most of which may be different now from how they were then. In my sample RDFa, I used the most popular Web link for the school to represent it; in the Edubase government data, it has a Web site address, but that seems not to be current.

Assuming we’d used the same URIs for the school’s homepage (or indeed for the school itself) then these bits of data could be joined.

Perhaps a more compelling example of data linking would be to show this schools data mixed in with something like MySociety’s excellent interactive travel maps? Still, the example above shows that basic “find people who went to my school” queries should be very possible…
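To make that a bit more concrete, here is a sketch (not a tested query) of what such a join might look like, assuming both my RDFa claims and the Edubase data were loaded into one store, and assuming my RDFa used FOAF’s foaf:schoolHomepage property to point at the school:

```sparql
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
SELECT DISTINCT ?person ?school
WHERE {
  # my RDFa: a person claims a school homepage
  ?person foaf:schoolHomepage ?homepage .
  # Edubase: a school with a website address (a plain literal)
  ?school sch-ont:websiteAddress ?address .
  # the join only succeeds if both sides carry exactly the same URL
  FILTER ( STR(?homepage) = STR(?address) )
}
```

The FILTER papers over the fact that one side is a resource and the other a string literal; with genuinely shared URIs the join would be a plain triple-pattern match, which is rather the point.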

Obama for middle-managers

(inspired by the ‘Yes we can’ powerpoint slides…)

We Will

  • act – not only to create new jobs, but to lay a new foundation for growth
  • build the roads and bridges, the electric grids and digital lines that feed our commerce and bind us together
  • restore science to its rightful place, and wield technology’s wonders to raise health care’s quality and lower its cost
  • harness the sun and the winds and the soil to fuel our cars and run our factories
  • transform our schools and colleges and universities to meet the demands of a new age
  • begin to responsibly leave Iraq to its people
  • work tirelessly to lessen the nuclear threat, and roll back the spectre of a warming planet
  • [To those who cling to power through corruption and deceit and the silencing of dissent], extend a hand if you are willing to unclench your fist
  • [for those who seek to advance their aims by inducing terror and slaughtering innocents], defeat you

We Will Not…

  • give [those ideals] up for expedience’s sake
  • apologise for our way of life, nor will we waver in its defence

Family trees, Gedcom::FOAF in CPAN, and provenance

Ever wondered who the mother(s) of Adam and Eve’s grandchildren were? Me too. But don’t expect SPARQL or the Semantic Web to answer that one! Meanwhile, …

You might nevertheless care to try the Gedcom::FOAF CPAN module from Brian Cassidy. It can read Gedcom, a popular ‘family history’ file format, and turn it into RDF (using FOAF and the relationship and biography vocabularies). A handy tool that can open up a lot of data to SPARQL querying.

The Gedcom::FOAF API seems to focus on turning the people or family Gedcom entries into their own FOAF XML files. I wrote a quick and horrid Perl script that instead runs over a Gedcom file and emits a single flattened RDF/XML document. It generates URIs for XML files that don’t actually exist, but that isn’t a huge problem.

Perhaps someone would care to take a look at this code and see whether a more RDFa- and linked-data-friendly version would be useful?

Usage: perl gedcom2foafdump.pl BUELL001.GED > _sample_gedfoaf.rdf

The sample data I tested it on is intriguing, though I’ve not really looked around it yet.

It contains over 9800 people including the complete royal lines of England, France, Spain and the partial royal lines of almost all other European countries. It also includes 19 United States Presidents descended from royalty, including Washington, both Roosevelts, Bush, Jefferson, Nixon and others. It also has such famous people as Brigham Young, William Bradford, Napoleon Bonaparte, Winston Churchill, Anne Bradstreet (Dudley), Jesus Christ, Daniel Boone, King Arthur, Jefferson Davis, Brian Boru King of Ireland, and others. It goes all the way back to Adam and Eve and also includes lines to ancient Rome including Constantine the Great and ancient Egypt including King Tutankhamen (Tut).

The data is credited to Matt & Ellie Buell, “Uploaded By: Eochaid”, 1995-05-25.

Here’s an extract to give an idea of the Gedcom form:

0 @I4961@ INDI
1 NAME Adam //
1 SEX M
1 REFN +
1 BIRT
2 DATE ABT 4000 BC
2 PLAC Eden
1 DEAT
2 DATE ABT 3070 BC
1 FAMS @F2398@
1 NOTE He was the first human on Earth.
1 SOUR Genesis 2:20 KJV
0 @I4962@ INDI
1 NAME Eve //
1 SEX F
1 REFN +
1 BIRT
2 DATE ABT 4000 BC
2 PLAC Eden
1 FAMS @F2398@
1 SOUR Genesis 3:20 KJV

It might not directly answer the great questions of biblical scholarship, but it could be a fun dataset to explore Gedcom / RDF mappings with. I wonder how it compares with Freebase, DBpedia etc.
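As a starting point for such experiments, here is a minimal Python sketch (nothing to do with Brian’s module) of reading Gedcom’s line-based level/tag/value format into flat per-individual dicts, keyed by dotted tag paths. It ignores CONT/CONC continuation lines, family (FAM) records and sources, so it is just enough to poke at a Gedcom-to-RDF mapping:

```python
# Minimal Gedcom reader sketch: each line is "LEVEL TAG [VALUE]";
# level 0 opens a record, higher levels nest beneath the current tag.
def parse_gedcom(text):
    people = {}
    current = None
    context = []  # tag path so far, e.g. ["BIRT", "DATE"]
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(None, 2)
        level = int(parts[0])
        tag_or_id = parts[1]
        rest = parts[2] if len(parts) > 2 else ""
        if level == 0:
            if rest == "INDI":          # e.g. "0 @I4961@ INDI"
                current = people.setdefault(tag_or_id.strip("@"), {})
            else:                       # FAM, SOUR etc.: skipped here
                current = None
            context = []
        elif current is not None:
            context = context[: level - 1] + [tag_or_id]
            current[".".join(context)] = rest
    return people

sample = """\
0 @I4961@ INDI
1 NAME Adam //
1 SEX M
1 BIRT
2 DATE ABT 4000 BC
2 PLAC Eden
0 @I4962@ INDI
1 NAME Eve //
1 SEX F
"""

people = parse_gedcom(sample)
print(people["I4961"]["NAME"])        # Adam //
print(people["I4961"]["BIRT.DATE"])   # ABT 4000 BC
```

From dicts like these, emitting foaf:name, bio:event and rel: triples is a short step; a real mapping would also need the FAMS/FAMC family links that this sketch skips.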

The Perl module is a good start for experimentation but it only really scratches the surface of the problem of representing source/provenance and uncertainty. On which topic, Jeni Tennison has a post from a year ago that’s well worth (re-)reading.

What I’ve done in the above little Perl script is implement a simplification: instead of each family description being its own separate XML file, they are all squashed into a big flat set of triples (‘graph’). This may or may not be appropriate, depending on the sourcing of the records. It seems Gedcom offers some basic notion of ‘source’, although not one expressed in terms of URIs. If I look in the SOUR(ce) field in the Gedcom file, I see information like this (which currently seems to be ignored in the Gedcom::FOAF mapping):

grep SOUR BUELL001.GED | sort | uniq

1 NOTE !SOURCE:Burford Genealogy, Page 102 Cause of Death; Hemorrage of brain
1 NOTE !SOURCE:Gertrude Miller letter “Harvey Lee lived almost 1 year. He weighed
1 NOTE !SOURCE:Gertrude Miller letter “Lynn died of a ruptured appendix.”
1 NOTE !SOURCE:Gertrude Miller letter “Vivian died of a tubal pregnancy.”
1 SOUR “Castles” Game Manuel by Interplay Productions
1 SOUR “Mayflower Descendants and Their Marriages” pub in 1922 by Bureau of
1 SOUR “Prominent Families of North Jutland” Pub. in Logstor, Denmark. About 1950
1 SOUR /*- TUT
1 SOUR 273
1 SOUR AHamlin777.  E-Mail “Descendents of some guy
1 SOUR Blundell, Sherrie Lea (Slingerland).  information provided on 16 Apr 1995
1 SOUR Blundell, William, Rev. Interview on Jan 29, 1995.
1 SOUR Bogert, Theodore. AOL user “TedLBJ” File uploaded to American Online
1 SOUR Buell, Barbara Jo (Slingerland)
1 SOUR Buell, Beverly Anne (Wenge)
1 SOUR Buell, Beverly Anne (Wenge).  letter addressed to Kim & Barb Buell dated
1 SOUR Buell, Kimberly James.
1 SOUR Buell, Matthew James. written December 19, 1994.
1 SOUR Burnham, Crystal (Harris).  Leter sent to Matt J. Buell on Mar 18, 1995.
1 SOUR Burnham, Crystal Colleen (Harris).  AOL user CBURN1127.  E-mail “Re: [...etc.]

Some of these sources could be tied to cleaner IDs (eg. for books c/o Open Library, although see ‘in search of cultural identifiers‘ from Michael Smethurst).

I believe RDF’s SPARQL language gives us a useful tool (the notion of ‘GRAPH’) that can be applied here, but we’re a long way from having worked out the details when it comes to attaching evidence to claims. So for now, we in the RDF scene have a fairly coarse-grained approach to data provenance. Databases are organized into batches of triples, i.e. RDF statements that claim something about the world. And while we can use these batches – aka graphs – in our queries, we haven’t really figured out what kind of information we want to associate with them yet. Which is a pity, since this could have uses well beyond family history, for example in online journalism and blog-mediated fact-checking.
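The named-graph idea can at least be sketched. Assuming (hypothetically) that each Gedcom SOUR had been minted its own graph URI and its derived triples loaded into that graph, a query could report which source backs each claim:

```sparql
PREFIX rel: <http://purl.org/vocab/relationship/>
SELECT ?child ?parent ?source
WHERE {
  # ?source binds to the named graph each triple came from
  GRAPH ?source {
    ?child rel:childOf ?parent .
  }
}
```

Of course ?source is only as good as the graph URIs we minted; the hard part, deciding what to say about the graphs themselves, is exactly what remains unworked out.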

Nearby in the Web: see also the SIOC/SWAN telecons, a collaboration in the W3C SemWeb lifescience community around the topic of modelling scientific discourse.

Underground Victorian street? in Bristol?

Hidden away beneath Lawrence Hill lies a secret world – an underground Victorian street stretching from Ducie Road to the Packhouse pub.

Local historian Dave Stephenson finds out more.

For many years I had heard tales of a Victorian street abandoned beneath busy Lawrence Hill. To add substance to the legend there were people who claimed to have seen it when they were young.

They told me that this underground street stretched from Ducie Road, near the now closed Earl Russel pub, to the Packhorse pub, in tunnels under the road.

All had Victorian shop fronts, some still with their glass intact, and several street lamps still hung on the walls.

From the Underground Bristol forum. Rather vaguely attributed to ‘a Bristol newspaper – July 2007’; perhaps the Evening Post, though it would apparently cost a search fee to find out. The quoted article goes on to say that the underground street is pretty messed up now, with rubbish, and has been looted quite comprehensively.

(Back to) The Future of Interactive Media

The Internet is beginning a fundamental transition into the broadband, commercial information superhighway of the future. Today, the Internet offers immediate opportunities for commercial applications by connecting millions of PC, Macintosh and workstation users with businesses and organizations around the world. Tomorrow, as network capabilities and performance increase, this global link will deliver interactive services, information and entertainment into consumers’ homes. Mosaic Communications Corporation intends to support companies and consumers throughout this transition, and to accelerate the coming of this new era with tools that ease and advance online communications.

Mosaic Communications Corporation: Who We Are: Our Story: The Future of Interactive Media

jwz and friends have restored mcom.com to its former 1994-era glory, reminding us that the future’s always up for grabs.