OpenStreetMap for disaster response – raw notes from Harry Wood talk

Very raw, sometimes verbatim but doubtless flawed notes from Harry Wood’s excellent talk at Open Data Institute in London. #odifridays

Many thanks to Harry for a great talk and to ODI for putting together these lunchtime lectures. The ODI have also published slides and audio from the talk.

“An introduction to OpenStreetMap, the UK born project to map the world as open data, and a look at how volunteer mappers helped with disaster response in the Philippines after Typhoon Haiyan, with Harry Wood . Harry is a developer at, and is on the board of the Humanitarian OpenStreetMap Team.”

Note: these are unchecked, very raw notes that I typed while listening. There will be mistakes and confusions; my fault not Harry’s!

Typhoons … Philippines area hammered during typhoon season. The typhoons often meander off, don’t hit coast. But this one hit, and fast, … so a big storm surge. Fastest wind speeds on record ‘biggest storm ever’.

[shows video clip]

More than 6000 died. Role of mapping in disaster responses: food, shelter etc; can donate money directly for giving food. Info / logistics challenge re delivering aid. Lots of ‘where?’ questions. Where are people suffering most? Where to deliver aid to? Team locations etc. Huge value of maps for disaster response.

Q: who has edited OSM? A: lots of hands raised.

Maps … GIS / vector data will always be a bit complex, but we try to dumb it down. The data model is also v stripped down, just tagged nodes and ways. e.g. a pub is a node with amenity=pub. It’s also renderable map -> viewed as a map on, … but we play down that aspect a bit, since there are other map providers around e.g. Google.
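To make the ‘tagged nodes and ways’ idea concrete, here is a minimal Python sketch of that stripped-down model. It is my own illustration, not OSM’s actual schema or API: the class names, the find_by_tag helper, the ids, coordinates and the pub’s name are all made up, and the real model also has relations, which the notes above don’t cover.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point: a position plus free-form key=value tags."""
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node ids (a road, a building outline...) plus tags."""
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)


def find_by_tag(elements, key, value):
    """Return every node or way carrying the tag key=value."""
    return [e for e in elements if e.tags.get(key) == value]


# Hypothetical ids, coordinates and names, for illustration only.
pub = Node(id=1, lat=51.522, lon=-0.083,
           tags={"amenity": "pub", "name": "Example Arms"})
corner_a = Node(id=2, lat=51.5221, lon=-0.0835)
corner_b = Node(id=3, lat=51.5225, lon=-0.0829)
road = Way(id=10, node_ids=[2, 3], tags={"highway": "residential"})

elements = [pub, corner_a, corner_b, road]
pubs = find_by_tag(elements, "amenity", "pub")
print([p.tags["name"] for p in pubs])
```

The point is how little there is: a pub is just a node whose tags say amenity=pub, and a road is just a way through some nodes.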

But the maps are an important aspect of disaster response.

OSM editing -> appear on map can take ~ 10 mins.

This is quite a technical hit. There’s a rendering server here in London; aspect of providing a feedback loop (editing -> new map).  A shared commons for geo data. AID orgs get excited … coming together sharing same platform. OSM is very much about raw data too, not just the maps. So this is different to pure map providers, … entirely open access to the raw data.

In terms of the humanitarian response, … agencies can take the data unencumbered, use it offline. It is open data. there is an exciting open data story for OSM.

As humanitarian work, it can be a problem that we allow commercial re-use – [not all orgs welcome that]

Community + Raw vector data + simple editing + Updated map — these 4 elements make it very attractive to humanitarian work.

Haiti in 2010, collab came together very quickly, for the two worst-hit cities (Port-au-Prince and …). This speed was v useful for aid orgs; those orgs were printing it out, in tents, response centres. People used it on the web too, Ushahidi too, ie they’re a bit more accurate due to these improvements.

“my favourite use: ” … a Garmin handheld GPS unit, … loaded with data from open ecosystem, used offline; quintessential use case of raw data from OSM but also life-saving. Since Haiti, there have been other disasters. Not all of these so suited to OSM helping out – e.g. massive Pakistan floods, … harder to map such a large area. Couldn’t get imagery for that entire area.

To some extent there are Pakistan maps already; less so for Haiti. Similarly re Japan, already were maps.

re Sendai tsunami, .. yes there were free maps; yes there were high quality official maps, … but could you get hold of recently updated freely avail high quality maps? so still some role there.

Since then, organizing more: Tasking Manager,

A common Q: ‘where to start mapping?’

Way of coordinating for a large area. Drop a grid, get people to acquire a square, … load into editor, ‘done’ when done. This workflow came into its own during Philippines. Sometimes in resp to an aid agency request, … or as we have imagery, … Visualizing changesets, .. bounding boxes slide, Philippines editing traffic [slide] brand new, made last night, … got up to almost 300 users involved on 1 day. No. of changes (Philippines) ~ 40,000 edits.
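The ‘drop a grid, acquire a square, mark it done’ workflow can be sketched in a few lines of Python. This is a hedged illustration of the idea only, not the Tasking Manager’s real code or data model: the function names, the state names (‘ready’, ‘locked’, ‘done’, ‘validated’) and the bounding box are all invented here.

```python
def make_grid(min_lon, min_lat, max_lon, max_lat, rows, cols):
    """Split a bounding box into rows x cols squares, all initially unclaimed."""
    dlon = (max_lon - min_lon) / cols
    dlat = (max_lat - min_lat) / rows
    tiles = {}
    for r in range(rows):
        for c in range(cols):
            tiles[(r, c)] = {
                "bbox": (min_lon + c * dlon, min_lat + r * dlat,
                         min_lon + (c + 1) * dlon, min_lat + (r + 1) * dlat),
                "state": "ready",   # hypothetical state names
                "mapper": None,
            }
    return tiles


def acquire(tiles, key, mapper):
    """Claim a square; returns the bbox to load into an editor."""
    tile = tiles[key]
    if tile["state"] != "ready":
        raise ValueError(f"square {key} is not available")
    tile["state"], tile["mapper"] = "locked", mapper
    return tile["bbox"]


def mark_done(tiles, key, mapper):
    """The mapper who holds the square reports it as mapped."""
    tile = tiles[key]
    if tile["state"] != "locked" or tile["mapper"] != mapper:
        raise ValueError(f"square {key} is not held by {mapper}")
    tile["state"] = "done"


def validate(tiles, key, reviewer):
    """A second person checks a finished square."""
    tile = tiles[key]
    if tile["state"] != "done" or tile["mapper"] == reviewer:
        raise ValueError("needs a finished square and a different reviewer")
    tile["state"] = "validated"


# Made-up bounding box, for illustration only.
tiles = make_grid(124.0, 10.0, 126.0, 12.0, rows=4, cols=4)
bbox = acquire(tiles, (0, 1), "alice")
mark_done(tiles, (0, 1), "alice")
validate(tiles, (0, 1), "bob")
print(bbox, tiles[(0, 1)]["state"])
```

Keeping the state per square is what answers ‘where to start mapping?’: a newcomer just picks any square nobody else holds.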

Peak in editing interest corresponds to general interest [shows google trends], though shows a slightly longer attention span. Want the spike further over to the left, … the sooner the better, e.g. as aid agencies may be taking a snapshot of our data, …

Graph showing new users … ppl who appear to have registered during the time of the disaster response,  shows also ‘old timers’ getting engaged earlier, few days lag for the newer users.

We have a humanitarian mapping style, … not the default OSM view; we tweaked it slightly – e.g. to show a red outline around buildings that appear to be damaged. Getting mappers to look at post-disaster imagery, e.g. buildings that have been swept away with water. More examples of data getting used: map posters popular with aid agencies; they fly out now with a cardboard roll full of OSM posters. In particular Red Cross heavily involved.

In UK office down in Moorgate, used tasking manager there, contributing to OSM to improve the printouts they were getting. Ways to help: comms, blogging, coordination, wiki, promo videos and tutorials, imagery warping / tiling / hosting;

software dev’t, use the open data; build tools to work with it, …

Gateway skill: learn to map!


A Quick demo.

  • Shows tasking manager UI from

  • colour coded squares either mapped, or mapped and validated by a 2nd reviewer

  • click to acquire a square, then to invoke OSM editor of choice e.g. JOSM

  • alternative – edit directly in website via js-based UI

  • we tend to teach new users the JOSM GUI

  • shows workflow of marking a road (nodes/ways) picking up from work in ‘someone else’s square’

Comment from a Nigel of mapaction v supportive, ‘used it all the time’. ‘last few emergencies, … this stuff is pervasive, if it wasn’t there we’d be really struggling’.

Comment from Andrew B… (British Red Cross) the volunteer aspect as well, … between us and mapaction, it’s the volunteers that make it happen, …

Q to audience, for Haiyan, lessons?

Andrew points to row of British Red Cross mapping volunteers – ‘we’re coordinating w/ US Red Cross, federation, … they’re dealing with those in the area; whereas Nigel is using it on the ground in this area that’s going out tomorrow. We were doing situational reports, who-what-where-when eg risk vs need vs capabilities, … understanding that kind of stuff. This gives us underlying map, to support all this.’

Q re coordination. Nigel of mapaction “Maps are coordination glue”; Harry “everything has a location aspect.”

Ed Parsons Q: “v interested in task manager element … if you had the info before, that’s hugely valuable. How successful? How do you motivate ppl to map an area they’ve not thought about?”

Harry: many people motivated by seeing it on the news, … in a way a shame as better if happens ahead of time, … work on that under disaster risk reduction. e.g. in Indonesia we have extensive mapping work, as it lies on a fault line, risk assessment, … trying to get a map of every building, get people to draw around buildings. But there’s less enthusiasm for these things before they’re needed.

Harry: gratifying that tasking manager is software we’ve dev’t reasonably, that HOT as an org has matured as a community, we have etiquette around using the tasking manager, fell into place naturally.

Ivan (Doctors Without Borders): 2nds Nigel’s point re lifesaving; we have years, decades of health data. People telling us where ppl are from … To understand epidemic patterns … in Haiti we couldn’t find src of the outbreaks (despite Snow/cholera analogy) … because we can’t get raw usable data that correlates to what people report as their place of origin and where they got sick. Working w/ OSM. Some success in Philippines, … more challenge in Congo & other area. Aim to be able to correlate places reported from walk-in patients to a real world place, and get into forecasting and preventative medicine. Struggling to achieve in these situations what Europe had 150 years ago.

Harry: importance of geocoding from names

Ivan: every person in the world has some description for where they live. If it’s a streetname/number that’s easy; if it’s directions from a b-b tree that’s harder. But can make a start, ideally 1-200M, but kilometers better than nothing. We sit on piles of data that we can’t correlate to anything so far.

Biggest single impediment is access to imagery? get other providers to do as Bing…

Harry: the challenge for imagery providers is that it is worth money, which is why they put up satellites or fly the planes, so can’t eat away at that too much. Often you’ll see data made available temporarily after emergency. For example re Pakistan, downgraded/fuzzy quality data was shared. There are some agreements in place, … US govt put in place frameworks to source data. For example that the imagery can only be used in an OSM editor. But need to be able to derive vector data from it (hence there are issues with using Google imagery in this way).

Luke Cayley(sp?): (missed). concur w/ Nigel, Ivan. Q re imagery: have you tried to get it from European disaster mechanism, Copernicus, which has some provisions for disaster readiness prep.

Harry: will follow up on this.

Luke Q: how aid agencies use the raw data in the field to collect data? eg. MSF, … What’s your feeling for the barriers to making this a well recognised procedure, using OSM as one of the tools to make that happen?

Harry: did start to discuss re Philippines, … about data into OSM from on the ground teams. With Haiti streetnames you don’t get them from the raw imagery so needs on-the-ground gathering. The process for on the ground gathering is pretty mature around OSM, tools, mobile apps etc. But a case of getting ppl interested in doing that. In disaster response situation, it is hard to tell people they ought to be writing down names of streets.

Q: for DFID Luke, … a number of funds are available. Because OSM is a global public good, it is the kind of thing DFID would tend to be supportive of in funding proposals (but can’t promise).

Harry: re disaster situation, often it won’t be a priority during the disaster to collect street names. All it takes is geo-located photos, a snap of a street sign.

Q from someone called Chris … you spoke of satellite imagery as source for mapping. Are you exploring use of pro-sumer vs aerial imagery?

Harry: Satellite imagery is now approaching aerial photography quality, but remains expensive due to operational cost. Another cost is the vast amount of disk space, bandwidth, hosting costs. These problems are not insurmountable. OSM and HOT have some resources to help here – ‘talk to us’.  Aerial imagery historically has been better. If you look at Bing or Google ‘satellite’ images they’re often from planes, so yes, that can help. Also new area of drones over small (but maybe important) areas.

Remembering Aaron Swartz

“One of the things the Web teaches us is that everything is connected (hyperlinks) and we all should work together (standards). Too often school teaches us that everything is separate (many different ‘subjects’) and that we should all work alone.” –Aaron Swartz, April 2001.

So Aaron is gone. We were friends a decade ago, and drifted out of touch; I thought we’d cross paths again, but, well, no.

Update: MIT’s report is published.

 I’ll remember him always as the bright kid who showed up in the early data sharing Web communities around RSS, FOAF and W3C’s RDF, a dozen years ago:

"Hello everyone, I'm Aaron. I'm not _that_ much of a coder, (and I don't know
much Perl) but I do think what you're doing is pretty cool, so I thought I'd
hang out here and follow along (and probably pester a bit)."

Aaron was from the beginning a powerful combination of smart, creative, collaborative and idealistic, and was drawn to groups of developers and activists who shared his passion for what the Web could become. He joined and helped the RSS 1.0 and W3C RDF groups, and more often than not the difference in years didn’t make a difference. I’ve seen far more childishness from adults in the standards scene, than I ever saw from young Aaron. TimBL has it right; “we have lost one of our own”. He was something special that ‘child genius’ doesn’t come close to capturing. Aaron was a regular in the early ’24×7 hack-and-chat’ RDF IRC scene, and it’s fitting that the first lines logged in that group’s archives are from him.

I can’t help but picture an alternate and fairer universe in which Aaron made it through and got to be the cranky old geezer at conferences in the distant shiny future. He’d have made a great William Loughborough; a mutual friend and collaborator with whom he shared a tireless impatience at the pace of progress, the need to ask ‘when?’, to always Demand Progress.

I’ve been reading old IRC chat logs from 2001. Within months of his ‘I’m not _that_ much of a coder’ Aaron was writing Python code for accessing experimental RDF query services (and teaching me how to do it, disclaiming credit, ‘However you like is fine… I don’t really care.’). He was writing rules in TimBL’s experimental logic language N3, applying this to modelling corporate ownership structures rather than as an academic exercise, and as ever sharing what he knew by writing about his work in the Web. Reading some old chats, we talked about the difficulties of distributed collaboration, debate and disagreement, personalities and their clashes, working groups, and the Web.

I thought about sharing some of that, but I’d rather just share him as I choose to remember him:

22:16:58 <AaronSw> LOL

Learning WebGL on your iPhone: Radial Blur in GLSL

A misleading title perhaps, since WebGL isn’t generally available to iOS platform developers. Hacks aside, if you’re learning WebGL and have an iPhone it is still a very educational environment. WebGL essentially wraps OpenGL ES in a modern Web browser environment. You can feed data in and out as textures associated with browser canvas areas, manipulating data objects either per-vertex or per-pixel by writing ‘vertex’ and ‘fragment’ shaders in the GLSL language. Although there are fantastic tools out there like Three.js to hide some of these details, sooner or later you’ll encounter GLSL. The iPhone, thanks to tools like GLSL Studio and Paragraf, is a great environment for playing with GLSL. And playing is a great way of learning.

Radial Blur GLSL

GLSL fragment shaders are all about thinking about visuals “per-pixel”. You can get a quick feel for what’s possible by exploring the GLSL Sandbox site. The sandbox lets you live-edit GLSL shaders, which are then applied to a display area with trivial geometry – the viewing area is just two big triangles. See Iñigo Quilez’s livecoding videos or ‘rendering worlds with two triangles’ for more inspiration.

All of which is still rocket science to me, but I was surprised at how accessible some of these ideas and effects can be. Back to the iPhone: using Paragraf, you can write GLSL fragment shaders, whose inputs include multi-touch events and textures from device cameras and photo galleries. This is more than enough to learn the basics of GLSL, even with realtime streaming video. Meanwhile, back in your Web browser, the new WebRTC video standards work is making such streams accessible to WebGL.

Here is a quick example based on Thibaut Despoulain’s recent three.js-based tutorials showing techniques for compositing, animation and glow effects in WebGL. His Volumetric Light Approximation post provides a fragment shader for computing radial blur; see his live demo for a control panel showing all the parameters that can be tweaked. Thanks to Paragraf, we can also adapt that shader to run on a phone, blurring the camera input around the location of the last on-screen touch (‘t1’). Here is the original, embedded within a .js library. And here is a cut down version adapted to use the pre-declared structures from Paragraf (or see gist for cleaner copy):

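vec3 draw() {
  // p, t1 and cam() are pre-declared by Paragraf; p is taken here to be
  // the current pixel position, t1 the last touch, cam() a camera lookup.
  vec2 vUv = p;
  float fX=t1.x, fY=t1.y, illuminationDecay = 1.0,
  fExposure = 0.2, fDecay = 0.93,
  fDensity = .3, fWeight = 0.4, fClamp = 1.0;
  const int iSamples = 8;
  // Step from this pixel towards the touch point in iSamples increments.
  vec2 delta = vec2(vUv-vec2(fX,fY))/float(iSamples)*fDensity,
  coord = vUv;
  vec4 FragColor = vec4(0.0);
  for(int i=0; i < iSamples ; i++)  {
    coord -= delta;
    vec4 texel = vec4( cam(coord), 0.0);
    texel *= illuminationDecay * fWeight;
    FragColor += texel;
    // Each further sample contributes a little less.
    illuminationDecay *= fDecay;
  }
  FragColor *= fExposure;
  FragColor = clamp(FragColor, 0.0, fClamp);
  // draw() is declared as vec3, so return only the colour channels.
  return FragColor.rgb;
}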

Cat photo


As I write this, I realise I’m blurring the lines between ‘radial blur’ and its application to create ‘god-rays’ in a richer setting. As I say, I’m not an expert here (and I just post a quick example and two hasty screenshots). My main purpose was rather to communicate that tools for learning more about such things are now quite literally in many people’s hands. And also that using GLSL for real-time per-pixel processing of smartphone camera input is a really fun way to dig deeper.
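For anyone who would rather poke at the arithmetic away from the phone, here is a rough CPU-side sketch of the same per-pixel loop in plain Python. It is an illustration only, under my own assumptions: radial_blur and its sample argument are names I made up (sample stands in for Paragraf’s cam() lookup), and the default parameter values are simply copied from the shader above.

```python
def radial_blur(sample, uv, centre, samples=8, exposure=0.2,
                decay=0.93, density=0.3, weight=0.4, clamp_max=1.0):
    """Blur one pixel radially around `centre`.

    sample(x, y) must return an (r, g, b) tuple of floats; it plays the
    role of the camera texture lookup in the shader.
    """
    # Step from this pixel towards the centre in `samples` increments.
    dx = (uv[0] - centre[0]) / samples * density
    dy = (uv[1] - centre[1]) / samples * density
    x, y = uv
    illumination = 1.0
    acc = [0.0, 0.0, 0.0]
    for _ in range(samples):
        x -= dx
        y -= dy
        r, g, b = sample(x, y)
        k = illumination * weight
        acc[0] += r * k
        acc[1] += g * k
        acc[2] += b * k
        illumination *= decay  # each further sample contributes less
    # Scale by exposure, then clamp each channel to [0, clamp_max].
    return tuple(min(max(ch * exposure, 0.0), clamp_max) for ch in acc)


def white(x, y):
    """A hypothetical all-white 'image', handy for sanity checks."""
    return (1.0, 1.0, 1.0)


print(radial_blur(white, uv=(0.5, 0.5), centre=(0.2, 0.8)))
```

On a uniform image the result is just the input scaled by exposure × weight × (1 + decay + decay² + …), which makes it easy to see how the parameters trade off before trying them on live video.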

At this point I should emphasise that draw() here and other conventions are from Paragraf; see any GLSL or WebGL docs, or the original example here, for details.

and One Hundred Years of Search

A talk from London SemWeb meetup hosted by the BBC Academy in London, Mar 30 2012….

Slides and video are already in the Web, but I wanted to post this as an excuse to plug the new Web History Community Group that Max and I have just started at W3C. The talk was part of the Libraries, Media and the Semantic Web meetup hosted by the BBC in March. It gave an opportunity to run through some forgotten history, linking Paul Otlet, the Universal Decimal Classification, and some 100 year old search logs from Otlet’s Mundaneum. Having worked with the BBC Lonclass system (a descendant of Otlet’s UDC), and collaborated with the Aida Slavic of the UDC on their publication of Linked Data, I was happy to be given the chance to try to spell out these hidden connections. It also turned out that Google colleagues have been working to support the Mundaneum and the memory of this early work, and I’m happy that the talk led to discussions with both the Mundaneum and Computer History Museum about the new Web History group at W3C.

So, everything’s connected. Many thanks to W. Boyd Rayward (Otlet’s biographer) for sharing the ancient logs that inspired the talk (see slides/video for a few more details). I hope we can find more such things to share in the Web History group, because the history of the Web didn’t begin with the Web…


From LinkedIn’s networking graphing service; see also my map

I’ve been digging around in graph-mining and visualization tools lately, and this use at LinkedIn is one of the few cases where such things actually break through into mainstream usefulness. Well, perhaps not useful, but it’s nice to see how groups overlap.

In my chart here, the big tight-knit, self-referential cluster on the left is Joost, the TV startup I joined in 2006/7. At the top there is another tightly-linked community: the W3C team, where I worked 1999-2005. In between is a fuzzier cluster that I can only label ‘Web 2’, ‘Social Web’, … lots of Web technology standards sort of people. Then there are the linkers, like Max Froumentin and Robin Berjon between the W3C and Joost worlds, or Libby Miller and folk from the Asemantics and Apache scene (Alberto Reggiori, Stefano Mazzocchi) who link Joost through to the Semantic Web scene in the lower right.

The LinkedIn analysis finds distinct clusters that are fairly easy to identify as “Digital Libraries (Museums, Archives…)” and “Linked Data / RDF / Semantic Web”, even while being richly interconnected. I’m not surprised there’s a cluster for the VU University Amsterdam (even though well-linked to SW and digital libraries). However the presence of a BBC cluster was a surprise; either it shows how closely-knit the BBC community is, or just how much I’ve been hanging around with them. And that’s the intriguing thing; each individual map is just a per-person view, a thin slice through the bigger picture. It must be fun to see the whole dataset…

For more on all this, see LinkedIn or the inmaps site.


Most of us around RDF and the Semantic Web have by now probably heard the news about Talis; if not, see Leigh Dodds’ blog post. Talis are shutting down their general activities around Semantic Web and Linked Data, including the Kasabi data marketplace. Failures are usually complex and Twitter is already abuzz with punditry, speculation and ill-judged extrapolation. I just wanted to take a minute aside from all that to say something that I’ve not got around to before: “thanks!”.

Regardless of the business story, we ought to appreciate on a personal level all the hard work that the team (past and present) at Talis have put into popularising the ideas and technology around Linked Data. Talis had an extraordinarily bright, energetic and committed team, who put great passion into their work – and into supporting the work of others. All of us in the community around Linked Data have benefitted enormously from this, and will continue to benefit from the various projects and initiatives that Talis have supported.  Perhaps in a nearby parallel universe, there is a thriving alternate Talis whose efforts benefited the business more, and the commons less. We can only speculate. In this universe, the most appropriate word at this point is just “thanks”…

Everything Still Looks Like A Graph (but graphs look like maps)

Last October I posted a writeup of some experiments that illustrate item-to-item similarities from Apache Mahout using Gephi for visualization. This was under a heading that quotes Ben Fry, “Everything looks like a graph” (but almost nothing should ever be drawn as one). There was also some followup discussion on the Gephi project blog.

I’ve just seen a cluster of related Gephi experiments, which are reinforcing some of my prejudices from last year’s investigations:

These are all well worth a read, both for showing the potential and the limitations of Gephi. It’s not hard to find critiques of the intelligibility or utility of squiggly-but-inspiring network diagrams; Ben Fry’s point was well made. However I think each of the examples I link here (and my earlier experiments) show there is some potential in such layouts for showing ‘similarity neighbourhoods’ in a fairly appealing and intuitive form.

In the case of the history of Philosophy it feels a little odd using a network diagram since the chronological / timeline aspect is quite important to the notion of a history. But still it manages to group ‘like with like’, to the extent that the inter-node connections probably needn’t even be shown.

I’m a lot more comfortable with taking the ‘everything looks like a graph’ route if we’re essentially generating a similarity landscape. Whether these ‘landscapes’ can be made to be stable in the face of dataset changes or re-generation of the visualization is a longer story. Gephi is currently a desktop tool, and as such has memory issues with really large graphs, but I think it shows the potential for landscape-oriented graph visualization. Longer term I expect we’ll see more of a split between something like Hadoop+Mahout for big data crunching (e.g. see Mahout’s spectral clustering component which takes node-to-node affinities as input) and something WebGL and browser-based UI for the front-end. It’s a shame the Gephi efforts in this direction (GraphGL) seem to have gone quiet, but for those of you with modern graphics cards and browsers, take a look at alterqualia’s ‘dynamic terrain‘ WebGL demo to get a feel for how landscape-shaped datasets could be presented…

Also btw look at the griffsgraphs landscape of literature; this was built solely from ‘influences’ relationships from Wikipedia… then compare this with the landscapes I was generating last year from Harvard bibliographic data. Those were built solely using subject classification data from Harvard. Now imagine if we could mutate the resulting ‘map’ by choosing our own weighting composited across these two sources. Perhaps for the music or movies or TV areas of the map we might composite in other sources, based on activity data analysed by recommendation engine, or just different factual relationships.
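As a toy illustration of what ‘choosing our own weighting’ could mean, here is a small Python sketch that blends two item-to-item similarity sources into one set of edge weights, which could then be fed to a layout tool. Everything in it is hypothetical: the function name, the source names and the scores are made up, and this is not how Gephi, Mahout or either of the datasets actually work.

```python
def composite_similarity(sources, weights):
    """Blend several {(item_a, item_b): score} maps into one.

    `weights` maps source name -> relative importance; weights are
    normalised so they sum to 1. Pairs are treated as unordered.
    """
    total = sum(weights.get(name, 0.0) for name in sources)
    if total <= 0:
        raise ValueError("at least one source needs a positive weight")
    combined = {}
    for name, scores in sources.items():
        w = weights.get(name, 0.0) / total
        for pair, score in scores.items():
            key = tuple(sorted(pair))
            combined[key] = combined.get(key, 0.0) + w * score
    return combined


# Made-up scores, for illustration only.
sources = {
    "influences": {("Author A", "Author B"): 1.0},
    "subjects": {("Author B", "Author A"): 0.5,
                 ("Author A", "Author C"): 1.0},
}

# A reader who cares three times as much about 'influences' links.
blended = composite_similarity(sources, {"influences": 3, "subjects": 1})
for pair, score in sorted(blended.items()):
    print(pair, round(score, 3))
```

Changing the weights changes which items end up as close neighbours, which is the sense in which the same data can yield different ‘maps’ for different readers.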

There’s no single ‘correct’ view of the bibliographic landscape; what makes sense for a phd researcher, a job seeker or a schoolkid will naturally vary. This is true also of similarity measures in general, i.e. for see-also lists in plain HTML as well as fancy graph or landscape-based visualizations. There are more than metaphorical comparisons to be drawn with the kind of compositing tools we see in systems like Blender, and plenty of opportunities for putting control into end-user rather than engineering hands.

In just the last year, Harvard (and most recently OCLC) have released their bibliographic dataset for public re-use, the Wikidata project has launched, and browser support for WebGL has been improving with every release. Despite all the reasonable concerns out there about visualizing graphs as graphs, there’s a lot to be said for treating graphs as maps…

MAMP / MySQL config notes for ‘Repair with keycache’ and table metadata lock

Problem: MySQL taking forever to load some large data dumps. Forever or longer.

“mysql> show processlist;” shows it wedged at “Repair with keycache” and “Waiting for table metadata lock”.

According to a handy Stack Overflow article, this is a known and dreaded condition, which can be addressed by making sure tmp dir has plenty of space, and increasing size of myisam_max_sort_file_size from 2G (2146435072) to 30G (32212254720). Using MAMP 1.9.6 it took some more digging to find out how to add a local my.cnf settings file for MySQL. This now lives in /Applications/MAMP/conf/my.cnf (I added into the [mysqld] section a line saying ‘myisam_max_sort_file_size = 30G’, or there-abouts). Shut down the MySQL server, create that my.cnf and restart; then confirm it read your config using ‘show variables’.
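Since I always end up squinting at those byte counts, here is a tiny Python sketch for checking them and for printing the my.cnf fragment described above. The to_bytes helper is my own invention; it assumes the K/M/G suffixes are multiples of 1024, which is how MySQL reads them in option files.

```python
# Suffix multipliers, assuming powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size):
    """Convert a size such as '30G' or '512M' to a byte count."""
    size = size.strip().upper()
    if size and size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


# The fragment described in the text: one line in the [mysqld] section.
fragment = "[mysqld]\nmyisam_max_sort_file_size = 30G\n"

print(to_bytes("30G"))               # 32212254720
print(to_bytes("2G") - 2146435072)   # 1048576
print(fragment)
```

So the ‘2G’ that ‘show variables’ reports is really 2047MB, one megabyte short of a full 2G, while 30G comes out exactly as 32212254720.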

Does this work? Well I don’t know yet. But enough times I’ve searched around before and found my own notes, that I thought I should at least write this much down for my future self to find :)

Update: it worked. A data import that took 2+ weeks (before I gave up) now runs in a few hours. After the bulk of the data was imported, we see ‘Repair by sorting’ in ‘show processlist’ for a while (couple of hours for 15 million records, in my case). This is, as promised, faster than ‘Repair with keycache’. I’ve done this on two machines now (with the same data); on one of them I did notice some ‘Waiting for table metadata lock’ processes in the list, but it still successfully completed overnight.