Last October I posted a writeup of some experiments that illustrate item-to-item similarities from Apache Mahout using Gephi for visualization. This was under a heading that quotes Ben Fry, “Everything looks like a graph” (but almost nothing should ever be drawn as one). There was also some followup discussion on the Gephi project blog.
I’ve just seen a cluster of related Gephi experiments, which are reinforcing some of my prejudices from last year’s investigations:
- Graphing the History of Philosophy
- Graphing Every Idea In History (inspired by, and generalising, the former)
- Visualising related entries in Wikipedia using Gephi (shows SemWeb import plugin for Gephi)
These are all well worth a read, both for showing the potential and the limitations of Gephi. It’s not hard to find critiques of the intelligibility or utility of squiggly-but-inspiring network diagrams; Ben Fry’s point was well made. However I think each of the examples I link here (and my earlier experiments) show there is some potential in such layouts for showing ‘similarity neighbourhoods’ in a fairly appealing and intuitive form.
In the case of the history of Philosophy it feels a little odd using a network diagram since the chronological / timeline aspect is quite important to the notion of a history. But still it manages to group ‘like with like’, to the extent that the inter-node connections probably needn’t even be shown.
I’m a lot more comfortable with taking the ‘everything looks like a graph’ route if we’re essentially generating a similarity landscape. Whether these ‘landscapes’ can be made to be stable in the face of dataset changes or re-generation of the visualization is a longer story. Gephi is currently a desktop tool, and as such has memory issues with really large graphs, but I think it shows the potential for landscape-oriented graph visualization. Longer term I expect we’ll see more of a split between something like Hadoop+Mahout for big data crunching (e.g. see Mahout’s spectral clustering component which takes node-to-node affinities as input) and something WebGL and browser-based UI for the front-end. It’s a shame the Gephi efforts in this direction (GraphGL) seem to have gone quiet, but for those of you with modern graphics cards and browsers, take a look at alterqualia’s ‘dynamic terrain‘ WebGL demo to get a feel for how landscape-shaped datasets could be presented…
Also btw look at the griffsgraphs landscape of literature; this was built solely from ‘influences’ relationships from Wikipedia… then compare this with the landscapes I was generating last year from Harvard bibliographic data. They were both built solely using subject classification data from Harvard. Now imagine if we could mutate the resulting ‘map’ by choosing our own weighting composited across these two sources. Perhaps for the music or movies or TV areas of the map we might composite in other sources, based on activity data analysed by recommendation engine, or just different factual relationships.
There’s no single ‘correct’ view of the bibliographic landscape; what makes sense for a phd researcher, a job seeker or a schoolkid will naturally vary. This is true also of similarity measures in general, i.e. for see-also lists in plain HTML as well as fancy graph or landscape-based visualizations. There are more than metaphorical comparisons to be drawn with the kind of compositing tools we see in systems like Blender, and plenty of opportunities for putting control into end-user rather than engineering hands.
In just the last year, Harvard (and most recently OCLC) have released their bibliographic dataset for public re-use, the Wikidata project has launched, and browser support for WebGL has been improving with every release. Despite all the reasonable concerns out there about visualizing graphs as graphs, there’s a lot to be said for treating graphs as maps…