Visual SPARQL query tools

Quick links – thinking about tools that allow graphical SPARQL query authoring…

OpenLink Virtuoso: InteractiveSparqlQueryBuilder (in HTML/CSS/.js). Pictured below; extensive documentation and screenshots linked from their main page.

…an ancestor of which was Damian Steer’s RDFAuthor tool for MacOSX, which could generate Squish (a SPARQL precursor) and query services over the ‘array of hashtables’ SOAP-for-RDF-query non-spec that Libby Miller and I had implementations of. From the RDFAuthor tutorial:

The old Maryland BINPIQ SHOE knowledgebase query applet is the granddaddy of them all. Sadly I don’t have any screenshots and the applet itself seems to have code-rotted. [...] Ah, but here I find an email I wrote about it 8 years ago(!), which has screenshots:

SemanticSoft from Moldova also have some visual SPARQL UI:

No real conclusion here. I just found myself looking around some of these links, and thought I’d share them. I’m sure there’s a lot more related work out there (eg. NIGHTLIGHT from folk at Southampton Uni), and that the rise of fancy HTML-based UIs and JSON for data access makes for an ever-more interesting environment for zero-install graphical query tools.

One thing I remember about the old Maryland applets: as their representational language became more expressive (moving from binary to n-ary), the graphical query UI became somewhat less intuitive. Now since SPARQL itself adds some concepts not in the underlying target language (ie. RDF doesn’t have named graphs, optionals etc), the ability to make a graphical query UI that exploits the “it’s just an RDF graph with bits labelled as missing” (per Guha’s original proposal) perhaps gets a bit strained. In particular, how might named graphs best be represented in visual editors?
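
To make the “graph with bits labelled as missing” idea concrete, here is a minimal Python sketch. It is not taken from any of the tools above; the data, the `?name` variable convention and the function names are all made up for illustration. A query is just a list of triples in which some positions are variables, and answering it means finding bindings that make every pattern triple appear in the data.

```python
# Toy data: a set of (subject, predicate, object) triples. All names are invented.
DATA = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "name", "Alice"),
    ("carol", "name", "Carol"),
}


def is_var(term):
    """A term starting with '?' is a 'missing bit' of the graph."""
    return term.startswith("?")


def match_triple(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    extended = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def query(patterns, data=DATA):
    """Return every set of bindings that satisfies all the patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in data:
                extended = match_triple(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# "Who does alice know, and whom do they know in turn?"
results = query([("alice", "knows", "?x"), ("?x", "knows", "?y")])
print(results)
```

Note that this picture has no obvious slot for a named graph or an OPTIONAL: both would need something beyond “a graph with holes in it”, which is where a visual editor built on that metaphor starts to strain.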

Fireeagle IRC bot / OAuth / Dopplr

I noticed people in #geo playing with a fireeagle bot earlier, so I had to try too.

danbri: fireup, help?
[13:29] fireeagle: danbri: I have just sent you a URL in privmsg, please click on it to authorize.

[time passes... things happen behind the scenes...]

[13:31] danbri: fireup Leiden, Netherlands
[13:31] fireeagle: Updating danbri to Leiden, Nederland
[13:32] • danbri gives fireeagle a botsnack

Behind the scenes, the bot passed me a URL:

fireeagle: Try Again! You must authorize first: https://fireeagle.yahoo.net/oauth/authorize?oauth_token=XYZABCD

And clicking this took me off into the Fireeagle Web UI, where I had to reconfirm my membership (3 months had lapsed; it times out if you don’t show signs of life). Then I had to give the IRC bot permission to access my Fireeagle data, and finally it sent me to a duff return URL,

http://0.0.0.0:3000/dummy_app/callback?name=Honking+Fun&oauth_token=XYZABCD

…which is understandable-ish given that IRC bots aren’t normal Web apps, though I wonder if an IRC URI would be allowed per the OAuth spec. I know there is work on wiring XMPP and OAuth together, after all.
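
As a rough illustration of the two URLs involved, here is a small Python sketch using only the standard library. It rebuilds the authorize link from the token the bot sent, then picks apart the return URL to show why it is a dead end. The function name and the list of “local” hosts are my own assumptions, not anything from Fireeagle or the OAuth spec.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Authorization endpoint as it appears in the bot's message above.
AUTHORIZE_ENDPOINT = "https://fireeagle.yahoo.net/oauth/authorize"

# Hosts that only make sense on the machine running the app (an assumption
# for this sketch, not an exhaustive list).
LOCAL_HOSTS = {"0.0.0.0", "127.0.0.1", "localhost"}


def describe_callback(url):
    """Split a callback URL into the parts a user agent would care about."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "token": params.get("oauth_token", [None])[0],
        "name": params.get("name", [None])[0],
        "looks_local": parts.hostname in LOCAL_HOSTS,
    }


authorize_url = AUTHORIZE_ENDPOINT + "?" + urlencode({"oauth_token": "XYZABCD"})
callback = "http://0.0.0.0:3000/dummy_app/callback?name=Honking+Fun&oauth_token=XYZABCD"
info = describe_callback(callback)

print(authorize_url)
print(info)
```

The callback carries the same token back, but the host is one that only resolves on the bot author’s development machine, so the browser has nowhere useful to go.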

So now Fireeagle and downstream apps (eg. the MacOSX widget pictured here) can learn about my location when I tell this IRC bot where I am.

So, logged into the Fireeagle Web site, I now read:

Fire Eagle last spotted you about 1 hour ago in Leiden, Nederland using Fire Eagle GeoIRC bot. If you’ve gone somewhere else, then you should update your location!

And here’s the corresponding MacOSX widget:

 Fireeagle widget

Since I don’t use nickserv on the OFTC IRC network, this means anyone logged in there as ‘danbri’ can set my location data at Fireeagle. But at least they can’t add Facebook buddies, read my private email or ‘your password has been reset’ messages. And if they do, Fireeagle lets me revoke this token without messing up anything else.

Now, how to convince Dopplr to believe what it oauth-reads at Fireeagle? Logging into Dopplr by OpenID (hurrah :), I read the following falsehood:

…and when I try to tell it I’m actually in the Netherlands for a few days, and that my June 17th Dopplr journal entry “You returned from a trip to Guadalajara.” wasn’t a return to Bristol, it gets stuck: “Sorry, we couldn’t add that trip: You need a return date”. On closer inspection this seems to be my error; Dopplr has a new facility for multi-stop trips and if I’d added a Leiden stage to this last trip before it seemingly finished, Dopplr, Fireeagle and the IRC bot would be in agreement about my location.

So what’s so interesting about all this? Partly that OAuth provides a reasonable and deployable model for wiring up data pipes between socially oriented Web sites (and with some provisos, non-Web UI too), so that each can specialise in some task or domain. But also that, when the specialities of these sites and services do overlap, we’ll get into the business of conflict resolution amongst competing claims. When Dopplr reads that Fireeagle thinks I’m in Leiden, despite the last trip it knows about supposedly returning me to Bristol on the 17th, what should it do? Perhaps update its own state to a “well nobody told me, but it seems he’s in the netherlands” position?
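
One very naive answer, sketched in Python: treat each site’s view as a timestamped claim and believe the freshest one. This is only an illustration of the conflict-resolution problem, not how Dopplr or Fireeagle actually behave; the `Claim` class, the policy and the timestamps below are all invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Claim:
    source: str            # which service is making the claim
    place: str             # where it thinks I am
    asserted_at: datetime  # when the claim was made (made-up values below)


def current_location(claims):
    """Naive policy: believe whichever claim was asserted most recently."""
    return max(claims, key=lambda claim: claim.asserted_at)


claims = [
    Claim("dopplr", "Bristol", datetime(2008, 6, 17, 12, 0)),
    Claim("fireeagle", "Leiden", datetime(2008, 6, 21, 13, 31)),
]
best = current_location(claims)
print(best.source, "says", best.place)
```

A real service would probably want more than recency: how much it trusts each source, whether the claim was made by me or inferred, and whether to overwrite its own state or merely flag the disagreement.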

ps. Fireeagle remains invite-only, and I don’t have any invites. Sorry about that, for those who haven’t tried it yet. There are some screenshots and a writeup at TechCrunch at least. Basically it’s an information broker for your location data, with data access mediated largely through OAuth REST APIs. For those with an invite, the developer docs have more detail than I’ve yet managed to digest.

Mashed remote contrib: BBC music genres meet last.fm (meets OAuth)

I’m not at the BBC’s 2008 hackday-like-event, Mashed. But here’s a quick hack based on the data the BBC audio and music team have made available. The data that caught my eye was “Genres for set of MusicBrainz Artists” based on editorial data entered for bbc.co.uk/music. This is a simple file:

0039c7ae-e1a7-4a7d-9b49-0cbc716821a6    Rock and Indie
003abc43-e2bb-40e5-a080-3c4b9e56ea63    Classical
0053dbd9-bfbc-4e38-9f08-66a27d914c38    Classic Pop and Rock

It maps a MusicBrainz artist ID (increasingly the de facto open standard for identifying artists, at least in popular western music) to a simple genre label.

I haven’t yet found corresponding pages on the BBC music site for each of these genres.

Since last.fm expose my last 12 months’ most commonly played artists for all to mock, it is quite easy to cross-reference these sources to get a summary of my alleged musical interests.

A commandline ruby script online for now:

Airbag:mashed danbri$ ruby lastfm-genres.rb
Classic Pop and Rock: 13
Rock and Indie: 17
Hip Hop; RnB and Dance Hall: 1
World: 1
Dance and Electronica: 12

It’s a while since I wrote any code, clearly: this should at least be sorted and trimmed to the top 3 or so. We’d need to look at a few people’s profiles to figure out the best approach to summarising someone’s interests, and a little thought is needed for representing this in RDF/FOAF.
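
For what it’s worth, the sorting and trimming is only a few lines. Below is a Python sketch (not the Ruby script itself) of the cross-referencing step, assuming the genre file is whitespace-separated as in the excerpt above and that the top artists arrive as a plain list of MusicBrainz IDs. The function names and the unknown ID are hypothetical; the counts at the end are the ones printed by the script above.

```python
from collections import Counter

# The three sample lines from the BBC genre file quoted above.
GENRES_TEXT = """\
0039c7ae-e1a7-4a7d-9b49-0cbc716821a6\tRock and Indie
003abc43-e2bb-40e5-a080-3c4b9e56ea63\tClassical
0053dbd9-bfbc-4e38-9f08-66a27d914c38\tClassic Pop and Rock
"""


def load_genres(text):
    """Parse 'artist-id <whitespace> genre label' lines into a dict."""
    genres = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        mbid, genre = line.split(None, 1)
        genres[mbid] = genre.strip()
    return genres


def summarise(artist_ids, genres, top=3):
    """Count genres for the artists we can identify; return the top few."""
    counts = Counter(genres[a] for a in artist_ids if a in genres)
    return counts.most_common(top)


genres = load_genres(GENRES_TEXT)
# A hypothetical top-artists list; the last ID is not in the genre file.
my_artists = [
    "0039c7ae-e1a7-4a7d-9b49-0cbc716821a6",
    "0053dbd9-bfbc-4e38-9f08-66a27d914c38",
    "ffffffff-0000-0000-0000-000000000000",
]
print(summarise(my_artists, genres))

# Sorting and trimming the counts printed by the script above:
script_counts = Counter({
    "Classic Pop and Rock": 13,
    "Rock and Indie": 17,
    "Hip Hop; RnB and Dance Hall": 1,
    "World": 1,
    "Dance and Electronica": 12,
})
top_three = script_counts.most_common(3)
print(top_three)
```

Artists missing from the genre file are silently skipped here; a fuller version might report how many of the top artists could not be classified.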

Now where I see OAuth fitting into this picture is the “what do we do next” step. OAuth potentially addresses a problem we’ve had in the FOAF scene, whereby FOAF generators and adaptors produce a chunk of markup, but there’s no easy/natural way to post this back into the Web. I’m hoping that blogs and hosting sites will allow external FOAF sources (like this script) to update/augment the FOAF descriptions we host in our existing Web sites and profiles. I sent some notes on this to the OAuth list (albeit to a deafening silence).

See also:  mashed last.fm / bbc genres ruby script

AllegroGraph RDFStore 3.0: Social Network Analysis

AllegroGraph 3.0 now comes with a Social Network Analysis component, amongst several other interesting features including improved geo support.

example diagram of people and relationships

By viewing interactions as connections in a graph, we can treat a multitude of different situations using the tools of Social Network Analysis (SNA). SNA lets us answer questions like:

    • How closely connected are any two individuals?
    • What are the core groups or clusters within the data?
    • How important is this person (or company) to the flow of information?
    • How likely is it that this person and that person know one another?

The field is full of rich mathematical techniques and powerful algorithms. AllegroGraph’s SNA toolkit includes an array of search methods, tools for measuring centrality and importance, and the building blocks for creating more specialized measures. These tools can be used with any “network” data set whether it’s connections between companies and directors, predator/prey food webs, chemical interactions, or links between web sites.
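
To give a flavour of the first and third questions, here is a small plain-Python sketch. It does not use AllegroGraph or its SNA API at all; the toy friendship graph and the function names are invented. It measures how closely two people are connected (shortest path length, by breadth-first search) and a very simple notion of importance (degree centrality).

```python
from collections import deque

# A made-up, undirected friendship graph as an adjacency map.
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}


def distance(graph, start, goal):
    """Number of hops on a shortest path, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly linked to."""
    others = len(graph) - 1
    return {node: len(neighbours) / others for node, neighbours in graph.items()}


print(distance(FRIENDS, "ann", "eve"))
centrality = degree_centrality(FRIENDS)
print(max(centrality, key=centrality.get))
```

Degree centrality is the crudest of the centrality measures; questions about the flow of information are usually better served by something like betweenness, which is more work to compute.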

Yahoo: RDF and the Monkey

From the Yahoo developer network blog,

Besides the existing support for microformats, we have already shared our plans for supporting other standards for embedding metadata into HTML. Today we are announcing the availability of eRDF metadata for SearchMonkey applications, which will soon be followed by support for RDFa. SearchMonkey applications can make direct use of the eRDF data by choosing the com.yahoo.rdf.erdf data source, while RDFa data will appear under com.yahoo.rdf.rdfa. Nothing changes in the way applications are created: as SearchMonkey applications have already been built on a triple-based model, the same applications can work on both microformat, eRDF or RDFa data.

Very cool. Good news for microformats, good news for RDF. Now to find which spam-trap my SearchMonkey account info got lost in…

Map-reduce-merge and Hadoop/Hbase RDF

 Just found this interesting presentation,

Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters
by Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, D. Stott Parker; as presented by Nate Rober (PDF)

Excerpts:

Extending MapReduce
1. Change to reduce phase
2. Merge phase
3. Additional user-definable operations
a. partition selector
b. processor
c. merger
d. configurable iterators

Implementing Relational Algebra Operations
1. Projection
2. Aggregation
3. Selection
4. Set Operations: Union, Intersection, Difference
5. Cartesian Product
6. Rename
7. Join

[for more detail see full slides]

Conclusion
MapReduce & GFS represent a paradigm shift in data processing: use a simplified interface instead of overly general DBMS.
Map-Reduce-Merge adds the ability to execute arbitrary relational algebra queries.
Next steps: develop SQL-like interface and a query optimizer.
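
To see why a merge phase makes joins natural, here is a toy single-process sketch in Python. It is only my loose reading of the slide outline, not the paper’s actual interface: two datasets each go through their own map and reduce, and a merge step then combines the two reduced outputs wherever they share a key (an equi-join). The tables, function names and lambdas are all invented.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Group mapper output by key, then reduce each group to one value."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def merge(left, right, merger):
    """Merge phase: combine two reduced outputs on their shared keys."""
    return {key: merger(left[key], right[key]) for key in left.keys() & right.keys()}


# Two made-up relations: (name, dept, bonus) and (dept, city).
employees = [("alice", "sales", 100), ("bob", "sales", 50), ("carol", "ops", 70)]
departments = [("sales", "London"), ("ops", "Bristol"), ("legal", "Leiden")]

# Aggregation on one side, projection on the other.
bonus_by_dept = map_reduce(
    employees, lambda r: [(r[1], r[2])], lambda key, values: sum(values))
city_by_dept = map_reduce(
    departments, lambda r: [(r[0], r[1])], lambda key, values: values[0])

# Join the two lineages on department.
joined = merge(bonus_by_dept, city_by_dept, lambda total, city: (city, total))
print(joined)
```

In plain MapReduce the same join would have to be squeezed into a single reduce over a mixed stream of tagged records; giving the two inputs separate lineages and a dedicated merge step is what keeps it readable.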

Research paper: Map-reduce-merge: simplified relational data processing on large clusters (PDF for ACM people)

Linked from HRDF page in the Hadoop wiki, where there appears to be a proposal brewing to build an RDF store on top of the Hadoop/Hbase infrastructure.

Nearby: LargeTripleStores in ESW wiki

Not entirely unrelated: Google Social Graph API (which parses FOAF/RDF from ‘The Web’ but discards all but the social graph parts currently)

Geographic Queries on Google App Engine

Much cleverness:

In this way, I was able to put together a geographic bounding box query, on top of Google App Engine, using a Geohash-like algorithm as a storage format, and use that query to power a FeatureServer Demo App Engine application, doing geographic queries of non-point features on top of App Engine/BigTable. Simply create a Geoindex object of the bounding box of your feature, and then use lower-left/upper-right points as bounds for your Geohash when querying.
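
A rough Python sketch of the general trick, for points only (the quoted work handles non-point features via its Geoindex object, which I have not tried to reproduce). Each point is stored under a standard geohash string; a bounding box query becomes a simple range scan between the hashes of the lower-left and upper-right corners, followed by an exact check, since the range scan can over-select. The function names and the approximate coordinates are my own.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat, lon, precision=8):
    """Encode a point by interleaving longitude and latitude bisection bits."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = int("".join(map(str, bits[i:i + 5])), 2)
        chars.append(BASE32[index])
    return "".join(chars)


def bbox_query(points, lower_left, upper_right, precision=8):
    """Range-scan on the hash, then filter exactly on the coordinates."""
    lo = geohash(lower_left[0], lower_left[1], precision)
    hi = geohash(upper_right[0], upper_right[1], precision)
    # In a datastore this would be an inequality filter on a stored hash property.
    candidates = [
        (name, lat, lon)
        for name, (lat, lon) in points.items()
        if lo <= geohash(lat, lon, precision) <= hi
    ]
    return sorted(
        name
        for name, lat, lon in candidates
        if lower_left[0] <= lat <= upper_right[0]
        and lower_left[1] <= lon <= upper_right[1]
    )


# Approximate (lat, lon) pairs, for illustration only.
PLACES = {
    "Leiden": (52.16, 4.49),
    "Bristol": (51.45, -2.59),
    "Guadalajara": (20.67, -103.35),
}
found = bbox_query(PLACES, lower_left=(50.7, 3.3), upper_right=(53.6, 7.2))
print(found)
```

The range scan works because equal-length geohashes sort in the same order as the interleaved bits they encode, so every point inside the box hashes somewhere between the two corners. Points outside the box can land in that range too, hence the second, exact filter.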

Bruce Schneier: Our Data, Ourselves

Via Libby; Bruce Schneier on data:

In the information age, we all have a data shadow.

We leave data everywhere we go. It’s not just our bank accounts and stock portfolios, or our itemized bills, listing every credit card purchase and telephone call we make. It’s automatic road-toll collection systems, supermarket affinity cards, ATMs and so on.

It’s also our lives. Our love letters and friendly chat. Our personal e-mails and SMS messages. Our business plans, strategies and offhand conversations. Our political leanings and positions. And this is just the data we interact with. We all have shadow selves living in the data banks of hundreds of corporations’ information brokers — information about us that is both surprisingly personal and uncannily complete — except for the errors that you can neither see nor correct.

What happens to our data happens to ourselves.

This shadow self doesn’t just sit there: It’s constantly touched. It’s examined and judged. When we apply for a bank loan, it’s our data that determines whether or not we get it. When we try to board an airplane, it’s our data that determines how thoroughly we get searched — or whether we get to board at all. If the government wants to investigate us, they’re more likely to go through our data than they are to search our homes; for a lot of that data, they don’t even need a warrant.

Who controls our data controls our lives. [...]

Increasingly, we’re going to be seeing this data flow through protocols like OAuth. SemWeb people should get their heads around how this is likely to work. It’s rather likely we’ll see SPARQL data stores with non-public personal data flowing through them; what worries me is that there’s not yet any data management discipline on top of this that’ll help us keep track of who is allowed to see what, and which graphs should be deleted or refreshed at which times.

I recently transcribed some notes from a Robert Scoble post about Facebook and data portability into the FOAF wiki. In it, Scoble reported some comments from Dave Morin of Facebook, regarding data flow. Excerpts:

For instance, what if a user wants to delete his or her info off of Facebook. Today that’s possible. But what about in a really data portable world? After all, in such a world Facebook might have sprayed your email and other data to other social networks. What if those other social networks don’t want to delete your data after you asked Facebook to?

Another case: you want your closest Facebook friends to know your birthday, but not everyone else. How do you make your social network data portable, but make sure that your privacy is secured?

Another case? Which of your data is yours? Which belongs to your friends? And, which belongs to the social network itself? For instance, we can say that my photos that I put on Facebook are mine and that they should also be shared with, say, Flickr or SmugMug, right? How about the comments under those photos? The tags? The privacy data that was entered about them? The voting data? And other stuff that other users might have put onto those photos? Is all of that stuff supposed to be portable? (I’d argue no, cause how would a comment left by a Facebook user on Facebook be good on Flickr?) So, if you argue no, where is the line? And, even if we can all agree on where the line is, how do we get both Facebook and Flickr to build the APIs needed to make that happen?

I’d like to see SPARQL stores that can police their data access behaviour, with clarity for each data graph in the store about the contexts in which that data can be re-exposed, and the schedule by which the data should be refreshed or purged. Making it easy for data to flow is only half the problem…
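
As a sketch of the sort of bookkeeping I have in mind, here is a per-graph policy record in Python. Nothing like this exists in SPARQL or in any store I know of; the class, its field names, the graph name and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class GraphPolicy:
    graph: str                # name of the graph within the store
    source: str               # where the data was fetched from
    fetched_at: datetime      # when we last fetched it
    may_expose_to: frozenset  # contexts allowed to see this graph
    refresh_after: timedelta  # data older than this should be re-fetched
    purge_after: timedelta    # data older than this must be deleted

    def can_expose(self, context):
        """May this graph be re-exposed in the given context?"""
        return context in self.may_expose_to

    def status(self, now):
        """Return 'fresh', 'refresh' or 'purge' depending on the data's age."""
        age = now - self.fetched_at
        if age >= self.purge_after:
            return "purge"
        if age >= self.refresh_after:
            return "refresh"
        return "fresh"


# A made-up policy for location data pulled in over OAuth.
policy = GraphPolicy(
    graph="urn:example:graph:danbri-location",
    source="fireeagle",
    fetched_at=datetime(2008, 6, 30, 12, 0),
    may_expose_to=frozenset({"owner", "close-friends"}),
    refresh_after=timedelta(hours=1),
    purge_after=timedelta(days=7),
)

print(policy.can_expose("public"))
print(policy.status(datetime(2008, 6, 30, 14, 0)))
```

The interesting part is not the record itself but getting the store to consult it on every query, and getting upstream sites to say what the values should be.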

Why might connecting with Zander Jules be a good idea?

Or: towards evidence-based ‘add a contact’ filtering…

This just in from LinkedIn:

Have a question? Zander Jules’s network will probably have an answer
You can use LinkedIn Answers to distribute your professional questions to Zander Jules and your extended network. You can get high-quality answers from experienced professionals.

Zander Jules requested to add you as a connection on LinkedIn:

Dan,

Dear
My name is Zander Jules a Banker and accountant with Bank Atlantique Cote Ivoire.I contacting u for a business transfer of a large sum of money from a dormant account. Though I know that a transaction of this magnitude will make any one apprehensive,
but I am assuring u all will be well at the end of the day.I am the personal accounts manager to Engr Frank Thompson, a National of ur country, who used to work with an oil servicing company here in Cote Ivoire. My client, his wife & their 3 children were involved in the ill fated Kenya Airways crash in the coasts of Abidjan in January 2000 in which all passengers on board died. Since then I have made several inquiries to ur embassy to locate any of my clients extended relatives but has been unsuccessful.After several attempts, I decided to trace his last name via internet,to see if I could locate any member of his
family hence I contacted u.Of particular interest is a huge deposit with our bank in our country,where the deceased has an account valued at about $16 million USD.They have issued me notice to provide the next of kin or our bank will declare the account unservisable and thereby send the funds to the bank treasury.Since I have been unsuccessful in locating the relatives for past 7 yrs now, I will seek ur consent to present you as the next of kin of the deceased since u have the same last names, so that the proceeds of this account valued at $16million USD can be paid to u and then u and I can share the money.All I require is your honest cooperation to enable us see this deal through. I guarantee that this will be executed under all legitimate arrangement that will protect you from any breach of the law. In your reply mail, I want you to give me your full names, address, D.O.B, tel& fax #.If you can handle this with me, reach me for more details.

Thanking u for ur coperation.
Regards,

I’m surprised we’ve not seen more of this, and sooner. YouTube contacts are pretty spammy, and Twitter have also suffered. The other networks are relatively OK so far. But I don’t think they’re anything like as robust as they’ll need to get, particularly since a faked contact can get privileged access to personal details. Definitely an arms race…