Visual SPARQL query tools

Quick links – thinking about tools that allow graphical SPARQL query authoring…

OpenLink Virtuoso: InteractiveSparqlQueryBuilder (in HTML/CSS/.js). Pictured below; extensive documentation and screenshots linked from their main page.

…an ancestor of which was Damian Steer’s RDFAuthor tool for MacOSX, which could generate Squish (a SPARQL precursor) and query services over the ‘array of hashtables’ SOAP-for-RDF-query non-spec that Libby Miller and I had implementations of. From the RDFAuthor tutorial:

The old Maryland BINPIQ SHOE knowledgebase query applet is the granddaddy of them all. Sadly I don’t have any screenshots, and the applet itself seems to have code-rotted. [...] Ah, but here I find an email I wrote about it 8 years ago(!), which has screenshots:

SemanticSoft from Moldova also have some visual SPARQL UI:

No real conclusion here. I just found myself looking around some of these links, and thought I’d share them. I’m sure there’s a lot more related work out there (eg. NIGHTLIGHT from folk at Southampton Uni), and that the rise of fancy HTML-based UIs and JSON for data access makes for an ever-more interesting environment for zero-install graphical query tools.

One thing I remember about the old Maryland applets: as their representational language became more expressive (moving from binary to n-ary), the graphical query UI became somewhat less intuitive. Now since SPARQL itself adds some concepts not in the underlying target language (ie. RDF doesn’t have named graphs, optionals etc), the ability to make a graphical query UI that exploits the “it’s just an RDF graph with bits labelled as missing” (per Guha’s original proposal) perhaps gets a bit strained. In particular, how might named graphs best be represented in visual editors?
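To make that concrete, here is a small hand-written SPARQL sketch (the graph name and data are invented for illustration; only the FOAF terms are real). The plain triple pattern draws naturally as a node-and-arc picture; the OPTIONAL block and the GRAPH clause are exactly the bits with no obvious pictorial equivalent in a plain RDF graph:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?homepage
WHERE {
  ?person foaf:name ?name .                      # ordinary triple pattern: easy to draw as an arc
  OPTIONAL { ?person foaf:homepage ?homepage }   # "may be missing": needs some extra visual convention
  GRAPH <http://example.org/crawl-2008-06> {     # named graph: nothing in plain RDF corresponds to this box
    ?person foaf:knows ?friend .
  }
}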

Foundation Nation: new orgs for Infocards, Symbian

Via the [IP] list, I read that the Information Card Foundation has launched.

Information Cards are the new way to control your personal data and identity on the web.

The Information Card Foundation is a group of thoughtful designers, architects, and companies who want to make the digital world easier for you by building better products that help you get control of your personal information.

From their blog, where Charles Andres offers a historical account of where they fit in:

And by early 2007, four tribes in the newly discovered continent of user-centric identity had united under the banner of OpenID 2.0 and brought the liberating power of user-controlled identifiers to the digital identity pioneers. The OpenID community formed the OpenID Foundation to serve as a trustee for intellectual property and a host for community activity and by early 2008 had attracted Microsoft, Yahoo, Google, VeriSign, and IBM to join as corporate directors.

Inspired by these efforts, the growing Information Card community realized that to bring this metaphor to full fruition required taking the same step—coming together into a common organization that would unify our efforts to create an interoperable identity layer. From one perspective this could be looked at as completing the “third leg of the stool” of what is often called the Venn of Identity (SAML, OpenID, and Information Cards). But from another perspective, you can see it as one of the logical steps needed towards the cooperative convergence among identity systems and protocols that will be necessary to reach a ubiquitous Internet identity layer—the layer that completes the hat trick.

I’m curious to see what comes of this. There’s some big backing, and I’ve heard good things about Infocard from folks in the know. From a SemWebby perspective, this stuff just gives us another way to figure out the provenance of claim graphs representable in RDF, queryable in SPARQL. And presumably some more core schemas to play with…

Meanwhile in the mobile scene, a Symbian Foundation has been unveiled:

Industry leaders to unify the Symbian mobile platform and set it free
Foundation to be established to provide royalty-free open platform and accelerate innovation

The demand for converged mobile devices is accelerating. By 2010 we expect four billion people to have joined the global mobile conversation. For many of these people, their mobile will be their first Internet experience, not just their first camera, music player or phone.

Open software is the basic building block for delivering this future.

With this in mind, industry leaders are coming together to establish Symbian Foundation, to bring to life a shared vision and to create the most proven, open and complete mobile software platform – available for free. To achieve this, the foundation will unify Symbian, S60, UIQ and MOAP(S) software to create an unparalleled open software platform for converged mobile devices, enabling the whole mobile ecosystem to accelerate innovation.

The foundation is expected to start operating during the first half of 2009. Membership of the foundation will be open to all organizations, for a low annual membership fee of US $1,500.

I’ll save my pennies for an iPhone. Everybody’s open nowadays, I guess that’s good…

Map-reduce-merge and Hadoop/Hbase RDF

Just found this interesting presentation:

Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters
by Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, D. Stott Parker; as presented by Nate Rober (PDF)

Excerpts:

Extending MapReduce
1. Change to reduce phase
2. Merge phase
3. Additional user-definable operations
a. partition selector
b. processor
c. merger
d. configurable iterators

Implementing Relational Algebra Operations
1. Projection
2. Aggregation
3. Selection
4. Set Operations: Union, Intersection, Difference
5. Cartesian Product
6. Rename
7. Join

[for more detail see full slides]

Conclusion
MapReduce & GFS represent a paradigm shift in data processing: use a simplified interface instead of an overly general DBMS.
Map-Reduce-Merge adds the ability to execute arbitrary relational algebra queries.
Next steps: develop a SQL-like interface and a query optimizer.

Research paper: Map-reduce-merge: simplified relational data processing on large clusters (PDF for ACM people)

Linked from the HRDF page in the Hadoop wiki, where there appears to be a proposal brewing to build an RDF store on top of the Hadoop/Hbase infrastructure.
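As a rough illustration of why the join machinery matters for an RDF store on such infrastructure: even a small SPARQL query like the sketch below (FOAF and Dublin Core terms; the data is imaginary) requires joining its triple patterns on shared variables, which in relational terms is exactly the kind of multi-way join that plain MapReduce handles awkwardly and that the merge phase is meant to support.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
SELECT ?name ?title
WHERE {
  ?person foaf:name ?name .    # pattern 1: (person, name) pairs
  ?person foaf:made ?doc .     # pattern 2: (person, doc) pairs  -- joined with pattern 1 on ?person
  ?doc    dc:title ?title .    # pattern 3: (doc, title) pairs   -- joined with pattern 2 on ?doc
}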

Nearby: LargeTripleStores in ESW wiki

Not entirely unrelated: Google Social Graph API (which parses FOAF/RDF from ‘The Web’ but currently discards all but the social graph parts)

Bruce Schneier: Our Data, Ourselves

Via Libby; Bruce Schneier on data:

In the information age, we all have a data shadow.

We leave data everywhere we go. It’s not just our bank accounts and stock portfolios, or our itemized bills, listing every credit card purchase and telephone call we make. It’s automatic road-toll collection systems, supermarket affinity cards, ATMs and so on.

It’s also our lives. Our love letters and friendly chat. Our personal e-mails and SMS messages. Our business plans, strategies and offhand conversations. Our political leanings and positions. And this is just the data we interact with. We all have shadow selves living in the data banks of hundreds of corporations’ information brokers — information about us that is both surprisingly personal and uncannily complete — except for the errors that you can neither see nor correct.

What happens to our data happens to ourselves.

This shadow self doesn’t just sit there: It’s constantly touched. It’s examined and judged. When we apply for a bank loan, it’s our data that determines whether or not we get it. When we try to board an airplane, it’s our data that determines how thoroughly we get searched — or whether we get to board at all. If the government wants to investigate us, they’re more likely to go through our data than they are to search our homes; for a lot of that data, they don’t even need a warrant.

Who controls our data controls our lives. [...]

Increasingly, we’re going to be seeing this data flow through protocols like OAuth. SemWeb people should get their heads around how this is likely to work. It’s rather likely we’ll see SPARQL data stores with non-public personal data flowing through them; what worries me is that there’s not yet any data management discipline on top of this that’ll help us keep track of who is allowed to see what, and which graphs should be deleted or refreshed at which times.

I recently transcribed some notes from a Robert Scoble post about Facebook and data portability into the FOAF wiki. In it, Scoble reported some comments from Dave Morin of Facebook regarding data flow. Excerpts:

For instance, what if a user wants to delete his or her info off of Facebook. Today that’s possible. But what about in a really data portable world? After all, in such a world Facebook might have sprayed your email and other data to other social networks. What if those other social networks don’t want to delete your data after you asked Facebook to?

Another case: you want your closest Facebook friends to know your birthday, but not everyone else. How do you make your social network data portable, but make sure that your privacy is secured?

Another case? Which of your data is yours? Which belongs to your friends? And, which belongs to the social network itself? For instance, we can say that my photos that I put on Facebook are mine and that they should also be shared with, say, Flickr or SmugMug, right? How about the comments under those photos? The tags? The privacy data that was entered about them? The voting data? And other stuff that other users might have put onto those photos? Is all of that stuff supposed to be portable? (I’d argue no, cause how would a comment left by a Facebook user on Facebook be good on Flickr?) So, if you argue no, where is the line? And, even if we can all agree on where the line is, how do we get both Facebook and Flickr to build the APIs needed to make that happen?

I’d like to see SPARQL stores that can police their data access behaviour, with clarity for each data graph in the store about the contexts in which that data can be re-exposed, and the schedule by which the data should be refreshed or purged. Making it easy for data to flow is only half the problem…
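To give a flavour of the kind of per-graph bookkeeping I mean, here is a minimal Turtle sketch. The pol: vocabulary is entirely made up for illustration (no such schema exists as far as I know); the point is just that the store would hold statements about each named graph alongside the graph itself:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pol: <http://example.org/ns/policy#> .    # hypothetical policy vocabulary, for illustration only

<http://store.example.org/graphs/alice-contacts>
    pol:obtainedFrom   <https://www.facebook.com/> ;
    pol:mayBeExposedTo <http://store.example.org/contexts/alice-private-apps> ;
    pol:refreshAfter   "P7D"^^xsd:duration ;      # re-fetch weekly
    pol:purgeAfter     "P30D"^^xsd:duration .     # delete if not refreshed within 30 days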

Open CellID databases

Via momolondon list: opencellid.org data dumps

The readme.txt file describes the tabular data structure (split into a cells, and a measures file).

I think the cells data is the one most folk will be interested in re-using. Table headings are:

# id,lat,lon,mcc,mnc,lac,cellid,range,nbSamples,created_at,updated_at
For example:
7,44.8802,-0.526878,208,10,18122,32951790,0,2,2008-03-31 15:22:22,2008-04-07 08:57:33

This could be RDFized using something similar to the (802.11-centric) Wireless Ontology. Perhaps even using lqraps.
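For instance, the example row above might come out as something like the following Turtle. The geo: terms are the W3C WGS84 vocabulary; the cell: properties and the cell URI are invented placeholders rather than an agreed schema:

@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix cell: <http://example.org/ns/cell#> .    # hypothetical vocabulary for the remaining columns

<http://opencellid.org/cell/7>                   # placeholder URI for cell id 7
    geo:lat        "44.8802" ;
    geo:long       "-0.526878" ;
    cell:mcc       "208" ;        # mobile country code
    cell:mnc       "10" ;         # mobile network code
    cell:lac       "18122" ;      # location area code
    cell:cellid    "32951790" ;
    cell:nbSamples "2" .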

MusicBrainz SQL-to-RDF D2RQ mapping from Yves Raimond

More great music-related stuff from Yves Raimond. He’s just announced (on the Music ontology list) a D2RQ mapping of the MusicBrainz SQL into RDF and SPARQL. There’s a running instance of it on his site. The N3 mapping files are on the  motools sourceforge site.

Yves writes…

Added to the things that are available within the Zitgist mapping:

  • SPARQL end point
  • Support for tags
  • Supports a couple of advanced relationships (still working my way through it, though)
  • Instrument taxonomy directly generated from the db, and related to performance events
  • Support for orchestras

This is pretty cool, since the original MusicBrainz RDF is rather dated (if it’s even still available). The new representations are much richer and probably also easier to maintain.
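For a taste of what the endpoint invites, a query along these lines should find artists and the titles of things they made (Music Ontology, FOAF and Dublin Core terms; the exact terms Yves’ mapping uses may differ, so treat this as a sketch rather than a tested query):

PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>

SELECT ?name ?title
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name ;
          foaf:made ?record .
  ?record dc:title ?title .
}
LIMIT 10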

Nearby in the Web: discussion of RDF views into MySpace; and discussions are getting started in the RDB2RDF Incubator Group at W3C (this group is looking at technology such as D2RQ, which maps non-RDF databases into our strange parallel world…)

Language Expertise in FOAF: Speaks, Reads, Writes revisited

Speaks, reads, writes
Stephanie Booth asks:

 I vaguely remember somebody telling me about some emerging “standard” (too big a word) for encoding language skills. Or was it a dream?

That would’ve been me, showing markup from the FOAFX beta from Paola Di Maio and friends, which explores the extension of FOAF with expertise information. This is part of the ExpertFinder discussions alongside the FOAF project (see also wiki, mailing list). FOAFX and the ExpertFinder community are looking at ways of extending FOAF to better describe people’s expertise; both self-described and externally accredited. This is at once a fascinating, important and terrifyingly hard to scope problem area. It touches on longstanding “Web of trust” themes, on educational metadata standards, and on the various ways of characterising topics or domains of expertise. In other words, in any such problem space, there will always be multiple ways of “doing it”. For example, here is how the Advogato community site characterises my expertise regarding opensource software: foaf.rdf (I’m in the Journeyer group, apparently; some weighted average of people’s judgements about me).

One thing FOAFX attempts is to describe language skills. For this, they extend the idiom proposed by Inkel some years ago in his “Speaks, Reads, Writes” schema. In the original (which is in Spanish, but see also the English version), the classification was effectively binary: one could either speak, read, or write a language; or one couldn’t. You could also say you ‘mastered’ it, meaning that you could speak, read and write it. In FOAFX, this is handled differently: we get a 1–5 score. I like this direction, as it allows me to express that I have some basic capability in Spanish, without appearing to boast that I’m anything like “fluent”. But … am I a “1” or a “2”? Should I poll my long-suffering Spanish-speaking friends? Take an online quiz? Introducing numbers gives the impression of mathematical precision, but such precision is notoriously hard to come by in skill characterisation (and not without controversy).

My take here is that there’s no right thing to do. So progress and experimentation are to be celebrated, even if the solution isn’t perfect. On language skills, I’d love some way also to allow people to say “I’m learning language X”, or “I’m happy to help you practice your English/Spanish/Japanese/etc.”. Who knows, with more such information available, online Social Network sites could even prove useful…

Here btw is the current RDF markup generated by FOAFX:

<foaf:Person rdf:ID="me">
<foaf:mbox_sha1>6e80d02de4cb3376605a34976e31188bb16180d0</foaf:mbox_sha1>
<foaf:givenname>Dan</foaf:givenname>
<foaf:family_name>Brickley</foaf:family_name>
<foaf:homepage rdf:resource="http://danbri.org/" />
<foaf:weblog rdf:resource="http://danbri.org/words/" />
<foaf:depiction rdf:resource="http://danbri.org/images/me.jpg" />
<foaf:jabberID>danbrickley@gmail.com</foaf:jabberID>
<foafx:language>
<foafx:Language>
<foafx:name>English</foafx:name>
<foafx:speaking>5</foafx:speaking>
<foafx:reading>5</foafx:reading>
<foafx:writing>5</foafx:writing>
</foafx:Language>
</foafx:language>
<foafx:language>
<foafx:Language>
<foafx:name>Spanish</foafx:name>
<foafx:speaking>1</foafx:speaking>
<foafx:reading>1</foafx:reading>
<foafx:writing>1</foafx:writing>
</foafx:Language>
</foafx:language>
<foafx:expertise>
<foafx:Expertise>
<foafx:field>::</foafx:field>
<foafx:fluency>
<foafx:Language>
<foafx:name>English</foafx:name>
</foafx:Language>
</foafx:fluency>
</foafx:Expertise>
</foafx:expertise>
</foaf:Person>

The apparent redundancy in the markup (expertise, Expertise) is due to RDF’s so-called “striped” syntax. I have an old introduction to this idea; in short, RDF lets you define properties of things, and categories of thing. The FOAFX design effectively says: there is a property of a person called “expertise”, which relates that person to another thing, an “Expertise”, which itself has properties like “fluency”.
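Rendered as triples in Turtle, the striping is perhaps easier to see: the lowercase foafx:expertise is a property (an arc leaving the person node), while the uppercase foafx:Expertise is the class of the node that arc points at. (The foafx: namespace below is a placeholder; substitute whatever URI the FOAFX beta actually uses.)

@prefix foafx: <http://example.org/ns/foafx#> .   # placeholder namespace, for illustration only

<#me> foafx:expertise [                 # lowercase: the property
        a foafx:Expertise ;             # uppercase: the class of the thing it points at
        foafx:fluency [ a foafx:Language ; foafx:name "English" ]
      ] .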

The FOAFX design tries to navigate between generic and specific, by including language-oriented markup as well as more generic skill descriptions. I think this is probably the right way to go. There are many things that we can say about human languages that don’t apply to other areas of expertise (eg. opensource software development). And there are many things we can say about expertise in general (like expressions of willingness to learn, to teach, … indications of formal qualification) which are cross-domain. Similarly, there are many things we might say in markup about opensource projects (picking up on my Advogato mention earlier) which have nothing to do with human languages. Yet both human language expertise and opensource skills are things we might want to express via FOAF extensions. For example, the DOAP project already lets us describe opensource projects and our roles in them.

The Semantic Web design challenge here is to provide a melting pot for all these different kinds of data, one that allows each specific problem to be solved adequately in a reasonable time-frame, without precluding the possibility for richer integration at a later date. I have a hunch that the Advogato design, which expresses skills in terms of group membership, could be a way to go here.

This is related to the idea of expressing group-membership criteria through writing SPARQL queries. For example, we can talk about the Group of people who work for W3C. Or we can talk about the Group of people who work for W3C as listed authoritatively on the W3C site. Both rules are expressible as queries; the latter being a query that says things about the source of claims, as well as about what those claims assert. This notion of a group defined by a query allows for both flavours; the definition could include criteria relating to the provenance (ie. source) of the claims, but it needn’t.

So we could express the idea of people who speak Spanish, or the idea of people who speak French according to having passed some particular test, or being certified by some agency. In either case, the unifying notion is “person X is in group Y”, where Y is a group identified by some URL. What I like about this model is that it allows for a very loose division of labour: skill-related markup is necessarily going to be widely varied. Yet the idea that such scattered evidence boils down to people falling into definable groups gives some overall cohesion to this diversity.

I could for example run a query asking for people with (foafx idiom) “Spanish skills of 2 or more”. I could add a constraint that the person be at least a “Journeyer” regarding their opensource skills, according to Advogato, or perhaps mix in data expressed in DOAP terms regarding their roles in opensource project work. These skills effectively define groups (loosely, sets) of people, and skill search can be pictured in Venn diagram terms. Of course all this depends on getting enough data out there for any such queries to be worthwhile. Maybe a Facebook app that republished data outside of Hotel Facebook would be a way of bootstrapping things here?
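Here, as a sketch, is roughly what that “Spanish skills of 2 or more” query might look like against the FOAFX markup above (the foafx: namespace URI is again a placeholder; the cast to xsd:integer is there because the FOAFX score values may well be plain literals):

PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX foafx: <http://example.org/ns/foafx#>      # placeholder namespace
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>

SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name ;
          foafx:language ?lang .
  ?lang foafx:name "Spanish" ;
        foafx:speaking ?level .
  FILTER ( xsd:integer(?level) >= 2 )
}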

Lqraps! Reverse SPARQL

(update: files are now in svn; updated the link here)

I’ve just published a quick writeup (with running toy Ruby code) of a “reverse SPARQL” utility called lqraps, a tool for reconstructing RDF from tabular data.

The idea is that such a tool is passed a tab-separated (eventually, CSV etc.) file, such as might conventionally be loaded into Access, spreadsheets etc. The tool looks for a few lines of special annotation in “#”-comments at the top of the file, and uses these to (re)generate RDF. My simple implementation does this with text-munging of SPARQL construct notation into Turtle. As the name suggests, this could be done against the result tables we get from doing SPARQL queries; however it is more generally applicable to tabular data files.
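I won’t reproduce the annotation syntax here (see the writeup for the real thing), but the flavour is roughly this entirely hypothetical header: a CONSTRUCT-style template plus column-to-variable bindings living in “#” comments above plain tab-separated rows. The values reuse the OpenCellID example from earlier; the geo: terms are the W3C WGS84 vocabulary.

# @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
# construct: ?cell geo:lat ?lat ; geo:long ?lon .
# columns:   ?cell ?lat ?lon
cell7	44.8802	-0.526878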

A richer version might be able to output RDF/XML, but that would require use of libraries for SPARQL and RDF. This is of course close in spirit to D2RQ and SquirrelRDF and other relational/RDF mapping efforts, but the focus here is on something simple enough that we could include it in the top section of actual files.

The correct pronunciation, btw, is “el craps”. As in the card game.