Semantic Web Interest Group f2f meeting proposal (during week of Oct 20-24 2008)

I’ve started a thread on the Semantic Web Interest Group list, proposing that we meet during W3C’s Technical Plenary week this coming October. If you like the idea and plan to attend, please jump in and say so. If you have other ideas, please let us know them!

In the past we have handled this fairly informally, mixing short talks, themed discussion, inter-WG liaison, and lightning talks. This year I would like to theme any meeting around the practicalities of mainstream rollout: obstacles, issues, opportunities that arise as these technologies find their way into wider use. But this is a broad topic. What would you all like to discuss?

Comments and suggestions here or on the list, please; though of course you can always ping me privately if needed.

Hope to see you in October…

Opening and closing like flowers (social platform roundupathon)

Closing some tabs…

Stephen Fry writing on ‘social network’ sites back in January (also in the Guardian):

…what an irony! For what is this much-trumpeted social networking but an escape back into that world of the closed online service of 15 or 20 years ago? Is it part of some deep human instinct that we take an organism as open and wild and free as the internet, and wish then to divide it into citadels, into closed-border republics and independent city states? The systole and diastole of history has us opening and closing like a flower: escaping our fortresses and enclosures into the open fields, and then building hedges, villages and cities in which to imprison ourselves again before repeating the process once more. The internet seems to be following this pattern.

How does this help us predict the Next Big Thing? That’s what everyone wants to know, if only because they want to make heaps of money from it. In 1999 Douglas Adams said: “Computer people are the last to guess what’s coming next. I mean, come on, they’re so astonished by the fact that the year 1999 is going to be followed by the year 2000 that it’s costing us billions to prepare for it.”

But let the rise of social networking alert you to the possibility that, even in the futuristic world of the net, the next big thing might just be a return to a made-over old thing.

McSweeney’s:

Dear Mr. Zuckerberg,

After checking many of the profiles on your website, I feel it is my duty to inform you that there are some serious errors present. [...]

Lest we forget, the AOL search log privacy goofup from 2006:

No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”

And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”

It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.

Time magazine punditising on iGoogle, Facebook and OpenSocial:

Google, which makes its money on a free and open Web, was not happy with the Facebook platform. That’s because what happens on Facebook stays on Facebook. Google would much prefer that you come out and play on its platform — the wide-open Web. Don’t stay behind Facebook’s closed doors! Hie thee to the Web and start searching for things. That’s how Google makes its money.

So, last fall, Google rallied all the other major social networks (MySpace, Bebo, Hi5 and so on) and announced a new initiative called OpenSocial. OpenSocial wants to be like Facebook’s platform, only much bigger: Widget makers can write applications for it and they can run anywhere — on MySpace, Bebo and Google’s own social network, Orkut, which is very big in Brazil.

Google’s platform could actually dwarf Facebook — if it ever gets off the ground.

Meanwhile on the widget and webapp security front, we have “BBC exposes Facebook flaw” (information about your buddies is accessible to apps you install; information about you is accessible to apps they install). Also see Thomas Roessler’s comments to my Nokiana post for links to a couple of great presentations he made on widget security, including a big oopsie with the Google Mail widget for Mac OS X. Over in Ars Technica we learn that KDE 4.1 alpha 1 now has improved widget powers, including “preliminary support for SuperKaramba and Mac OS X Dashboard widgets”. Wonder if I can read my Gmail there…

As Stephen Fry says, these things are “opening and closing like a flower”. The big hosted social sites have a certain oversimplifying clumsiness about them. But the ability for code to go and visit data (the widget/gadget model) is, I think, as valid as the open data model, where data flows around to visit code. I am optimistic that good things will come out of this ferment.

A few weeks ago I had the pleasure of meeting several of the Google OpenSocial crew in London. They took my grumbling about accessibility issues pretty well, and I hope to continue that conversation. Industry politics and punditry aside, I’m impressed with their professionalism and with the tie-in to an opensource implementation through Apache’s Shindig project. The OpenSocial specs list is open to the public; there, Cassie has just announced that “all 0.8 opensocial and gadgets spec changes have been resolved” (after a heroic slog through the issue list). I’m barely tracking the detail of discussion there; things are moving fast. There’s now a proposed REST API, for example; and I learned in London about plans for a formatting/templating system, which might be one mechanism for getting FOAF/RDF out of OpenSocial containers.

If OpenSocial continues to grow and gather opensource mindshare, it’s possible Facebook will throw some chunks of their platform over the wall (ie. “do an Adobe”). And it’ll probably be left to W3C to clean up the ensuing mess and fragmentation, but I guess that’s what they’re there for. Meanwhile there’s plenty yet to be figured out, … I think we’re in a pre-standards experimentation phase, regardless of how stable or mature we’re told these platforms are.

The fundamental tension here is that we want open data and open platforms, … for data and code to flow freely, while protecting the privacy, lives and blushes of those the data describes. A tricky balance. Don’t let anyone tell you it’s easy, that we’ve got it figured out, or that all we need to do is “tear down the walls”.

Opening and closing like flowers…

Open CellID databases

Via momolondon list: opencellid.org data dumps

The readme.txt file describes the tabular data structure (split into a cells file and a measures file).

I think the cells data is the one most folk will be interested in re-using. Table headings are:

# id,lat,lon,mcc,mnc,lac,cellid,range,nbSamples,created_at,updated_at
For example:
7,44.8802,-0.526878,208,10,18122,32951790,0,2,2008-03-31 15:22:22,2008-04-07 08:57:33

This could be RDFized using something similar to the (802.11-centric) Wireless Ontology. Perhaps even using lqraps.
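As a very rough sketch of what that might look like, here’s the example row above in Turtle. The cell: namespace and the subject URI are invented for illustration, pending a real vocabulary; only the geo and xsd namespaces are real:

@prefix cell: <http://example.org/ns/cellid#> .   # hypothetical vocabulary
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.opencellid.org/cell/7> a cell:Cell ;   # subject URI invented for illustration
    geo:lat "44.8802" ;
    geo:long "-0.526878" ;
    cell:mcc "208" ;
    cell:mnc "10" ;
    cell:lac "18122" ;
    cell:cellid "32951790" ;
    cell:nbSamples "2" ;
    cell:createdAt "2008-03-31T15:22:22"^^xsd:dateTime .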

MusicBrainz SQL-to-RDF D2RQ mapping from Yves Raimond

More great music-related stuff from Yves Raimond. He’s just announced (on the Music Ontology list) a D2RQ mapping of the MusicBrainz SQL database into RDF and SPARQL. There’s a running instance of it on his site. The N3 mapping files are on the motools sourceforge site.

Yves writes…

Added to the things that are available within the Zitgist mapping:

• SPARQL end point
• Support for tags
• Supports a couple of advanced relationships (still working my way through it, though)
• Instrument taxonomy directly generated from the db, and related to performance events
• Support for orchestras

This is pretty cool, since the original MusicBrainz RDF is rather dated (if it’s even still available). The new representations are much richer and probably also easier to maintain.
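To give a flavour of what this enables, here’s the kind of query one might try against such an endpoint. This is a sketch only; the exact terms exposed depend on the mapping files, but the Music Ontology gives us classes like mo:MusicArtist:

PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# find some artists and their names
SELECT ?artist ?name
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name .
}
LIMIT 10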

Nearby in the Web: discussion of RDF views into MySpace; and discussions are getting started in the RDB2RDF Incubator Group at W3C (this group is looking at technologies such as D2RQ which map non-RDF databases into our strange parallel world…)

IRC RDF logs and foaf:chatEvent

For many years, the 24×7 IRC chatrooms #swig and #foaf (and previously #rdfig) have been logged to HTML and RDF by Dave Beckett’s IRC logging code. The RDF idiom used here dates from 2001 or so, and looks like this:

<foaf:ChatChannel rdf:about="irc://irc.freenode.net/swig">
  <foaf:chatEventList>
    <rdf:Seq>
      <rdf:li>
        <foaf:chatEvent rdf:ID="T00-05-11">
          <dc:date>2008-04-03T00:05:11Z</dc:date>
          <dc:description>
            danbri: do you know of good scutter data for playing with codepiction? would be fun to get back into that (esp. parallelization).
          </dc:description>
          <dc:creator>
            <wn:Person foaf:nick="kasei"/>
          </dc:creator>
        </foaf:chatEvent>
      </rdf:li>
    </rdf:Seq>
  </foaf:chatEventList>
</foaf:ChatChannel>

Dave has offered to make a one-time search-and-replace fix to the old logs, if we want to agree a new idiom. The main driver for this is that the old logs have a class for ‘chat event’ but use an initial lowercase letter for the term, ie. ‘chatEvent’ instead of ‘ChatEvent’. None of these properties are yet documented in the FOAF schema, and since the data for this term is highly concentrated, and its maintainer has offered to change it, I suggest we document these FOAF terms as used, with the fix of having a class foaf:ChatEvent instead of foaf:chatEvent.

Almost all RDF vocabularies stick to the rule of using initial lowercase letters for properties, and initial capitals for classes. This makes RDF easy to read for those who know this trick; consequently a lowercase class name can be very confusing, for experts and beginners alike. I’d therefore rather not introduce one into FOAF if it can be avoided. But I would like to document the IRC logging data format, and continue to use FOAF for it.

The markup also uses the wordnet:Person class from my old Wordnet 1.6 namespace (currently offline, but it will be repaired eventually, albeit with a later Wordnet installation). This follows early FOAF practice, where we minimised terms and used Wordnet a lot more. I suggest Dave updates this to use foaf:Person instead. The dc:creator property used here might also be updated to the new DC Terms notion of ‘creator’, since that flavour of ‘creator’ has a range of DC Terms “Agent”, which is a more modern and FOAF-compatible idiom. This btw is a candidate for using instead of foaf:maker, which I introduced with some regret only because the old dc:creator property had such weak usage rules. But then if we change the DC namespace used for ‘creator’ here, should we change the other ones too? Hmm hmm hmm etc.
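Pulling those suggestions together, the revised idiom might look something like this. Just a sketch; which DC namespace to use is still an open question, as noted above:

<!-- assuming xmlns:dcterms="http://purl.org/dc/terms/" -->
<foaf:ChatChannel rdf:about="irc://irc.freenode.net/swig">
  <foaf:chatEventList>
    <rdf:Seq>
      <rdf:li>
        <foaf:ChatEvent rdf:ID="T00-05-11">
          <dcterms:date>2008-04-03T00:05:11Z</dcterms:date>
          <dcterms:description>
            danbri: do you know of good scutter data for playing with codepiction? would be fun to get back into that (esp. parallelization).
          </dcterms:description>
          <dcterms:creator>
            <foaf:Person foaf:nick="kasei"/>
          </dcterms:creator>
        </foaf:ChatEvent>
      </rdf:li>
    </rdf:Seq>
  </foaf:chatEventList>
</foaf:ChatChannel>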

The main known consumer of this IRC log data is XSLT created and maintained by the ubiquitous Dave. If you know of other downstream consumer apps, please let us know by commenting in the blog or by email.

While there are other ways in which such chatlogs might be represented in RDF (eg. SIOC, or using something linked to XMPP), let’s keep the scope of this change quite small. The goal is to make a minimal fix to the current archived instance data and associated tools, so that the FOAF vocabulary it uses can be documented.

Comments welcomed…

Language Expertise in FOAF: Speaks, Reads, Writes revisited

Stephanie Booth asks:

 I vaguely remember somebody telling me about some emerging “standard” (too big a word) for encoding language skills. Or was it a dream?

That would’ve been me, showing markup from the FOAFX beta by Paola Di Maio and friends, which explores the extension of FOAF with expertise information. This is part of the ExpertFinder discussions alongside the FOAF project (see also wiki, mailing list). FOAFX and the ExpertFinder community are looking at ways of extending FOAF to better describe people’s expertise, both self-described and externally accredited. This is at once a fascinating, important and terrifyingly hard-to-scope problem area. It touches on longstanding “Web of trust” themes, on educational metadata standards, and on the various ways of characterising topics or domains of expertise. In other words, in any such problem space, there will always be multiple ways of “doing it”. For example, here is how the Advogato community site characterises my expertise regarding opensource software: foaf.rdf (I’m in the Journeyer group, apparently; some weighted average of people’s judgements about me).

One thing FOAFX attempts is to describe language skills. For this, they extend the idiom proposed by Inkel some years ago in his “Speaks, Reads, Writes” schema. In the original (which is in Spanish, but see also the English version), the classification was effectively binary: one could either speak, read, or write a language, or one couldn’t. You could also say you ‘mastered’ it, meaning that you could speak, read and write it. In FOAFX, this is handled differently: we get a 1-5 score. I like this direction, as it allows me to express that I have some basic capability in Spanish, without appearing to boast that I’m anything like “fluent”. But … am I a “1” or a “2”? Should I poll my long-suffering Spanish-speaking friends? Take an online quiz? Introducing numbers gives the impression of mathematical precision, but in skill characterisation this is notoriously hard (and not without controversy).

My take here is that there’s no right thing to do. So progress and experimentation are to be celebrated, even if the solution isn’t perfect. On language skills, I’d love some way also to allow people to say “I’m learning language X”, or “I’m happy to help you practice your English/Spanish/Japanese/etc.”. Who knows, with more such information available, online Social Network sites could even prove useful…

Here btw is the current RDF markup generated by FOAFX:

<foaf:Person rdf:ID="me">
  <foaf:mbox_sha1>6e80d02de4cb3376605a34976e31188bb16180d0</foaf:mbox_sha1>
  <foaf:givenname>Dan</foaf:givenname>
  <foaf:family_name>Brickley</foaf:family_name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:weblog rdf:resource="http://danbri.org/words/" />
  <foaf:depiction rdf:resource="http://danbri.org/images/me.jpg" />
  <foaf:jabberID>danbrickley@gmail.com</foaf:jabberID>
  <foafx:language>
    <foafx:Language>
      <foafx:name>English</foafx:name>
      <foafx:speaking>5</foafx:speaking>
      <foafx:reading>5</foafx:reading>
      <foafx:writing>5</foafx:writing>
    </foafx:Language>
  </foafx:language>
  <foafx:language>
    <foafx:Language>
      <foafx:name>Spanish</foafx:name>
      <foafx:speaking>1</foafx:speaking>
      <foafx:reading>1</foafx:reading>
      <foafx:writing>1</foafx:writing>
    </foafx:Language>
  </foafx:language>
  <foafx:expertise>
    <foafx:Expertise>
      <foafx:field>::</foafx:field>
      <foafx:fluency>
        <foafx:Language>
          <foafx:name>English</foafx:name>
        </foafx:Language>
      </foafx:fluency>
    </foafx:Expertise>
  </foafx:expertise>
</foaf:Person>

The apparent redundancy in the markup (expertise, Expertise) is due to RDF’s so-called “striped” syntax. I have an old introduction to this idea; in short, RDF lets you define properties of things, and categories of thing. The FOAFX design effectively says: there is a property of a person called “expertise”, which relates that person to another thing, an “Expertise”, which itself has properties like “fluency”.
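Seen as triples (in Turtle notation, with the foafx: prefix standing for whatever namespace URI FOAFX settles on; the one below is assumed for illustration), the striped fragment boils down to something like:

@prefix foafx: <http://example.org/ns/foafx#> .   # namespace URI assumed

<#me> foafx:expertise [
    a foafx:Expertise ;
    foafx:fluency [
        a foafx:Language ;
        foafx:name "English"
    ]
] .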

The FOAFX design tries to navigate between generic and specific, by including language-oriented markup as well as more generic skill descriptions. I think this is probably the right way to go. There are many things that we can say about human languages that don’t apply to other areas of expertise (eg. opensource software development). And there are many things we can say about expertise in general (like expressions of willingness to learn, to teach, … indications of formal qualification) which are cross-domain. Similarly, there are many things we might say in markup about opensource projects (picking up on my Advogato mention earlier) which have nothing to do with human languages. Yet both human language expertise and opensource skills are things we might want to express via FOAF extensions. For example, the DOAP project already lets us describe opensource projects and our roles in them.

The Semantic Web design challenge here is to provide a melting pot for all these different kinds of data, one that allows each specific problem to be solved adequately in a reasonable time-frame, without precluding the possibility for richer integration at a later date. I have a hunch that the Advogato design, which expresses skills in terms of group membership, could be a way to go here.

This is related to the idea of expressing group-membership criteria through writing SPARQL queries. For example, we can talk about the Group of people who work for W3C. Or we can talk about the Group of people who work for W3C as listed authoritatively on the W3C site. Both rules are expressible as queries; the latter a query that says things about the source of claims, as well as about what those claims assert. This notion of a group defined by a query allows for both flavours; the definition could include criteria relating to the provenance (ie. source) of the claims, but it needn’t. So we could express the idea of people who speak Spanish, or the idea of people who speak French according to having passed some particular test, or being certified by some agency. In either case, the unifying notion is “person X is in group Y”, where Y is a group identified by some URL.

What I like about this model is that it allows for a very loose division of labour: skill-related markup is necessarily going to be widely varied. Yet the idea that such scattered evidence boils down to people falling into definable groups gives some overall cohesion to this diversity. I could for example run a query asking for people with (foafx idiom) “Spanish skills of 2 or more”, as sketched below. I could add a constraint that the person be at least a “Journeyer” regarding their opensource skills, according to Advogato, or perhaps mix in data expressed in DOAP terms regarding their roles in opensource project work. These skills effectively define groups (loosely, sets) of people, and skill search can be pictured in Venn diagram terms. Of course all this depends on getting enough data out there for any such queries to be worthwhile. Maybe a Facebook app that re-published data outside of Hotel Facebook would be a way of bootstrapping things here?
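That “Spanish skills of 2 or more” query might look roughly like this in SPARQL. A sketch: the foafx: namespace URI is assumed, and the integer cast presumes the 1-5 scores are usable as numbers:

PREFIX foafx: <http://example.org/ns/foafx#>   # namespace URI assumed
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?person
WHERE {
  ?person foafx:language ?lang .
  ?lang foafx:name "Spanish" ;
        foafx:speaking ?level .
  FILTER ( xsd:integer(?level) >= 2 )
}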

“Stuff I’ve been thinking about” (SocialNetworkPortability WebCamp) – my slides

I’m in Cork, mainly for the excellent Social Network Portability event on Sunday, but am also staying through Blogtalk’08, which has been great. I’ve uploaded my slides from my talk (slideshare in Flash, included inline here, or a pdf). I have some rough speaking notes too; maybe I’ll get those online. I have no idea how they relate to whatever actually came out of my mouth during the talk :) Apologies to those without PDF or Flash. I haven’t tried Keynote’s HTML output yet.

Basically, much of what I was getting at in the talk (and my thoughts are only just congealing on this) is that the idea of a ‘claim’ is a useful bridge between Semantic Web and Social Networking concerns. It also helps us understand how technologies fit together. FOAF defines a dictionary of terms for making claims, as do XFN and hCard. RDF/XML, microformats, RDFa and GRDDL define textual notations for publishing documents that encode claims, and SPARQL gives us a way of asking questions about the claims made in different documents.
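A toy example of the ‘claims’ view: the same claim written down in Turtle, and a question about it in SPARQL (the specific names here are picked purely for illustration):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# the claim: this person knows someone named Libby Miller
<http://danbri.org/foaf.rdf#danbri> foaf:knows [ a foaf:Person ; foaf:name "Libby Miller" ] .

And the question:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# who does this document claim danbri knows?
SELECT ?name
WHERE { <http://danbri.org/foaf.rdf#danbri> foaf:knows [ foaf:name ?name ] }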

Blood money

It’s in the Daily Mail, so it must be true:

Motorists will be targeted by a new generation of road cameras which work out how many people are in a car by measuring the amount of bodily fluid it contains.

The latest snooping device on the nation’s roads aims to penalise lone drivers who abuse car-sharing lanes, and is part of a Government effort to combat congestion at busy times.

The cameras work by sending an infrared beam through the windscreen of vehicles which detects the unique make-up of blood and water content in human skin.

Coincidentally enough, today’s hackery:

danbri> add ds danbri http://danbri.org/words/sparqlpress/sparql

jamendo> store added

danbri> add template "FIND BLOOD DONORS" "select ?n ?bt where { ?x <http://xmlns.com/foaf/0.1/nick> ?n . ?x <http://kota.s12.xrea.com/vocab/uranaibloodtype> ?bt . }"

danbri> FIND BLOOD DONORS

jamendo> "danbri" "A+"

This makes use of the Uranai FOAF extension for recording one’s blood type.

Trying xOperator via Yves’s installation; the above dialog was conducted purely through Jabber IM. Nice to see XMPP+SPARQL explorations gaining traction. I really like the extra value-adding layers added by the xOperator text chat UI, and the fact that you can wire in HTTP-based SPARQL endpoints too. Similar functionality in IRC comes from Bengee’s sparqlbot (see intro doc). I think the latter has a slightly fancier templating system, but I haven’t really explored all that xOperator can do. I did get the ‘blood’ example working with sparqlbot in IRC too; it seems most people don’t have their blood type in their FOAF file. So much for finding handy donors ;)

Blood and XMPP aside, there’s a lot going on here technically. The fact that I can ask Yves’s installation of xOperator to attach to my SparqlPress installation’s database is interesting. At the moment everything in there is public, so ACL is easy. But we could imagine that picture changing. Perhaps I decide that healthcare-related info goes into a special set of named graphs. How could permissioning for this be handled so that Yves’s bot can still provide me with UI? OAuth is probably part of the answer here, although that means bouncing out into http/html UI to some extent at least. And there’s more to say on SPARQL ACL too. Much more, but not right now.
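For example, if the blood-type claims lived in their own named graph, a permission-aware endpoint could scope queries to it. A sketch, with the graph URI invented for illustration:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?n ?bt
WHERE {
  GRAPH <http://danbri.org/graphs/health> {   # graph URI invented for illustration
    ?x foaf:nick ?n .
    ?x <http://kota.s12.xrea.com/vocab/uranaibloodtype> ?bt .
  }
}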

The other technically rich area is these natural language templates. It is historically rare for public SQL endpoints to be made available, even read-only. Some of the reasons for this (local obscure schemas, closed world logic) have gone away with SPARQL, but the biggest reason – the ease of writing very expensive queries – is very much still present. Guha and McCool’s Alpiri/TAP work argued that we needed another, lighter data interface; they proposed ‘GetData’. The templates system we see in xOperator and sparqlbot (see also julie, wh4 and foafbot) suggests another take on this problem. Rather than expose all of my personal datastore to random ‘SELECT * WHERE { ?p ?s ?o }’ crawlers, these bots can provide a wrapping layer, allowing in only queries that are much more tightly constrained, as well as more human-oriented. Hmm, I’m suddenly reminded of Metalog; wonder if Massimo is still working on that area.

A nice step here might be if chat-oriented query templates could be shared between xOperator and sparqlbot. Wonder what would be needed to make that happen…

Friend lists on Facebook – “all lists are private”

Friend Lists are here.

 Now you can easily organize your friends into convenient lists for messaging, invites, and more. You can create whatever kinds of lists you want; all lists are private. Click “Make a New List” to get started.

Just noticed this message on Facebook as I logged in. It was rumoured a little while ago, and in general it fits with my goals for FOAF, use of Group markup, etc. I just expect that “all lists are private” isn’t the end of the privacy story here. Will friend list data be exposed to 3rd party apps? There are lots of reasons it would be great to do so (and foaf:Group provides a notation for sharing this data with downstream standards-based apps). On the other hand, we don’t want to see any more “Find out what your friends really think of you” ugliness. Wonder which way they’ll jump.
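For reference, here’s roughly how a friend list could be expressed with foaf:Group, should such data ever be exported (the list name and nicknames below are made up):

<foaf:Group>
  <foaf:name>London friends</foaf:name>
  <foaf:member>
    <foaf:Person>
      <foaf:nick>alice</foaf:nick>
    </foaf:Person>
  </foaf:member>
  <foaf:member>
    <foaf:Person>
      <foaf:nick>bob</foaf:nick>
    </foaf:Person>
  </foaf:member>
</foaf:Group>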