Small databases, loosely joined

Over the last month or so, I’ve had a SPARQL store (just an ARC instance plus a loader script) running on the Amazon EC2 sandbox I set up for FOAF experiments. Yesterday I installed a fresh SparqlPress bundle into my own blog, which runs on another server. So how to get the data across? Since SparqlPress includes a scutter, which despite Urban Dictionary is just the FOAF term for RDF crawler, we can set it to crawl a Web of linked RDF. Here’s a SPARQL query on the former installation which sets up some pointers to each graph (scutters often follow rdfs:seeAlso, amongst other techniques). It’s interesting too because it summarises the data found in each graph, simply by listing properties.

CONSTRUCT {
<http://danbri.org/foaf.rdf#danbri> <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?g .
?g <http://purl.org/net/scutter/vocabItem> ?p .
}
WHERE {
GRAPH ?g { ?s ?p ?o }
}

The sc:vocabItem property is a work of fiction; I’ve not searched around to see if similar things already exist. The idea of linking together medium-sized repositories of metadata is an old one. The Harvest crawling/indexing system did this with RDM/SOIF. In a similar vein was WHOIS++, a now-obsolete directory system that some of us tried using for resource discovery in the late 90s – collections of Web site descriptions federated through having each server syndicate a summary of its content (record types, field names, and tokenized field values). I hadn’t really thought of it this way until I wrote this query, but SparqlPress could in some ways support a reimplementation of the Harvest and WHOIS++ design, if we have gatherer and broker roles attached to a network of blogs. All these ideas are of course close to the P2P scene too: little datasets exchanging summaries of their content to aid query routing. The main point here is that we can run a pretty simple SPARQL query, and get a summary of the contents of various named data graphs; the results of the query in turn give a summary of a database that has copies of these graphs.
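Once those summary triples are loaded somewhere queryable, a broker (or a friend’s scutter) can ask which graphs mention a property it cares about. For example, to find graphs that use foaf:knows (sc:vocabItem being the fictional property coined above):

PREFIX sc: <http://purl.org/net/scutter/>
SELECT DISTINCT ?g
WHERE { ?g sc:vocabItem <http://xmlns.com/foaf/0.1/knows> . }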

Venn diagrams for Groups UI

Venn groups diagram

This is the result of feeding a relatively small list of groups and their members to the VennMaster java tool. I’ve been looking for swooshy automatic layout tools that might help with interactive visualisation of ‘social graph’ data where people can be clustered by their membership of various groups. I also wanted to explore the possibility of using such a tool as a way of authoring filters from raw evidence, using groups such as “have sent mail to”, “have accepted blog/wiki comment from”.

My gut reaction from this quick experiment is that the UI space is very easily overwhelmed. I used here just a quick hand-coded list of people, in fairly ad-hoc groups (cities, current-and-former workplaces etc.). Real data has more people and more groups. Still, I think there may be something worth investigating here. The Venn tool I found was really designed for lifesci data, not people. If anyone knows of other possible software to try here, do let me know. To try this tool, simply run the Java app from the commandline, and use “File >> OpenList” on a file such as people.list.
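For the record, the input was along these lines, one person/group pairing per line. I’m sketching the shape from memory rather than quoting VennMaster’s exact file format, so check its documentation before copying this:

Alice	Bristol
Alice	W3C
Bob	Amsterdam
Bob	W3C
Carol	Bristol
Carol	ex-W3C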

One other thing I noticed in creating these ad-hoc groups (more or less ‘people tags’) is that representing what people have done felt intuitively just as important as what they’re doing right now. For example, places people once lived or worked. This gives another axis of complexity that might need visualising. I’d like the underlying data to know who currently works/lives somewhere versus who merely used to, but in some views the two might appropriately be folded together. Tricky.

Graph URIs in SPARQL: Using UUIDs as named views

I’ve been using the SPARQL query language to access a very ad-hoc collection of personal and social graph data, and thanks to Bengee’s ARC system this can sit inside my otherwise ordinary WordPress installation. At the moment, everything in there is public, but lately I’ve been discussing oauth with a few folk as a way of mediating access to selected subsets of that data. Which means the data store will need some way of categorising the dozens of misc data source URIs. There are a few ways to do this; here I try a slightly non-obvious approach.

Every SPARQL store can have many graphs inside, named by URI, plus optionally a default graph. The way I manage my store is a kind of structured chaos, with files crawled from links in my own data and my friends’. One idea for indicating the structure of this chaos is to keep “table of contents” metadata in the default graph. For example, I might load up <http://danbri.org/foaf.rdf> into a SPARQL graph named with that URI. And I might load up <http://danbri.org/evilfoaf.rdf> into another graph, also using the retrieval URI to identify the data within my SPARQL store. Now, two points to make here. Firstly, the SPARQL spec does not mandate that we do things this way. An application that, for example, wanted to keep historical versions of a FOAF or RSS schema document could keep the triples from each version in a different named graph, perhaps named with UUIDs. The second point is that there are many different “meta” claims we want to store about our datasets, and mixing them all into the store-wide “default graph” could be rather limiting, especially if we mightn’t want to unconditionally believe even those claims.

In the example above, I have data from running a PGP check against foaf.rdf (which passed) and evilfoaf.rdf (which doesn’t pass a check against my PGP identity). Now where do I store the results of this PGP checking? Perhaps the default graph, but maybe that’s messy. The idea I’m playing with here is that UUIDs are reasonable identifiers, and that perhaps we’ll find ourselves sharing common UUIDs across stores.
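A minimal sketch of how that might look, using ARC’s SPARQL+ INSERT: the graph name here is a freshly minted UUID, and the checking property is every bit as fictional as sc:vocabItem was above:

INSERT INTO <uuid:5d40c5b0-d73f-11dc-95ff-0800200c9a66> {
<http://danbri.org/foaf.rdf> <http://example.org/check#pgpCheckPassed> "true" .
<http://danbri.org/evilfoaf.rdf> <http://example.org/check#pgpCheckPassed> "false" .
}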

Go back to my sent-mail FOAF crawl example from yesterday. How far did I get? Well the end result was a list of URLs which I looped through, and loaded into my big chaotic SPARQL store. If I run the following query, I get a list of all the data graphs loaded:

SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o . } }

This reveals 54 URLs, basically everything I’ve loaded into ARC in the last month or so. Only 30 of these came from yesterday’s hack, which used Google’s new Social Graph API to allow me to map from hashed mailbox IDs to crawlable data URIs. So today’s game is to help me disentangle the 30 from the 54, and superimpose them on each other, but not always mixed with every other bit of information in the store. In other words, I’m looking for a flexible, query-based way of defining views into my personal data chaos.

So, what I tried. I took the result of yesterday’s hack, a file of data URIs called urls.txt. Then I modified my commandline dataloader script (yeah yeah this should be part of wordpress). My default data loader simply takes each URI, gets the data, and shoves it into the store under a graph name which was the URI used for retrieval. What I did today is, additionally, make a “table of contents” overview graph. To avoid worrying about names, I generated a UUID and used that. So there is a graph called <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> which contains simple asserts of the form:

<http://www.advogato.org/person/benadida/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> .
<http://www.w3c.es/Personal/Martin/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> .
<http://www.iandickinson.me.uk/rdf/foaf.rdf> a <http://xmlns.com/foaf/0.1/Document> . # etc

…for each of the 30 files my crawler loaded into the store.
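(For the curious: the loading step can also be phrased in ARC’s SPARQL+ dialect. The script does the equivalent of the following for each line of urls.txt; LOAD … INTO is a SPARQL+ extension, not part of the W3C spec.)

LOAD <http://www.advogato.org/person/benadida/foaf.rdf> INTO <http://www.advogato.org/person/benadida/foaf.rdf>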

Having this table of contents graph lets us use <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> as an indirection point for information related to this little mailbox crawler hack. I don’t have to “pollute” the single default graph with this data. And because the uuid: was previously meaningless, it is something we might decide makes sense to use across data visibility boundaries, ie. you might use the same UUID in your own SPARQL store, so we can share queries and app logic.

Here’s a simple query. It says, “ask the mailbox crawler table of contents graph (which we call uuid:420d9etc…) for all things it knows about that are a Document”. Then it says “ask each of those documents for everything in it”. And then the SELECT clause returns all the property URIs. This gives a first-level overview of what’s in the Web of data files found by the crawl. The query was:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p WHERE {
GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled { ?s ?p ?o . }
}
ORDER BY ?p

I’ll just show the first page full of properties it found; for the rest, see the link to the complete set. Since W3C’s official SPARQL doesn’t have aggregates, we’d need to write application code (or use something like the SPARQL+ extensions, sketched after the list below) to get property usage counts. Here are some of the properties that were found in the data:

http://kota.s12.xrea.com/vocab/uranaibloodtype
http://purl.org/dc/elements/1.1/creator
http://purl.org/dc/elements/1.1/description
http://purl.org/dc/elements/1.1/format
http://purl.org/dc/elements/1.1/title
http://purl.org/dc/terms/created
http://purl.org/dc/terms/modifed
http://purl.org/dc/terms/modified
http://purl.org/net/inkel/rdf/schemas/lang/1.1#masters
http://purl.org/net/inkel/rdf/schemas/lang/1.1#reads
http://purl.org/net/inkel/rdf/schemas/lang/1.1/masters
http://purl.org/net/inkel/rdf/schemas/lang/1.1/reads
http://purl.org/net/inkel/rdf/schemas/lang/1.1/speaks
http://purl.org/net/schemas/quaffing/drankBeerWith
http://purl.org/net/schemas/quaffing/drankLagerWith
http://purl.org/net/vocab/2004/07/visit#caregion
http://purl.org/net/vocab/2004/07/visit#country
http://purl.org/net/vocab/2004/07/visit#usstate
http://purl.org/ontology/mo/hasTrack
http://purl.org/ontology/mo/myspace
http://purl.org/ontology/mo/performed
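As mentioned above, usage counts need extensions beyond standard SPARQL. With ARC’s SPARQL+ aggregates, the overview query might become something like the following (aggregate syntax from memory, so treat it as a sketch):

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT ?p COUNT(?p) AS ?uses WHERE {
GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled { ?s ?p ?o . }
}
GROUP BY ?p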

So my little corner of the Web includes properties that extend FOAF documents to include blood types, countries that have been visited, language skills that people have, music information, and even drinking habits. But remember that this comes from my corner of the Web – people I’ve corresponded with – and probably isn’t indicative of the wider network. But that’s what grassroots decentralised data is all about. The folk who published this data didn’t need to ask permission of any committee to do so; they just mixed in what they wanted to say, alongside more widely used terms like foaf:Person and foaf:name. This is the way it should be: ask forgiveness, not permission, from the language lawyers and standardistas.

Ok, so let’s dig deeper into the messy data crawled up from my sent-mail contacts.

Here’s one that finds some photos, either using FOAF’s :img or :depiction properties:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT * WHERE {
GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled {
{ ?x :depiction ?y1 } UNION { ?x :img ?y2 } .
}
}

Here’s another that asks the crawl results for names and homepages it found:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT * WHERE { GRAPH <uuid:420d9490-d73f-11dc-95ff-0800200c9a66> { ?crawled a :Document . }
GRAPH ?crawled { { [ :name ?n; :homepage ?h ] } }
}

To recap, the key point here is that social data in a SPARQL store will be rather chaotic. Information will often be missing, and often be extended. It will come from a variety of parties, some of whom you trust, some of whom you don’t know, and a few of whom will be actively malicious. Later down the line, subsets of the data will need different permissioning: if I export a family tree from ancestry.co.uk, I don’t want everyone to be able to do a SELECT for mother’s maiden name and my date of birth.

So what I suggest here, is that we can use UUID-named graphs as an organizing structure within an otherwise chaotic SPARQL environment. The demo here shows how one such graph can be used as a “table of contents” for other graphs associated with a particular app — in this case, the Google-mediated sentmail crawling app I made yesterday. Other named views might be: those data files from colleagues, those files that are plausibly PGP-signed, those that contain data structured according to some particular application need (eg. calendar, addressbook, photos, …).
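For instance, reusing the fictional PGP-checking graph sketched earlier, a “plausibly signed” view could ask for people’s names only from graphs that passed the check:

PREFIX : <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name WHERE {
GRAPH <uuid:5d40c5b0-d73f-11dc-95ff-0800200c9a66> { ?g <http://example.org/check#pgpCheckPassed> "true" . }
GRAPH ?g { [ :name ?name ] }
}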

Public Skype RDF presence service

OK I don’t know how this works, or how it happens (other Asemantics people might know more), but for those who didn’t know:

At http://mystatus.skype.com/danbrickley.xml there is a public RDF/XML document reflecting my status in Skype. There seems to be one for every active account name in the system.

Example markup:

<rdf:RDF>
<Status rdf:about="urn:skype:skype.com:skypeweb/1.1">
<statusCode rdf:datatype="http://www.skype.com/go/skypeweb">5</statusCode>
<presence xml:lang="NUM">5</presence>
<presence xml:lang="en">Do Not Disturb</presence>
<presence xml:lang="fr">Ne pas déranger</presence>
<presence xml:lang="de">Beschäftigt</presence>
<presence xml:lang="ja">取り込み中</presence>
<presence xml:lang="zh-cn">請勿打擾</presence>
<presence xml:lang="zh-tw">请勿打扰</presence>
<presence xml:lang="pt">Ocupado</presence>
<presence xml:lang="pt-br">Ocupado</presence>
<presence xml:lang="it">Occupato</presence>
<presence xml:lang="es">Ocupado</presence>
<presence xml:lang="pl">Nie przeszkadzać</presence>
<presence xml:lang="se">Stör ej</presence>
</Status>
</rdf:RDF>

In general (expressed in FOAF terms), for any :OnlineAccount that has an :accountServiceHomepage of http://www.skype.com/ you can take the :accountName – let’s call it ?a – and plug it into the URI template http://mystatus.skype.com/{a}.xml to get presence information in curiously cross-cultural RDF. In other words, one’s Skype status is part of the public record on the Web, well beyond the closed P2P network of Skype IM clients.
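In FOAF markup the pattern might look like this (the account name is mine, from the URL above; the rdfs:seeAlso link to the mystatus document is just one plausible way of joining the two, not an established convention):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person>
<foaf:name>Dan Brickley</foaf:name>
<foaf:holdsAccount>
<foaf:OnlineAccount>
<foaf:accountServiceHomepage rdf:resource="http://www.skype.com/"/>
<foaf:accountName>danbrickley</foaf:accountName>
<rdfs:seeAlso rdf:resource="http://mystatus.skype.com/danbrickley.xml"/>
</foaf:OnlineAccount>
</foaf:holdsAccount>
</foaf:Person>
</rdf:RDF>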

Thinking about RDF vocabulary design and document formats, the Skype representation is roughly akin to FOAF documents (such as those on LiveJournal currently) that don’t indicate explicitly that they’re a :PersonalProfileDocument, nor say who is the :primaryTopic or :maker of the document. Given the RDF/XML on its own, you don’t have enough context to know what it’s telling you. Whereas, if you know the URI, and the URI template rule shown above, you have a better idea of the meaning of the markup. Still, it’s useful. I suspect it might be time to add foaf:skypeID as an inverse-functional (ie. uniquely identifying) property to the FOAF spec, to avoid longwinded markup and make it easier to bridge profile data and up-to-the-minute status data. Thoughts?

Google Social Graph API, privacy and the public record

I’m digesting some of the reactions to Google’s recently announced Social Graph API. ReadWriteWeb ask whether this is a creeping privacy violation, and danah boyd has a thoughtful post raising concerns about whether the privileged tech elite have any right to experiment in this way with the online lives of those who lack status, knowledge of these obscure technologies, and who may be amongst the more vulnerable users of the social Web.

While I tend to agree with Tim O’Reilly that privacy by obscurity is dead, I’m not of the “privacy is dead, get over it” school of thought. Tim argues,

The counter-argument is that all this data is available anyway, and that by making it more visible, we raise people’s awareness and ultimately their behavior. I’m in the latter camp. It’s a lot like the evolutionary value of pain. Search creates feedback loops that allow us to learn from and modify our behavior. A false sense of security helps bad actors more than tools that make information more visible.

There’s a danger here of technologists seeming to blame those we’re causing pain for. As danah says, “Think about whistle blowers, women or queer folk in repressive societies, journalists, etc.”. Not everyone knows their DTD from their TCP, or understands anything of how search engines, HTML or hyperlinks work. And many folk have more urgent things to focus on than learning such obscurities, let alone understanding the practical privacy, safety and reputation-related implications of their technology-mediated deeds.

Web technologists have responsibilities to the users of the Web, and while media education and literacy are important, those who are shaping and re-shaping the Web ought to be spending serious time on a daily basis struggling to come up with better ways of allowing humans to act and interact online without other parties snooping. The end of privacy by obscurity should not mean the death of privacy.

Privacy is not dead, and we will not get over it.

But it does need to be understood in the context of the public record. The reason I am enthusiastic about the Google work is that it shines a big bright light on the things people are currently putting into the public record. And it does so in a way that should allow people to build better online environments for those who do want their public actions visible, while providing immediate – and sometimes painful – feedback to those who have over-exposed themselves in the Web, and wish to backpedal.

I hope Google can put a user support mechanism on this. I know from our experience in the FOAF community, even with small scale and obscure aggregators, people will find themselves and demand to be “taken down”. While any particular aggregator can remove or hide such data, unless the data is tracked back to its source, it’ll crop up elsewhere in the Web.

I think the argument that FOAF and XFN are particularly special here is a big mistake. Web technologies used correctly (posh – “plain old semantic html” in microformats-speak) already facilitate such techniques. And Google is far from the only search engine in existence. Short of obfuscating all text inside images, personal data from these sites is readily harvestable.

ReadWriteWeb comment:

None the less, apparently the absence of XFN/FOAF data in your social network is no assurance that it won’t be pulled into the new Google API, either. The Google API page says “we currently index the public Web for XHTML Friends Network (XFN), Friend of a Friend (FOAF) markup and other publicly declared connections.” In other words, it’s not opt-in by even publishers – they aren’t required to make their information available in marked-up code.

The Web itself is built from marked-up code, and this is a thing of huge benefit to humanity. Both microformats and the Semantic Web community share the perspective that the Web’s core technologies (HTML, XHTML, XML, URIs) are properly consumed both by machines and by humans, and that any efforts to create documents that are usable only by (certain fortunate) humans is anti-social and discriminatory.

The Web Accessibility movement have worked incredibly hard over many years to encourage Web designers to create well marked up pages, where the meaning of the content is as mechanically evident as possible. The more evident the meaning of a document, the easier it is to repurpose it or present it through alternate means. This goal of device-independent, well marked up Web content is one that unites the accessibility, Mobile Web, Web 2.0, microformat and Semantic Web efforts. Perhaps the most obvious case is for blind and partially sighted users, but good markup can also benefit those with the inability to use a mouse or keyboard. Beyond accessibility, many millions of Web users (many poor, and in poor countries) will have access to the Web only via mobile phones. My former employer W3C has just published a draft document, “Experiences Shared by People with Disabilities and by People Using Mobile Devices”. Last month in Bangalore, W3C held a Workshop on the Mobile Web in Developing Countries (see executive summary).

I read both Tim’s post, and danah’s post, and I agree with large parts of what they’re both saying. But not quite with either of them, so all I can think to do is spell out some of my perhaps previously unarticulated assumptions.

  • There is no huge difference in principle between “normal” HTML Web pages and XFN or FOAF. Textual markup is what the Web is built from.
  • FOAF and XFN take some of the guesswork out of interpreting markup. But other technologies (javascript, perl, XSLT/GRDDL) can also transform vague markup into more machine-friendly markup. FOAF/XFN simply make this process easier and less heuristic, less error prone.
  • Google was not the first search engine, it is not the only search engine, and it will not be the last search engine. To obsess on Google’s behaviour here is to mistake Google for the Web.
  • Deeds that are on the public record in the Web may come to light months or years later; Google’s opening up of the (already public, but fragmented) Usenet historical record is a good example here.
  • Arguing against good markup practice on the Web (accessible, device independent markup) is something that may hurt underprivileged users (with disabilities, or limited access via mobile, high bandwidth costs etc).
  • Good markup allows content to be automatically summarised and re-presented to suit a variety of means of interaction and navigation (eg. voice browsers, screen readers, small screens, non-mouse navigation etc).
  • Good markup also makes it possible for search engines, crawlers and aggregators to offer richer services.

The difference between Google crawling FOAF/XFN from LiveJournal, versus extracting similar information via custom scripts from MySpace, is interesting and important solely to geeks. Mainstream users have no idea of such distinctions. When LiveJournal originally launched their FOAF files in 2004, the rule they followed was a pretty sensible one: if the information was there in the HTML pages, they’d also expose it in FOAF.

We need to be careful of taking a ruthless “you can’t make an omelette without breaking eggs” line here. Whatever we do, people will suffer. If the Web is made inaccessible, with information hidden inside image files or otherwise obfuscated, we exclude a huge constituency of users. If we shine a light on the public record, as Google have done, we’ll embarrass, expose and even potentially risk harm to the people described by these interlinked documents. And if we stick our head in the sand and pretend that these folk aren’t exposed, I predict this will come back to bite us in the butt in a few months or years, since all that data is out there, being crawled, indexed and analysed by parties other than Google. Parties with less to lose, and more to gain.

So what to do? I think several activities need to happen in parallel:

  • Best practice codes for those who expose, and those who aggregate, social Web data
  • Improved media literacy education for those who are unwittingly exposing too much of themselves online
  • Technology development around decentralised, non-public record communication and community tools (eg. via Jabber/XMPP)

Any search engine at all, today, is capable of supporting the following bit of mischief:

Take as a starting point a collection of user profiles on a public site. Extract all the usernames. Find the ones that appear in the Web fewer than, say, 10,000 times, and on other sites. Assume these are unique userIDs and crawl the pages they appear in, do some heuristic name matching, … and you’ll have a pile of smushed identities, perhaps linking professional and dating sites, or drunken college photos to respectable-new-life. No FOAF needed.

The answer I think isn’t to beat up on the aggregators, it’s to improve the Web experience such that people can have real privacy when they need it, rather than the misleading illusion of privacy. This isn’t going to be easy, but I don’t see a credible alternative.

Embedding queries in RDF – FOAF Group example

Is this crazy or useful? Am not sure yet.

This example uses FOAF vocabulary for groups and openid. So the basic structure here is that Agents (including persons) can have an :openid and can be a :member of a :Group.

From an openid-augmented WordPress, we get a list of all the openids my blog knows about. From an openid-augmented MediaWiki, we get a list of all the openids that contribute to the FOAF project wiki. I dumped each into a basic RDF file (not currently an automated process). But the point here is to explore enumerated groups using queries.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://xmlns.com/foaf/0.1/">
<Group rdf:about="#both">
<!-- enumerated membership -->
<member><Agent><openid rdf:resource="http://danbri.org/"/></Agent></member>
<member><Agent><openid rdf:resource="http://tommorris.org/"/></Agent></member>
<member><Agent><openid rdf:resource="http://kidehen.idehen.net/dataspace/person/kidehen"/></Agent></member>
<member><Agent><openid rdf:resource="http://www.wasab.dk/morten/"/></Agent></member>
<member><Agent><openid rdf:resource="http://kronkltd.net/"/></Agent></member>
<member><Agent><openid rdf:resource="http://www.kanzaki.com/"/></Agent></member>

<!-- rule-based membership -->

<constructor><![CDATA[
PREFIX : <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
<http://danbri.org/yasns/danbri/both.rdf#thegroup> a :Group; :member [ a :Agent; :openid ?id ]
}
WHERE {
GRAPH <http://wiki.foaf-project.org/_users.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] . }
GRAPH <http://danbri.org/yasns/danbri/_group.rdf> { [ a :Group; :member [ a :Agent; :openid ?id ] ] . }
}
]]></constructor>
</Group>
</rdf:RDF>

This RDF description does it both ways. It enumerates (for simple clients) a list of members of a group whose members are those individuals that are both commentators on my blog, and contributors to the FOAF wiki. At least, to the extent they’re detectable via common use of OpenID URIs. But the RDF group description also embeds a SPARQL query, the kind which generates RDF rather than an SQL-like resultset. The RDF essentially regenerates the enumerated list, assuming the query is run against an RDF dataset with the data graphs appropriately populated.
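Run against a store where both source graphs are loaded, the embedded query would output triples along these lines (shown here in Turtle, and assuming my own OpenID appears in both graphs):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://danbri.org/yasns/danbri/both.rdf#thegroup> a foaf:Group ;
foaf:member [ a foaf:Agent ; foaf:openid <http://danbri.org/> ] .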

Now I sorta like this, and I sorta don’t. It may be incredibly powerful, or it may be a bit too clever for its own good.

Certainly there’s scope overlap with the W3C RIF rules work, and with the capabilities of OWL. FOAF has long contained an experimental method for using OWL to do something similar, but it hasn’t found traction. The motivation for trying SPARQL here is that it has built-in machinery for talking about the provenance of data; so I could write a group description this way that says “members are anyone listed as a colleague in http://myworkplace.example.com/stafflist.rdf”. Or I could mix in arbitrary descriptive vocabularies; family tree stuff, XFN, language abilities (speaks-reads-writes) etc.
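For instance, that staff-list group might be written like so. The stafflist URI is the example just mentioned; ex:colleague is a placeholder for whatever term such a file would actually use:

PREFIX : <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/work#>
CONSTRUCT {
<http://example.org/groups#colleagues> a :Group; :member [ a :Agent; :openid ?id ]
}
WHERE {
GRAPH <http://myworkplace.example.com/stafflist.rdf> { [ ex:colleague [ a :Agent; :openid ?id ] ] }
}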

Where I think this could fall down is in the complexity of the workflow. The queries need executing against some SPARQL installation with a configured dataset, and the query lists URIs of data graphs. But I doubt database admins will want to randomly load any/every RDF file mentioned in these shared queries. Perhaps something like SparqlPress, attached to one’s weblog, with social filters to load only files mentioned in queries from friends? Also, authoring these kinds of queries isn’t something non-geek users are going to do often, and the sorts of queries that will work will depend of course on the data actually available. Sure I could write a query based on matching the openids of former colleagues, but the group will be empty unless the data listing people as former colleagues is actually out there in the Web, and written in the terms anticipated by the query.

On the other hand, this mechanism does appeal, and could go way beyond FOAF group definitions. We could see a model where people post data in the Web but also post queries, eg. revisiting the old work Libby and I explored around RSS query. On the other other hand, who wants to make their Web queries public? All that said, the same goes for the data being queried. And since this technique embeds queries inside ordinary RDF data, however we deal with the data visibility issue for RDF/FOAF should also work for the query stuff. Perhaps. Can’t blame me for trying…
I realise this isn’t the clearest of explanations. Let’s try again:

RDF is normally for publishing collections of simple claims about the world. This is an experiment in embedding data-generating-queries amongst these claims, where the query is configured to output more RDF claims (aka statements, triples etc), but only when executed against some appropriate body of RDF data. Since the query is written in SPARQL, it allows the data-generation rules to mention interesting things, such as properties of the source of the data being queried.

This particular experiment is couched in terms of FOAF’s “Group” construct, but the technique is entirely general. The example above defines a group of agents called the “both” group, by saying that an Agent is in that group if its OpenID URI is listed in each of the two RDF documents specified, ie. it is both a commentator on my blog and a contributor to the FOAF Wiki. Other examples could be “(fe)male employees” or “family members sharing a blood type” or in fact any descriptive pattern that can match against the data to hand and be expressed in SPARQL.

OpenID and Wireless sharing

via Makenshi in #openid chat on Freenode IRC:

<Makenshi>: I found a wireless captive portal solution that supports openid.

With the newest release of CoovaAP, some new features in Chilli are demonstrated in combination with RADIUS to allow OpenID based authentication. (coova.org)

I’m happy to see this. It’s very close to some ideas I was discussing with Schuyler Erle and Jo Walsh some years ago around NoCatAuth, FOAF and community wireless. Some semweb stories may yet come to life.

At the moment, the options available for wireless ‘net sharing are typically: let everyone in, have a widely known secret for accessing your network, or let more or less nobody in without individually approving them. Although the likes of Bruce Schneier argue the merits of open wireless, most 802.11 kit now comes out of the box closed by default, and usually stays that way. Having a standards-based and decentralised way of saying “you can use my network, but only if you login with some identifiable public persona first” would be interesting.

OpenID takes away a significant part of the problem space, allowing experimentation with a whole range of socially oriented policies on top. Doubtless there are legal risks, big privacy issues, and lurking security concerns. But there is also potential for humanising interactions that are currently rather anonymous. In the city I live in, Bristol, there’s a community wireless effort, Bristol Wireless, as well as wireless Internet in countless local cafes. Plus commercial hotspots and whatever the city council are up to. Currently these are fragmented, and offer a variety of approaches. Could OpenID offer a common approach for Bristolians to connect? I like the idea that (for those that choose to ‘go public’) OpenIDs could link scattered presence across community sites. Having OpenID-based login used eg. for cafe-based access could be a nice step in that direction. But would people trust their local cafe to know what they’re doing online any more than they trust Google? Should they?

Flickr (Yahoo) upcoming support for OpenID

According to Simon Willison, Flickr look set to support OpenID by allowing your photostream URL (eg. for me, http://www.flickr.com/photos/danbri/) to serve as an OpenID, ie. something you can type wherever you see “login using OpenID” and be bounced to Flickr/Yahoo to provide credentials instead of remembering yet another password. This is rather good news.

For the portability-minded, it’s worth remembering that OpenID lets you put markup in your own Web page to devolve to such services. So my main OpenID is “danbri.org”, which is a document I control, on a domain that I own. In the HTML header I have the following markup:



<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://danbri.org/foaf.rdf" />
<link rel="openid.server" href="http://www.livejournal.com/openid/server.bml" />
<link rel="openid.delegate" href="http://danbri.livejournal.com/" />

…which is enough to defer the details of being an OpenID provider to LiveJournal (thanks, LiveJournal!). Flickr are about to join the group of sites you can use in this way, it seems.

As an aside, this means that the security of our own websites becomes yet more important. Last summer, DreamHost (my webhosting provider) were compromised, and my own homepage was briefly decorated with viagra spam. Fortunately they didn’t seem to touch the OpenID markup, but you can see the risk. That’s the price of portability here. As Simon points out, we’ll probably all have several active OpenIDs, and there’s no need to host your own, just as there’s no need for people who want to publish online to buy and host their own domains or HTML sites.

The Flickr implementation, coupled with their existing API, means we could all offer things like “log into my personal site for family (or friends)” and defer buddylist – and FOAF – management to the well-designed Flickr site, assuming all your friends or family have Flickr accounts. Implementing this in a way that works with other providers (eg. LJ) is left as an exercise for the reader ;)