Waving not Drowning? groups as buddylist filters

I’ve lately started writing up and prototyping around a use-case for the “Group” construct in FOAF and for medium-sized, partially private data aggregators like SparqlPress. I think we can do something interesting to deal with the social pressure and information load people are experiencing on sites like Flickr and Twitter.

Often people have rather large lists of friends, contacts or buddies – publicly visible lists – which play a variety of roles in their online life. Sometimes these roles are in tension. Flickr, for example, allows its users to mark some of their contacts as “friend” or “family” (or both). Real life isn’t so simple, however. And the fact that this classification is shared (in Flickr’s case, with everyone) makes for a potentially awkward dynamic. Do you really want to tell everyone you know whether they are a full “friend” or a mere “contact”? Let alone keep this information up to date on every social site that hosts it. I’ve heard a good few folk complain about the stress of adding yet another entry to their list of Twitter follows. Their lists are often already huge through some sense of social obligation to reciprocate. I think it’s worth exploring some alternative filtering and grouping mechanisms.

On the one hand, we have people “bookmarking” people they find interesting, or want to stay in touch with, or “get to know better”. On the other, we have the bookmarked party sometimes reciprocating those actions because they feel it polite; a situation complicated by crude categories like “friend” versus “contact”. What makes this a particularly troublesome combination is when user-facing features, such as “updates from your buddies” or “photos from your friends/contacts” are built on top of these buddylists.

Take my own case on Flickr. I’m probably not typical, but I’m a use case I care about. I have made no real consistent use of the “friend” versus “contact” distinction; to even attempt to do so would be massively time consuming, and also a bit silly. There are all kinds of great people on my Flickr contacts list. Some I know really well; some I barely know but admire their photography. It seems my Flickr account currently has 7 “family”, 79 “friends” and 604 “contacts”.

Now to be clear, I’m not grumbling about Flickr. Flickr is a work of genius, no nitpicking. What I ask for is something that goes beyond what Flickr alone can do. I’d like a better way of seeing updates from friends and contacts. This is just a specific case of a general thing (eg. RSS/Atom feed management), but let’s stay specific for now.

Currently, my Flickr email notification settings are:

  • When people comment on your photos: Yes
  • When your friends and family upload new photos: Yes (daily digest)
  • When your other contacts upload new photos: No

What this means is that I have chosen to see notifications of photos from the 86 people I have flagged as “friend” or “family”, and not to be notified of new photos from the other 604 contacts – even though that list contains many people I know, and would like to know better. The usability question here is: how can we offer more subtlety in this mechanism, without overwhelming users with detail? My guess is that we approach this by collecting, in a more personal environment, some information that users might not want to state publicly in an online profile. So a desktop (or weblog, e.g. SparqlPress) aggregation of information from addressbooks, email send/receive patterns, weblog commenting behaviours, machine readable resumes, … real evidence of real connections between real people.

And so I’ve been mining around for sources of “foaf:Group” data. As a work-in-progress testbed I have a list of OpenIDs of people whose comments I’ve approved in my blog. And another, of people who have worked on the FOAF wiki. I’ve been looking at the machine readable data from W3C’s Tech Reports digital archive too, as that provides information about a network of collaboration going back to the early days of the Web (and it’s available in RDF). I also want to integrate my sent-mail logs, to generate another group, and extract more still from my addressbook. On top of this of course I have access to the usual pile of FOAF and XFN, scraped or API-extracted information from social network sites and IM accounts. Rather than publish it all for the world to see, the idea here is to use it to generate simple personal-use user interfaces couched in terms of these groups. So, hopefully I shall be able to use those groups as a filter against the 600+ members of my Flickr buddylist, and make some kind of basic RSS-filtered view of their photos.
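The filtering idea above is simple enough to sketch in a few lines of Python. All the names and group sources here are invented for illustration; the real thing would mine its groups from FOAF/RDF data rather than hand-written sets:

```python
# Hypothetical sketch: evidence-derived groups used as filters over a
# large public buddylist. Group names and members are made up.

def filter_buddies(buddylist, groups, wanted):
    """Return buddies who appear in at least one of the wanted groups."""
    allowed = set()
    for name in wanted:
        allowed |= groups.get(name, set())
    return [b for b in buddylist if b in allowed]

# Groups mined from private evidence (approved blog comments, wiki edits,
# sent-mail logs...) rather than public self-declaration.
groups = {
    "blog-commenters": {"alice", "bob"},
    "foaf-wiki": {"bob", "carol"},
    "sent-mail": {"dave"},
}
buddylist = ["alice", "bob", "carol", "dave", "eve"]  # 600+ in practice

print(filter_buddies(buddylist, groups, ["blog-commenters", "foaf-wiki"]))
# ['alice', 'bob', 'carol']
```

The point is that the groups never need to be published anywhere: they live in the personal aggregator and only the filtered view (say, an RSS feed of photos) is ever seen.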

If this succeeds, it may help people to have huge buddylists without feeling they’ve betrayed their “real friends” by doing so, or forcing them to put their friends into crudely labelled public buckets marked “real friend” and “mere contact”. The core FOAF design principle here is our longstanding bias towards evidence-based rather than proclaimed information. Don’t just ask people to tell you who their friends are, and how close the relationship is (especially in public, and most especially in a format that can be imported into spreadsheets for analysis). Instead… take the approach of “don’t say it, show it”. Create tools based on the evidence that friendship and collaboration leaves in the Web. The public Web, but also the bits that don’t show up in Google, and hopefully never will. If we can find a way to combine all that behind a simple UI, it might go a little way towards “waving not drowning” in information.

Much of this is stating the obvious, but I thought I’d leave the nerdly details of SPARQL and FOAF/RDF for another post.

Instead here’s a pointer to a Scoble article on Mr Facebook, where they’re grappling with similar issues but from a massive aggregation rather than decentralised perspective:

Facebook has a limitation on the number of friends a person can have, which is 4,999 friends. He says that was due to scaling/technical issues and they are working on getting rid of that limitation. Partly to make it possible to have celebrities on Facebook but also to make it possible to have more grouping kinds of features. He told me many members were getting thousands of friends and that those users are also asking for many more features to put their friends into different groups. He expects to see improvements in that area this year.

ps. anyone know if Facebook group membership data is accessible through their APIs? I know I can get group data from LiveJournal and from My Opera in FOAF, but would be cool to mix in Facebook too.

OpenID and Wireless sharing

via Makenshi in #openid chat on Freenode IRC:

<Makenshi>: I found a wireless captive portal solution that supports openid.

With the newest release of CoovaAP, some new features in Chilli are demonstrated in combination with RADIUS to allow OpenID based authentication. (coova.org)

I’m happy to see this. It’s very close to some ideas I was discussing with Schuyler Erle and Jo Walsh some years ago around NoCatAuth, FOAF and community wireless. Some semweb stories may yet come to life.

At the moment, the options available for wireless ‘net sharing are typically: let everyone in, have a widely known secret for accessing your network, or let more or less nobody in without individually approving them. Although the likes of Bruce Schneier argue the merits of open wireless, most 802.11 kit now comes out of the box closed by default, and usually stays that way. Having a standards-based and decentralised way of saying “you can use my network, but only if you login with some identifiable public persona first” would be interesting.

OpenID takes away a significant part of the problem space, allowing experimentation with a whole range of socially oriented policies on top. Doubtless there are legal risks, big privacy issues, and lurking security concerns. But there is also potential for humanising interactions that are currently rather anonymous. In the city I live in, Bristol, there’s a community wireless effort, Bristol Wireless, as well as wireless Internet in countless local cafes. Plus commercial hotspots and whatever the city council are up to. Currently these are fragmented, and offer a variety of approaches. Could OpenID offer a common approach for Bristolians to connect? I like the idea that (for those that choose to ‘go public’) OpenIDs could link scattered presence across community sites. Having OpenID-based login used eg. for cafe-based access could be a nice step in that direction. But would people trust their local cafe to know what they’re doing online any more than they trust Google? Should they?

Bouncin’ around

A couple of weeks ago, in a pathetic and greedsome bid to monetize you all, I added Google Adsense ads to this blog. Needless to say, the jaded foaftards who read this thing aren’t the kind to go all clicky-buyey on adverts. When I finally earn a whole dollar, I’ll celebrate by having it converted to hard currency and spend it on, er I dunno, charity. Meanwhile, the ad targeting engine is entertaining if nothing else. I’ll leave the thing running on archive pages, but it should be gone from the main page now.

Thanks to Bob DuCharme for pointing out that I’m apparently now endorsing rebounders.com, your one stop solution for post-breakup adult dating social networking. Thanks, Google. Maybe they caught me looking at the Internet Dating Conference website (they’re still looking for speakers btw)? Well I think not – I’ve decided that the ad targeting reflects more on their knowledge of danbri.org’s readership than on its author. There’s some logic to that, so thanks everyone…

Seriously, I am really surprised at how weak the Adsense targeting is. Google too often are treated as infallible gods, but the ad targeting I’ve seen so far feels like it’s done with grep. But I don’t have anything better right now, so I should shut up with the complaining. Here’s that ad again for those that missed it.

google ad

Imagemap magic

I’ve always found HTML imagemaps to be a curiously neglected technology. They seem somehow to evoke the Web of the mid-to-late 90s, to be terribly ‘1.0’. But there’s glue in the old horse yet…

A client-side HTML imagemap lets you associate links (and via Javascript, behaviour) with regions of an image. As such, they’re a form of image metadata that can have applications including image search, Web accessibility and social networking. They’re also a poor cousin to the Web’s new vector image format, SVG. This morning I dug out some old work on this (much of it from Max, Libby and Jim, all of whom btw are currently working at Joost; as am I, albeit part-time).

The first hurdle you hit when you want to play with HTML imagemaps is finding an editor that produces them. The fact that my blog post asking for MacOSX HTML imagemap editors is now top Google hit for “MacOSX HTML imagemap” pretty much says it all. Eventually I found (and paid for) one called YokMak that seems OK.

So the first experiment here, was to take a picture (of me) and make a simple HTML imagemap.

danbri being imagemapped

As a step towards treating this as re-usable metadata, here’s imagemap2svg.xslt from Max back in 2002. The results of running it with xsltproc are online: _output.svg (you need an SVG-happy browser). Firefox, Safari and Opera seem more or less happy with it (ie. they show the selected area against a pink background). This shows that imagemap data can be freed from the clutches of HTML, and repurposed. You can do similar things server-side using Apache Batik, a Java SVG toolkit. There are still a few 2002 examples floating around, showing how bits of the image can be described in RDF that includes imagemap info, and then manipulated using SVG tools driven from metadata.
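The core of that transformation is small enough to sketch without XSLT. Here’s a hedged, stdlib-only Python version of the same idea, handling just the rect and circle shapes (the real imagemap2svg.xslt does more, and the sample markup below is invented):

```python
# Sketch: extract <area> elements from an HTML imagemap and emit the
# equivalent SVG shapes. Stdlib only; poly shapes are left out.
from html.parser import HTMLParser

class AreaCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.areas = []

    def handle_starttag(self, tag, attrs):
        if tag == "area":
            self.areas.append(dict(attrs))

def area_to_svg(area):
    """Convert one imagemap area into an SVG shape element."""
    coords = [int(c.strip()) for c in area.get("coords", "").split(",")]
    if area.get("shape") == "rect":
        x1, y1, x2, y2 = coords
        return (f'<rect x="{x1}" y="{y1}" width="{x2 - x1}" '
                f'height="{y2 - y1}" fill="none" stroke="red"/>')
    if area.get("shape") == "circle":
        cx, cy, r = coords
        return f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="none" stroke="red"/>'
    return ""  # poly etc. left as an exercise

markup = ('<map name="m"><area shape="rect" coords="10,20,110,220" '
          'href="http://danbri.org/"/></map>')
p = AreaCollector()
p.feed(markup)
print(area_to_svg(p.areas[0]))
# <rect x="10" y="20" width="100" height="200" fill="none" stroke="red"/>
```

Once the regions are out of the HTML like this, they can be overlaid, restyled, or attached to RDF descriptions, which is exactly the repurposing point above.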

Once we have this ability to pick out a region of an image (eg. photo) and tag it, it opens up a few fun directions. In the FOAF scene a few years ago, we had fun using RDF to tag image region parts with information about the things they depicted. But we didn’t really get into questions of surface-syntax, ie. how to make rich claims about the image area directly within the HTML markup. These days, some combination of RDFa or microformats would probably be the thing to use (or perhaps GRDDL). I’ve sent mail to the RDFa group looking for help with this (see that message for various further related-work links too).

Specifically, I’d love to have some clean HTML markup that said, not just “this area of the photo is associated with the URI http://danbri.org/”, but “this area is the Person whose openid is danbri.org, … and this area depicts the thing that is the primary topic of http://en.wikipedia.org/wiki/Eiffel_Tower”. If we had this, I think we’d have some nice tools for finding images, for explaining images to people who can’t see them, and for connecting people and social networks through codepiction.


Chocolate Teapot

chocolate teapot

Michael Sparks in the BBC Backstage permathread on DRM:

However any arguments based on open standards do need to take facts into account though. Such as this one: The BBC is currently required by the rights holders to use DRM.

Tell me how you can have a DRM system that’s completely free software, and I’ll readily listen and push for such an approach. (I doubt I’ll get anywhere, but I’ll try)

The two ideas strike me as fundamentally opposed concepts. After all one tries to protect your right to use your system in any way you like (for me modifying it the system is a use), whereas the other tries to prevent you using your computer to do something. I’ve said this before of course. No-one has yet said how they’d securely prevent trivial access to the keys and trivially prevent data dumping (ala vlc’s dump to disk option).

So personally I can’t actually see how you can have a completely free software DRM system and have that system viewed as _sufficiently secure_ from the DRM proponents side of things. Kinda like a chocolate tea pot. I like to be proved wrong about things things. (chocolate is good too)

I like this explanation. It kinda captures why I’ve been happy working at Joost. Content’s the thing, and vast amounts of content simply won’t go online in a stable, referenceable, linkable, annotable, decoratable form unless the people who made it or own it are happy. Which, in the current climate (including background conditions like capitalism and a legal system), I think means DRM and closed source. I love Creative Commons, grassroots content, free, alternative and webcam-sourced content. But I also want the telly that I grew up watching to find its way into more public spaces. If the price of this happening now rather than in a decade’s time is DRM and closed source, I’m ok with that. Software is a means to an end, not an end in itself.

All that said, I’d also like a chocolate teapot, of course. Who wouldn’t?

ps. and yes, I did Google to make sure that “chocolate teapot” isn’t some terrifying sexual practice. It doesn’t appear to be, and I’m left wondering whether it’s Englishness or the Internet that has warped my brain to the extent that I’d even consider such an interpretation of this lovely phrase… I blame the “Carry On” films….

“The World is now closed”

Facebook in many ways is pretty open for a ‘social networking’ site. It gives extension apps a good amount of access to both data and UI. But the closed world language employed in their UI betrays the immodest assumption “Facebook knows all”.

  • Eric Childress and Stuart Weibel are now friends with Charles McCathienevile.
  • John Doe is now in a relationship.
  • You have 210 friends.

To state the obvious: maybe Eric, Stu and Chaals were already friends. Maybe Facebook was the last to know about John’s relationship; maybe friendship isn’t countable. As the walls between social networking sites slowly melt (I put Jabber/XMPP first here, with OpenID, FOAF, SPARQL and XFN as helper apps), me and my 210 closest friends will share fragments of our lives with a wide variety of sites. If we choose to make those descriptions linkable, the linked sites will increasingly need to refine their UI text to be a little more modest: even the biggest site doesn’t get the full story.

Closed World Assumption (Abort/Retry/Fail)
Facebook are far from alone in this (see this Xbox screenshot too, “You do not have any friends!”); but even with 35M users, the mistake is jarring, and not just to Semantic Web geeks of the “missing isn’t broken” school. It’s simply a mistake to fail to distinguish the world from its description, or the territory from the map.

A description of me and my friends hosted by a big Web site isn’t “my social network”. Those sites are just a database containing claims made by different people, some verified, some not. And with, inevitably, lots missing. My “social network” is an abstractification of a set of interlinked real-world histories. You could make the case that there has only ever been one “social network” since the distant beginnings of human society; certainly those who try to do genealogy with Web data formats run into this in a weaker form, including the need to balance competing and partial information. We can do better than categorised “buddylists” when describing people, their inter-connections and relationships. And in many ways Facebook is doing just great here. Aside from the Pirates-vs-Ninjas noise, many extension applications on Facebook allow arbitrary events from elsewhere in the Web to bubble up through their service and be seen (or filtered) by others who are linked to me in their database. For example:

Facebook is good at reporting events, generally. Especially those sourced outside the system. Where it isn’t so great is when reporting internal events, eg. someone telling it about a relationship. Event descriptions are nice things to syndicate btw since they never go out of date. Syndicating descriptions of the changeable properties of the world, on the other hand, is more slippery since you need to have all other relevant facts to be able to say how the world is right now (or implicitly, how it used to be, before). “Dan has painted his car red” versus “Dan’s car is now red”. “Dan has bookmarked the Jabber user profile spec” versus “Dan now has 1621 bookmarks”. “Dan has added Charles to his Facebook profile” versus “Dan is now friends with Charles”.
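The events-versus-state distinction can be sketched directly: syndicated events never go out of date, and “how things are now” is something you derive by replaying them. All the names and dates below are invented:

```python
# Sketch: an append-only event log, with current state derived from it.
# An event like ("Dan", "added_friend", "Alice") stays true forever; a
# state claim like "Dan is friends with Alice" can silently go stale.

events = [
    ("2007-01-01", "Dan", "added_friend", "Alice"),
    ("2007-02-01", "Dan", "added_friend", "Bob"),
    ("2007-03-01", "Dan", "removed_friend", "Alice"),
]

def friends_now(person, log):
    """Replay the event log to recover someone's current friend set."""
    current = set()
    for _, who, action, other in sorted(log):  # dates sort lexically here
        if who != person:
            continue
        if action == "added_friend":
            current.add(other)
        elif action == "removed_friend":
            current.discard(other)
    return current

print(friends_now("Dan", events))  # {'Bob'}
```

A site that only ever received the first two events would honestly report a different (and wrong) current state, which is exactly why syndicating events beats syndicating “Dan is now friends with…” claims.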

We need better UI that reflects what’s really going on. There will be users who choose to live much of their lives in public view, spread across sites, sharing enough information for these accounts to be linked. Hopefully they’ll be as privacy-smart and selective as Pew suggests. Personas and ‘characters’ can be spread across sites without either site necessarily revealing a real-world identity; secrets are keepable, at least in theory. But we will see people’s behaviour and claims from one site leak into another, and with approval. I don’t think this will be just through some giant “social graph” of strictly enumerated relationships, but through a haze of vaguer data.

What we’re most missing is a style of end-user UI here that educates users about this world that spans websites, couching things in terms of claims hosted in sites, rather than in absolutist terms. I suppose I probably don’t have 210 “friends” (whatever that means) in real life, although I know a lot of great people and am happy to be linked to them online. But I have 210 entries in a Facebook-hosted database. My email whitelist file has 8785 email addresses in it currently; email accounts that I’m prepared to assume aren’t sending me spam. I’m sure I can’t have 8785 friends. My Google Mail (and hence GTalk Jabber) account claims 682 contacts, and has some mysterious relationship to my Orkut account where I have 200+ (more randomly selected) friends. And now the OpenID roster on my blog gives another list (as of today, 19 OpenIDs that made it past the WordPress spam filter). Modern social websites shouldn’t try to tell me how many friends I have; that’s just silly. And they shouldn’t assume their database knows it all. What they can do is try to tell me things that are interesting to me, with some emphasis on things that touch my immediate world and the extended world of those I’m variously connected to.

So what am I getting at here? I guess it’s just that we need these big social sites to move away from making teen-talk claims about how the world is – “Sally (now) loves John” – and instead become reflectors for the things people are saying, “Sally announces that she’s in love with John”; “John says that he used to work for Microsoft” versus “John worked for Microsoft 2004-2006″; “Stanford University says Sally was awarded a PhD in 2008″. Today’s young internet users are growing up fast, and the Web around them needs also to mature.

One of the most puzzling criticisms you’ll often hear about the Semantic Web initiative is that it requires a single universal truth, a monolithic ontology to model all of human knowledge. Those of us in the SW community know that this isn’t so; we’ve been saying for a long time that our (meta)data architecture is designed to allow people to publish claims “in which statements can draw upon multiple vocabularies that are managed in a decentralised fashion by various communities of expertise.”
As the SemWeb technology stack now has a much better approach to representing data provenance (SPARQL named graphs replacing RDF’99 statement reification) I believe we should now be putting more emphasis on a related theme: Semantic Web data can represent disputes, competing claims, and contradictions. And we can query it in an SQL-like language (SPARQL) that allows us to ask questions not just of some all-knowing database, but about what different databases are telling us.

The closed world approach to data gives us a lot, don’t get me wrong. I’m not the only one with a love-hate relationship with SQL. There are many optimisations we can do in a traditional SQL or XML Schema environment which become hard in an RDF context. In particular, going “open world” makes for a harder job when hosting and managing data rather than merely aggregating and integrating it. Nevertheless, if you’re looking for a modern Web data environment for aggregating claims of the “Stanford University says Sally was awarded a PhD in 1995″ form, SPARQL has a lot to offer.

When we’re querying a single, all-knowing, all-trusted database, SQL will do the job (eg. see Facebook’s FQL for example). When we need to take a bit more care with “who said what” and “according to whom?” aspects, coupled with schema extensibility and frequently missing data, SQL starts to hurt. If we’re aggregating (and building UI for) ‘social web’ claims about the world rather than simple buddylists (which XMPP/Jabber gives us out of the box), I suspect aggregators will get burned unless they take care to keep careful track of who said what, whether using SPARQL or some home-grown database system in the same spirit. And I think they’ll find that doing so will be peculiarly rewarding, giving us a foundation for applications that do substantially more than merely listing your buddies…
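To make the “who said what” point concrete, here’s a toy quad store in the spirit of SPARQL named graphs: each claim is stored with the graph (source) asserting it, so competing claims stay distinguishable instead of collapsing into one all-knowing database. All claims and source URIs below are invented:

```python
# Toy quad store: (subject, predicate, object, source-graph) tuples.
# A real system would use SPARQL's GRAPH keyword over named graphs;
# this just illustrates why provenance-aware queries matter.

quads = [
    ("Sally", "awardedPhD", "1995", "http://stanford.example/"),
    ("Sally", "awardedPhD", "2008", "http://sallys-blog.example/"),
    ("John",  "worksFor",   "Microsoft", "http://johns-cv.example/"),
]

def who_says(s, p, o):
    """Which sources assert this exact claim?"""
    return [g for (qs, qp, qo, g) in quads if (qs, qp, qo) == (s, p, o)]

def claims_about(s, p):
    """All competing (value, source) pairs -- disputes stay visible."""
    return [(qo, g) for (qs, qp, qo, g) in quads if (qs, qp) == (s, p)]

print(claims_about("Sally", "awardedPhD"))
# two conflicting claims, each attributed to its source
```

A plain triple store (or SQL table) holding the same data would have to either pick a winner or store both values with no record of who said which; keeping the fourth column is the whole trick.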

Who, what, where, when?

A “Who? what? where? when?” of the Semantic Web is taking shape nicely.

Danny Ayers shows some work with FOAF and the hCard microformat, picking up a theme first explored by Dan Connolly back in 2000: inter-conversion between RDF and HTML person descriptions. Danny generates hCards from SPARQL queries of FOAF, an approach which would pair nicely with GRDDL for going in the other direction.
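As a rough illustration of that FOAF-to-hCard direction: in practice the input would be the result of a SPARQL query over FOAF data, but a plain dict can stand in for it here. The property names and the (much simplified) markup follow FOAF and hCard class conventions respectively:

```python
# Sketch: render a FOAF-ish person description as minimal hCard markup.
# A real converter would handle many more properties (nick, mbox, photo...).

def foaf_to_hcard(person):
    """Emit a minimal hCard for a person with a name and homepage."""
    return ('<div class="vcard">'
            f'<a class="url fn" href="{person["homepage"]}">{person["name"]}</a>'
            '</div>')

person = {"name": "Dan Brickley", "homepage": "http://danbri.org/"}
print(foaf_to_hcard(person))
```

Going the other way, as the paragraph above notes, is GRDDL’s job: a profile-linked transform that lifts the hCard back into RDF.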

Meanwhile at W3C, the closing days of the SW Best Practices Group have recently produced a draft of an RDF/OWL Representation of Wordnet. Wordnet is a fantastic resource, containing descriptions of pretty much every word in the English language. Anyone who has spent time in committees, deciding which terms to include in some schema/vocabulary, must surely feel the appeal of a schema which simply contains all possible words. There are essentially two approaches to putting Wordnet into the Semantic Web. A lexically-oriented approach, such as the one published at W3C for Wordnet 2.0, presents a description of words. It mirrors the structure of Wordnet itself (verbs, nouns, synsets etc.). Consequently it can be a complete and unjudgemental reflection into RDF of all the work done by the Wordnet team.

The alternate, and complementary, approach is to explore ways of projecting the contents of Wordnet into an ontology, so that category terms (from the noun hierarchy) in Wordnet become classes in RDF. I made a simplistic attempt at this some time back (see overview). It has appeal (alongside the linguistic version) because it allows RDF to be used to describe instances of classes for each concept in Wordnet, with other properties of those instances. See WhyWordnetIsCool in the FOAF wiki for an example of Wordnet’s coverage of everyday objects.

So, getting Wordnet moving into the SW is a great step. It gives us URIs to identify a huge number of everyday concepts. Its coverage isn’t complete, and its ontology is kinda quirky. Aldo Gangemi and others have worked on tidying up the hierarchy; I believe only for version 1.6 of Wordnet so far. I hope that work will eventually get published at W3C or elsewhere as stable URIs we can all use.

In addition to Wordnet there are various other efforts that give types that can be used for the “what” of “who/what/where/when”. I’ve been talking with Rob McCool about re-publishing a version of the old TAP knowledge base. The TAP project is now closed, with Rob working for Yahoo and Guha at Google. Stanford maintain the site but aren’t working on it. So I’ve been working on a quick cleanup (well-formed RDF/XML etc.) of TAP that could get it into more mainstream use. TAP, unlike Wordnet, has more modern everyday commercial concepts (have a look), as well as a lot of specific named instances of these classes.

Which brings me to (Semantic) Wikipedia; another approach to identifying things and their types on the Semantic Web. A while back we added isPrimaryTopicOf to FOAF, to make it easier to piggyback on Wikipedia for RDF-identifying things that have Wiki (and other) pages about them. The Semantic Mediawiki project goes much much further in this direction, providing a rich mapping (classes etc.) into RDF for much of Wikipedia’s more data-oriented content. Very exciting, especially if it gets into the main codebase.

So I think the combination of things like Wordnet, TAP, Wikipedia, and instance-identifying strategies such as “isPrimaryTopicOf”, will give us a solid base for identifying what the things are that we’re describing in the Semantic Web.

And regarding “where?” and “when?” … on the UI front, we saw a couple of announcements recently: OpenLayers v1.0, which provides Google-Maps-like UI functionality, but open source and standards friendly. And for “when”, a similar offering: the Timeline widget. This should allow for fancy UIs to be wired in with RDF calendar or RDF-geo tagged data.

Talking of which… good news of the week: W3C has just announced a Geo incubator group (see detailed charter), whose mission includes updates for the basic Geo (ie. lat/long etc) vocabulary we created in the SW Interest Group.

Ok, I’ve gone on at enough length already, so I’ll talk about SKOS another time. In brief – it fits in here in a few places. When extended with more lexical stuff (for describing terms, eg. multi-lingual thesauri) it could be used as a base for representing the lexically-oriented version of Wordnet. And it also fits in nicely with Wikipedia, I believe.

Last thing, don’t get me wrong — I’m not claiming these vocabs and datasets and bits of UI are “the” way to do ‘who/what/where/when’ on the Semantic Web. They’re each one of several options. But it is great to have a whole pile of viable options at last :)