OpenSocial schema extraction: via Javascript to RDF/OWL

OpenSocial’s API reference describes a number of classes (‘Person’, ‘Name’, ‘Email’, ‘Phone’, ‘Url’, ‘Organization’, ‘Address’, ‘Message’, ‘Activity’, ‘MediaItem’, ‘Activity’, …), each of which has various properties whose values are either strings, references to instances of other classes, or enumerations. I’d like to make them usable beyond the confines of OpenSocial, so I’m making an RDF/OWL version. OpenSocial’s schema is an attempt to provide an overarching model for much of present-day mainstream ‘social networking’ functionality, including dating, jobs etc. Such a broad effort is inevitably somewhat open-ended, and so may benefit from being linked to data from other complementary sources.

With a bit of help from the shindig-dev list, #opensocial IRC, and Kevin Brown and Kevin Marks, I’ve tracked down the source files used to represent OpenSocial’s data schemas: they’re in the opensocial-resources SVN repository on code.google.com. There is also a downstream copy in the Apache Shindig SVN repo (I’m not very clear on how versioning and evolution is managed between the two). They’re Javascript files, structured so that documentation can be generated via javadoc. The Shindig-PHP schema diagram I posted recently is a representation of this schema.

So – my RDF version. At the moment it is merely a list of classes and their properties (expressed using via rdfs:domain), written using RDFa/HTML. I don’t yet define rdfs:range for any of these, nor handle the enumerated values (opensocial.Enum.Smoker, opensocial.Enum.Drinker, opensocial.Enum.Gender, opensocial.Enum.LookingFor, opensocial.Enum.Presence) that are defined in enum.js.

The code is all in the FOAF SVN, and accessible via “svn co http://svn.foaf-project.org/foaftown/opensocial/vocab/”. I’ve also taken the liberty of including a copy of the OpenSocial *.js files, and Mozilla’s Rhino Javascript interpreter js.jar in there too, for self-containedness.

The code in schemarama.js will simply generate an RDFA/XHTML page describing the schema. This can be checked using the W3C validator, or converted to RDF/XML with the pyRDFa service at W3C.

I’ve tested the output using the OwlSight/pellet service from Clark & Parsia, and with Protege 4. It’s basic but seems OK and a foundation to build from. Here’s a screenshot of the output loaded into Protege (which btw finds 10 classes and 99 properties).

An example view from protege, showing the class browser in one panel, and a few properties of Person in another.

OK so why might this be interesting?

  • Using OpenSocial-derrived vocabulary, OpenSocial-exported data in other contexts
    • databases (queryable via SPARQL)
    • mixed with FOAF
    • mixed with Microformats
    • published directly in RDFa/HTML
  • Mapping OpenSocial terms with other contact and social network schemas

This suggests some goals for continued exploration:

It should be possible to use “OpenSocial markup” in an ordinary homepage or blog (HTML or XHTML), drawing on any of the descriptive concepts they define, through using RDFa’s markup notation. As Mark Birbeck pointed out recently, RDFa is an empty vessel – it does not define any descriptive vocabulary. Instead, the RDF toolset offers an environment in which vocabulary from multiple independent sources can be mixed and merged quite freely. The hard work of the OpenSocial team in analysing social network schemas and finding commonalities, or of the Microformats scene in defining simple building-block vocabularies … these can hopefully be combined within a single environment.

Foundation Nation: new orgs for Infocards, Symbian

Via the [IP] list, I read that the Information Card Foundation has launched.

Information Cards are the new way to control your personal data and identity on the web.

The Information Card Foundation is a group of thoughtful designers, architects, and companies who want to make the digital world easier for you by building better products that help you get control of your personal information.

From their blog, where Charles Andres offers a historical account of where they fit in:

And by early 2007, four tribes in the newly discovered continent of user-centric identity had united under the banner of OpenID 2.0 and brought the liberating power of user-controlled identifiers to the digital identity pioneers. The OpenID community formed the OpenID Foundation to serve as a trustee for intellectual property and a host for community activity and by early 2008 had attracted Microsoft, Yahoo, Google, VeriSign, and IBM to join as corporate directors.

Inspired by these efforts, the growing Information Card community realized that to bring this metaphor to full fruition required taking the same step—coming together into a common organization that would unify our efforts to create an interoperable identity layer. From one perspective this could be looked at as completing the “third leg of the stool” of what is often called the Venn of Identity (SAML, OpenID, and Information Cards). But from another perspective, you can see it as one of the logical steps needed towards the cooperative convergence among identity systems and protocols that will be necessary to reach a ubiquitous Internet identity layer—the layer that completes the hat trick.

I’m curious to see what comes of this. There’s some big backing, and I’ve heard good things about Infocard from folks in the know. From an SemWebby perspective, this stuff just gives us another way to figure out the provenance of claim graphs representable in RDF, queryable in SPARQL. And presumably some more core schemas to play with…

Meanwhile in the mobile scene, a Symbian Foundation has been unveiled:

Industry leaders to unify the Symbian mobile platform and set it free
Foundation to be established to provide royalty-free open platform and accelerate innovation

The demand for converged mobile devices is accelerating. By 2010 we expect four billion people to have joined the global mobile conversation. For many of these people, their mobile will be their first Internet experience, not just their first camera, music player or phone.

Open software is the basic building block for delivering this future.

With this in mind, industry leaders are coming together to establish Symbian Foundation, to bring to life a shared vision and to create the most proven, open and complete mobile software platform – available for free. To achieve this, the foundation will unify Symbian, S60, UIQ and
MOAP(S) software to create an unparalleled open software platform for converged mobile devices, enabling the whole mobile ecosystem to accelerate innovation.

The foundation is expected to start operating during the first half of 2009. Membership of the foundation will be open to all organizations, for a low annual membership fee of US $1,500.

I’ll save my pennies for an iPhone. Everybody’s open nowadays, I guess that’s good…

Mashed remote contrib: BBC music genres meet last.fm (meets OAuth)

I’m not at the BBC’s 2008 hackday-like-event, Mashed. But here’s a quick hack based on the data the BBC audio and music team have made available. The data that caught my eye was “Genres for set of MusicBrainz Artists” based on editorial data entered for bbc.co.uk/music. This is a simple file:

0039c7ae-e1a7-4a7d-9b49-0cbc716821a6    Rock and Indie
003abc43-e2bb-40e5-a080-3c4b9e56ea63    Classical
0053dbd9-bfbc-4e38-9f08-66a27d914c38    Classic Pop and Rock

It maps a MusicBrainz artist ID (increasingly the defacto open standard for identifying artists, at least in popular western music) to a simple genre label.

I haven’t yet found corresponding pages on the BBC music site for each of these genres.

Since last.fm expose my last 12 month’s most commonly played artists for all to mock, it is quite easy to cross-reference these sources to get a summary of my alleged musical interests.

A commandline ruby script online for now:

Airbag:mashed danbri$ ruby lastfm-genres.rb
Classic Pop and Rock: 13
Rock and Indie: 17
Hip Hop; RnB and Dance Hall: 1
World: 1
Dance and Electronica: 12

It’s a while since I wrote any code, clearly: this should at least be sorted and trimmed to the top 3 or so. We’d need to look at a few people’s profiles to figure out the best approach to summarising someone’s interests, and a little thought is needed for representing this in RDF/FOAF.

Now where I see OAuth fitting into this picture is the “what do we do next” step. OAuth potentially addresses a problem we’ve had in the FOAF scene, whereby FOAF generators and adaptors produce a chunk of markup, but there’s no easy/natural way to post this back into the Web. I’m hoping that blogs and hosting sites will allow external FOAF sources (like this script) to update/augment the FOAF descriptions we host in our existing Web sites and profiles. I sent some notes on this to the OAuth list (albeit to a deafening silence).

See also:  mashed last.fm / bbc genres ruby script

AllegroGraph RDFStore 3.0: Social Network Analysis

AllegroGraph 3.0 now comes with a Social Network Analysis component, amongst several other interesting features including improved geo support.

example diagram of people and relationships

By viewing interactions as connections a in graph, we can treat a multitude of different situations using the tools of Social Network Analysis (SNA). SNA lets us answer questions like:

    • How closely connected are any two individuals?
    • What are the core groups or clusters within the data?
    • How important is this person (or company) to the flow of information?
    • How likely is it that this person and that person know one another?

The field is full of rich mathematical techniques and powerful algorithms. AllegroGraph’s SNA toolkit includes an array of search methods, tools for measuring centrality and importance, and the building blocks for creating more specialized measures. These tools can be used with any “network” data set whether its connections between companies and directors, predator/prey food webs, chemical interactions, or links between web sites.

Bruce Schneier: Our Data, Ourselves

Via Libby; Bruce Schneier on data:

In the information age, we all have a data shadow.

We leave data everywhere we go. It’s not just our bank accounts and stock portfolios, or our itemized bills, listing every credit card purchase and telephone call we make. It’s automatic road-toll collection systems, supermarket affinity cards, ATMs and so on.

It’s also our lives. Our love letters and friendly chat. Our personal e-mails and SMS messages. Our business plans, strategies and offhand conversations. Our political leanings and positions. And this is just the data we interact with. We all have shadow selves living in the data banks of hundreds of corporations’ information brokers — information about us that is both surprisingly personal and uncannily complete — except for the errors that you can neither see nor correct.

What happens to our data happens to ourselves.

This shadow self doesn’t just sit there: It’s constantly touched. It’s examined and judged. When we apply for a bank loan, it’s our data that determines whether or not we get it. When we try to board an airplane, it’s our data that determines how thoroughly we get searched — or whether we get to board at all. If the government wants to investigate us, they’re more likely to go through our data than they are to search our homes; for a lot of that data, they don’t even need a warrant.

Who controls our data controls our lives. [...]

Increasingly, we’re going to be seeing this data flow through protocols like OAuth. SemWeb people should get their heads around how this is likely to work. It’s rather likely we’ll see SPARQL data stores with non-public personal data flowing through them; what worries me is that there’s not yet any data management discipline on top of this that’ll help us keep track of who is allowed to see what, and which graphs should be deleted or refreshed at which times.

I recently transcribed some notes from a Robert Scoble post about Facebook and data portability into the FOAF wiki. In it, Scoble reported some comments from Dave Morin of Facebook, regardling data flow. Excerpts:

For instance, what if a user wants to delete his or her info off of Facebook. Today that’s possible. But what about in a really data portable world? After all, in such a world Facebook might have sprayed your email and other data to other social networks. What if those other social networks don’t want to delete your data after you asked Facebook to?

Another case: you want your closest Facebook friends to know your birthday, but not everyone else. How do you make your social network data portable, but make sure that your privacy is secured?

Another case? Which of your data is yours? Which belongs to your friends? And, which belongs to the social network itself? For instance, we can say that my photos that I put on Facebook are mine and that they should also be shared with, say, Flickr or SmugMug, right? How about the comments under those photos? The tags? The privacy data that was entered about them? The voting data? And other stuff that other users might have put onto those photos? Is all of that stuff supposed to be portable? (I’d argue no, cause how would a comment left by a Facebook user on Facebook be good on Flickr?) So, if you argue no, where is the line? And, even if we can all agree on where the line is, how do we get both Facebook and Flickr to build the APIs needed to make that happen?

I’d like to see SPARQL stores that can police their data access behaviour, with clarity for each data graph in the store about the contexts in which that data can be re-exposed, and the schedule by which the data should be refreshed or purged. Making it easy for data to flow is only half the problem…

WHY MIGHT CONNECTING WITH ZANDER JULES BE A GOOD IDEA?

Or: towards evidence-based ‘add a contact’ filtering…

This just in from LinkedIn:

Have a question? Zander Jules’s network will probably have an answer
You can use LinkedIn Answers to distribute your professional questions to Zander Jules and your extended network. You can get high-quality answers from experienced professionals.

Zander Jules requested to add you as a connection on LinkedIn:

Dan,

Dear
My name is Zander Jules a Banker and accountant with Bank Atlantique Cote Ivoire.I contacting u for a business transfer of a large sum of money from a dormant account. Though I know that a transaction of this magnitude will make any one apprehensive,
but I am assuring u all will be well at the end of the day.I am the personal accounts manager to Engr Frank Thompson, a National of ur country, who used to work with an oil servicing company here in Cote Ivoire. My client, his wife & their 3 children were involved in the ill fated Kenya Airways crash in the coasts of Abidjan in January 2000 in which all passengers on board died. Since then I have made several inquiries to ur embassy to locate any of my clients extended relatives but has been unsuccessful.After several attempts, I decided to trace his last name via internet,to see if I could locate any member of his
family hence I contacted u.Of particular interest is a huge deposit with our bank in our country,where the deceased has an account valued at about $16 million USD.They have issued me notice to provide the next of kin or our bank will declare the account unservisable and thereby send the funds to the bank treasury.Since I have been unsuccessful in locating the relatives for past 7 yrs now, I will seek ur consent to present you as the next of kin of the deceased since u have the same last names, so that the proceeds of this account valued at $16million USD can be paid to u and then u and I can share the money.All I require is your honest cooperation to enable us see this deal through. I guarantee that this will be executed under all legitimate arrangement that will protect you from any breach of the law. In your reply mail, I want you to give me your full names, address, D.O.B, tel& fax #.If you can handle this with me, reach me for more details.

Thanking u for ur coperation.
Regards,

I’m suprised we’ve not seen more of this, and sooner. Youtube contacts are pretty spammy, and twitter have also suffered. The other networks are relatively OK so far. But I don’t think they’re anything like as robust as they’ll need to get, particularly since a faked contact can get privileged access to personal details. Definitely an arms race…

Opening and closing like flowers (social platform roundupathon)

Closing some tabs…

Stephen Fry writing on ‘social network’ sites back in January (also in the Guardian):

…what an irony! For what is this much-trumpeted social networking but an escape back into that world of the closed online service of 15 or 20 years ago? Is it part of some deep human instinct that we take an organism as open and wild and free as the internet, and wish then to divide it into citadels, into closed-border republics and independent city states? The systole and diastole of history has us opening and closing like a flower: escaping our fortresses and enclosures into the open fields, and then building hedges, villages and cities in which to imprison ourselves again before repeating the process once more. The internet seems to be following this pattern.

How does this help us predict the Next Big Thing? That’s what everyone wants to know, if only because they want to make heaps of money from it. In 1999 Douglas Adams said: “Computer people are the last to guess what’s coming next. I mean, come on, they’re so astonished by the fact that the year 1999 is going to be followed by the year 2000 that it’s costing us billions to prepare for it.”

But let the rise of social networking alert you to the possibility that, even in the futuristic world of the net, the next big thing might just be a return to a made-over old thing.

McSweenys:

Dear Mr. Zuckerberg,

After checking many of the profiles on your website, I feel it is my duty to inform you that there are some serious errors present. [...]

Lest-we-forget. AOL search log privacy goofup from 2006:

No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”

And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”

It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. “Those are my searches,” she said, after a reporter read part of the list to her.

Time magazine punditising on iGoogle, Facebook and OpenSocial:

Google, which makes its money on a free and open Web, was not happy with the Facebook platform. That’s because what happens on Facebook stays on Facebook. Google would much prefer that you come out and play on its platform — the wide-open Web. Don’t stay behind Facebook’s closed doors! Hie thee to the Web and start searching for things. That’s how Google makes its money.

So, last fall, Google rallied all the other major social networks (MySpace, Bebo, Hi5 and so on) and announced a new initiative called OpenSocial. OpenSocial wants to be like Facebook’s platform, only much bigger: Widget makers can write applications for it and they can run anywhere — on MySpace, Bebo and Google’s own social network, Orkut, which is very big in Brazil.

Google’s platform could actually dwarf Facebook — if it ever gets off the ground.

Meanwhile on the widget and webapp security front, we have “BBC exposes Facebook flaw” (information about your buddies is accessible to apps you install; information about you is accessible to apps they install). Also see Thomas Roessler’s comments to my Nokiana post for links to a couple of great presentations he made on widget security. This includes a big oopsie with the Google Mail widget for MacOSX. Over in Ars Technica we learn that KDE 4.1 alpha 1 now has improved widget powers, including “preliminary support for SuperKaramba and Mac OS X Dashboard widgets“. Wonder if I can read my Gmail there…

As Stephen Fry says,  these things are “opening and closing like a flower”. The big hosted social sites have a certain oversimplifying retardedness about them. But the ability for code to go visit data (the widget/gadget model), is I think as valid as the opendata model where data flows around to visit code. I am optimistic that good things will come out of this ferment.

A few weeks ago I had the pleasure of meeting several of the Google OpenSocial crew in London. They took my grumbling about accessibility issues pretty well, and I hope to continue that conversation. Industry politics and punditry aside, I’m impressed with their professionalism and with the tie-in to an opensource implementation through Apache’s ShinDig project. The OpenSocial specs list is open to the public, where Cassie has just announced that “all 0.8 opensocial and gadgets spec changes have been resolved” (after a heroic slog through the issue list). I’m barely tracking the detail of discussion there, things are moving fast. There’s now a proposed REST API, for example; and I learned in London about plans for a formatting/templating system, which might be one mechanism for getting FOAF/RDF out of OpenSocial containers.

If OpenSocial continues to grow and gather opensource mindshare, it’s possible Facebook will throw some chunks of their platform over the wall (ie. “do an Adobe“). And it’ll probably be left to W3C to clean up the ensuring mess and fragmentation, but I guess that’s what they’re there for. Meanwhile there’s plenty yet to be figured out, … I think we’re in a pre-standards experimentation phase, regardless of how stable or mature we’re told these platforms are.

The fundamental tension here is that we want open data, open platforms, … for data and code to flow freely, but to protect the privacy, lives and blushes of those it describes. A tricky balance. Don’t let anyone tell you it’s easy, that we’ve got it figured out, or that all we need to do is “tear down the walls”.

Opening and closing like flowers…

Videos from SemanticCamp Paris 1

Spotted on the websemantique list, a Youtube playlist of videos from SemanticCamp Paris 1,  16 février. I think they’ve just had a second SemanticCamp already, but these five videos are from the earlier event. Lots of FOAF, RDFa etc.

 L’objectif du SemanticCamp Paris est de créer les conditions pour que les développeurs, les étudiants, les managers et les chercheurs se rencontrent sur le thème du Web Sémantique.