What kind of Semantic Web researcher are you?

It’s hard to keep secrets in today’s increasingly interconnected, networked world. Social network megasites, mobile phones, webcams and  inter-site syndication can broadcast and amplify the slightest fragment of information. Data linking and interpretation tools can put these fragments together, to paint a detailed picture of your life, both online and off.

This online richness creates offline risk. For example, if you’re going away on holiday, there are hundreds of ways in which potential thieves could learn that your home is vacant and therefore a target for crime: shared calendars, twittered comments from friends or family, flickr’d photographs. Any of these could reveal that your home and possessions sit unwatched, unguarded, presenting an easy target for criminals.

Q: What research challenge does this present to the Semantic Web community? How can we address the concern that Semantic and Social Web technology have more to offer Burglar Bill than to his victims?

A1: We need better technology for limiting the flow of data, proving a right to legitimate access to information, cross-site protocols for deleting leaked or retracted data that flows between sites, and calculating trust metrics for parties requesting data access.

A2: We need to find ways to reconnect people with their neighbours and neighbourhoods, so that homes don’t sit unwatched when their occupants are away.

ps. Dear Bill, I have my iphone, laptop, piggy bank and camera with me…

NoTube scenario: Facebooks groups and TV recommendation

Short version: If the Web knows I like a TV show, why can’t my TV be more useful?

So I have just joined a Facebook group, “Spaced Appreciation Society“:

Basic Info
Type: Common Interest – Pets & Animals
Description: If you’ve ever watched (and therefore loved) the TV series Spaced, then come and pay homage to the great Simon Pegg and Jess Stevenson. “You f’ing plum”
Contact Details
Website: http://www.spaced-out.org.uk/
Location: Meteor Street

That URL is (as with many of these groups) from a site whose primary topic is the thing the group’s about. In this case, about a TV show. It’s even in the public page for that group:

<tr><td class=”label”>Website:</td>
<td class=”data”><div class=”datawrap”><a href=”http://www.spaced-out.org.uk/” onmousedown=”return wait_for_load(this, event, function() { UntrustedLink.bootstrap($(this), &quot;&quot;, event) });” target=”_blank” rel=”nofollow”>http://www.spaced-out.org.uk/</a></div></td></tr>

If I search Google (Yahoo BOSS might be wiser, they have APIs) with:

link:http://www.spaced-out.org.uk/ site:wikipedia.org

It finds me:

http://en.wikipedia.org/wiki/Spaced

Although “link:http://www.spaced-out.org.uk/ site:dbpedia.org” doesn’t find anything, some URL rewriting gets me to:

http://dbpedia.org/page/Spaced

“Spaced is a British television situation comedy written by and starring Simon Pegg and Jessica Stevenson, and directed by Edgar Wright. It is noted for its rapid-fire editing, frequent dropping of pop-culture references, and occasional displays of surrealism. Two series of seven episodes were broadcast in 1999 and 2001 on Channel 4.”

dbpedia-owl:author
* dbpedia:Jessica_Hynes
* dbpedia:Simon_Pegg

dbpedia-owl:completionDate
* 2001-04-13 (xsd:date)

dbpedia-owl:director
* dbpedia:Edgar_Wright

dbpedia-owl:episodenumber
* 14

dbpedia-owl:executiveproducer
* dbpedia:Humphrey_Barclay

dbpedia-owl:genre
* dbpedia:Situation_comedy

dbpedia-owl:language
* dbpedia:English_language

dbpedia-owl:network
* dbpedia:Channel_4

dbpedia-owl:producer
* dbpedia:Gareth_Edwards
* dbpedia:Nira_Park

dbpedia-owl:releaseDate
* 1999-09-24 (xsd:date)

dbpedia-owl:runtime
* 24

dbpedia-owl:starring
* dbpedia:Jessica_Hynes
* dbpedia:Simon_Pegg

There are also links from here to Cyc (but an incorrect match) and to Freebase (to http://www.freebase.com/view/en/spaced).

Unfortunately, the Wikipedia “external links” section, with the URL for http://www.spaced-out.org.uk/ (marked “offical, fan-operated site” is not part of the DBpedia RDF export. I guess as it is not in an infobox. Extracting these external-link URLs at least for the TV, Actor and Movie related sections of Wikipedia might be worthwhile. And DBpedia would be useful for identifying the relevant subset to re-extract.

This idea of using such URLs as keys into Wikipedia/dbpedia data would also work with Identi.ca groups and others. In fact the matching might be easier in Identi.ca – I’m not sure how the Facebook APIs expose this stuff.

Anyway, if a show is about to be broadcast that includes eg. an interview with dbpedia:Jessica_Hynes or dbpedia:Simon_Pegg I’d like to hear about it.

So… is there any way I can use BBC’s /programmes to get upcoming information about who will be on the radio or telly, in a way that could be matched against dbpedia URIs?

Edit: I should’ve mentioned that Facebook in particular also has a more explicit “is a fan of” construct, with Products, Celebs, TV shows and Stores as types of thing you can be a fan of. Furthermore these show up on your public page, eg. here’s mine. I’m certainly interested in using that data, but also in a model that uses  general groups, since it is applicable to other sites that allow a group to indicate itself with a topical URL.

Twitter Iran RT chaos

From Twitter in the last few minutes, a chaos of echo’d posts about army moves. Just a few excerpts here by copy/paste, mostly without the all-important timestamps. Without tools to trace reports to their source, to claims about their source from credible intermediaries, or evidence, this isn’t directly useful. Even grassroots journalists needs evidence. I wonder how Witness and Identi.ca fit into all this. I was thinking today about an “(person) X claims (person) Y knows about (topic) Z” notation, perhaps built from FOAF+SKOS. But looking at this “Army moving in…” claim, I think something couched in terms of positive claims (along lines of the old OpenID showcase site Jyte) might be more appropriate.

The following is from my copy/paste from Twitter a few minutes ago. It gives a flavour of the chaos. Note also that observations from very popular users (such as stephenfry) can echo around for hours, often chased by attempts at clarification from others.

(“RT” is Twitter notation for re-tweet, meaning that the following content is redistributed, often in abbreviated or summarised form)

plotbunnytiff: RT @suffolkinace: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
r0ckH0pp3r: RT .@AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
jax3417: RT @ktyladie: RT @GennX: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection #iran
ktladie: RT @GennX: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection #iran
MellissaTweets: RT @AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
GennX: RT @MelissaTweets: RT @AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection

The above all arrived at around the same time, and cite two prior “sources”:

suffolkinnace: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection   18 minutes ago from web

Who is this? Nobody knows of course, but there’s a twitter bio:

http://twitter.com/suffolkinace # Bio Some-to-be Royal Military Policeman in the British Army. Also a massive Xbox geek and part-time comedian

The other “source” seems to be http://twitter.com/AliAkbar
AliAkbar: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
about 1 hour ago from web
url http://republicmodern.com

This leads us to   http://republicmodern.com/about where we’re told
“Ali Akbar is the founder and president of Republic Modern Media. A conservative blogger, he is a contributor to Right Wing News, Hip Hop Republican, and co-host of The American Resolve online radio show. He was also the editor-in-chief of Blogs for McCain.”

I should also mention that a convention emerged in the last day two replace the names of specific local Twitter users in Tehran with a generic “from Iran”, to avoid getting anyone into trouble. Which makes plenty of sense, but without any in the middle vouching for sources makes it even harder to know which reports to take seriously.
More… back to twitter search, what’s happened since I started this post?

http://twitter.com/#search?q=iranelection%20army

badmsm: RT @dpbkmb @judyrey: RT From Iran: CONFIRMED!! Army moving into Tehran against protesters! PLZ RT! URGENT! #IranElection #gr88
SimaoC: RT @parizot: CONFIRMÉ! L’armée se dirige vers Téhéran contre les manifestants! #IranElection #gr88
SpanishClash: RT @mytweetnickname: RT From Iran:ARMY movement NOT confirmed in last 2:15, plz RT this until confrmed #IranElection #gr88
artzoom: RT @matyasgabor @humberto2210: RT CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! #IranElection #iranrevolution
sjohnson301: RT @RonnyPohl From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection #iran9
dauni: RT @withoutfield: RT: @tspe: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
interdigi: RT @ivanpinozas From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
PersianJustice: Once again, stop RT army movements until source INSIDE Iran verifies! Paramilitary is the threat anyway. #iranelection #gr88
Klungtveit Anyone: What’s the origin of reports of “army moving in” on protesters? #iranelection
Eruethemar: RT @brianlltdhq: RT @lumpuckaroo: Only IRG moving, not national ARMY… this is confirmed for real #IranElection #gr88
SAbbasRaza: RT @bymelissa: RT @alexlobov: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
timnilsson: RT @Iridium24: CONFIRMED!! Army moving into Tehran against protesters! PLEASE RT! URGENT! #IranElection
edmontalvo: RT @jasona: RT @Marble68: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
stevelabate: RT army moving into Tehran against protesters. Please RT. #iranelection
ivanpinozas: From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection
bschh: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection (via @dlayphoto)
dlayphoto: RT From Iran: CONFIRMED!! Army moving into Tehran against protestors! PLEASE RT! URGENT! #IranElection

In short … chaos!

Is this just a social / information problem, or can different tooling and technology help filter out what on earth is happening?

goo go opensocial

The Japanese portal / search engine goo, have gone live with their Shindig-based OpenSocial container. See example user page, goo labs site, developer’s kitchen and documentation (in Japanese). See also announcement from Eiji Kitamura on the shindig (Apache opensocial) list.

Rick Jelliffe on XML Schema

From the TAG list:

XML Schemas is like using a Swiss Army knife to cook with. Most Asian kitchens get by with a handful of simple tools: chopsticks, hatchet, a good knife, perhaps even a spoon. But the logic of  the XSD WG is “Oh, the French need to make quenelles, we must have a quenelling spoon as a grave matter of Internationalization because it is not our business to judge what people need… as long it is more stuff.”    So XSD 1.1 welds another Swiss Army knife onto the existing one, so that no kitchen should suffer without a quenelling spoon.

See also earlier comments on the Schema Experience Workshop from W3C.

So tool-makers blame users for generating non-standard schemas, and users blame the spec for being to difficult to know whether their schemas are standard or not, and spec makers blame tool makers for not implementing the spec properly. Who will free us from this cycle of sin and death?

[…] The only way that XML Schemas can be refactored is with a different core XML Schemas working group. My current expectation is that a lot of nothing will happen until XQuery/XSLT2 becomes seen as a more central technology than XML Schemas; the goal will then be how to support XQuery most minimally.

XSD doesn’t trouble me as much as it troubles Rick, but I have long sympathised with the approach he advocates with Schematron. The RDF equivalent of this is the approach Libby and I called “Schemarama”, expressing constraints against RDF instance data using queries. See original 2001 demo using SquishQL, and a later reworking by Alistair Miles using SPARQL (currently offline?). Recent work from the OWL experts at Clark & Parsia (blog post; another blog post) is heading in the same direction. I wonder whether Rick’s observation about XML applies to RDF too, and that at some point, SPARQL querying facilities will be so ubiquitous in RDF tools that it becomes second nature to apply it to data checking tasks too…?

Update: see also SpinRDF from Holger & co. at Top Quadrant

Site recovery

Busy sysadmin week. The main FOAF site is back, now hosted on Amazon EC2. Thanks to Stephane Corlosquet for all the time he spent fixing up the Drupal installation, after the recent server compromise. I’ve also moved over danbri.org (well, DNS is propagating), and migrated my blog into a completely fresh WordPress installation. The FOAF namespace site and Subversion server are safe, and not yet migrated to new hosting. Various documents from danbri.org are still offline while I scrub all the HTML, .js, .php etc for mischief. The old rdfweb.org site is also offline. I’d rather move slowly and carefully than mess up this process. This is a test post from the new WordPress to see if it works. Note that I’ve stripped all plugins and addons and will be much more conservative with trying extensions in the future. In particular, OpenID-based commenting isn’t working right now, but it’s on the todo list. One of the most disconcerting things about being hacked is when the site is also your OpenID. I’m wondering how to better partition things in the future; perhaps using id.danbri.org might give some more options?

The House that Jack Built

<Farmer> <sowed> <Corn> <kept> <Cock> <woke> <Priest> <married> <Man> <kissed> <Maiden> <milked> <Cow> <tossed> <Dog> <worried> <Cat> <killed> <Rat> <ate> <Malt> <in> <House> <builtBy> <Person foaf:name=”Jack” /> </builtBy> </House> </in> </Malt> </ate> </Rat> </killed> </Cat> </worried> </Dog> </tossed> </Cow> </milked> </Maiden> </kissed> </Man> </married> </Priest> </woke> </Cock> </kept> </Corn> </sowed> </Farmer>

FOAF super-connectivity daydreams from 2002.

“indirectly inspired by John Pilger’s ‘The New Rulers of the World’, FOAFCorp and www.theyrule.net/

This is the man, all tattered and torn, ...

Hal Varian on information sharing

Back in the early days of the Web, every document had at the bottom, “Copyright 1997. Do not redistribute.” Now every document has at the bottom, “Copyright 2008. Click here to send to your friends.”

–from an interview with Google’s chief economist, via flowingdata.org

Flickr & MusicBrainz Machine tags: If you’ve got it, flaunt it

From Sander van Zoest at Uncensored Interview, a convention for representing MusicBrainz identifiers using Flickr’s Machine Tag mechanism.

Example:

A photo of Matthew Dear, tagged as follows:

It also includes a Wikipedia identifier which could be used to link to DBpedia (though this might duplicate information also available within MusicBrainz’s advanced relationships system). There must be many 1000s of artist photos on Flickr, perhaps we’ll see tools to improve their tagging so they can be re-used more easily…

Nearby: Matthew Dear in DBpedia(RDF), in Freebase (RDF), …

On the internet, no-one knows.

“Because most of the targeted employees were male between the ages of 20 and 40 we decided that it would be best to become a very attractive 28 year old female. We found a fitting photograph by searching google images and used that photograph for our fake Facebook profile. We also populated the profile with information about our experiences at work by using combined stories that we collected from real employee facebook profiles.” […]

Skosdex progress: basic lucene search

I now have a crude Lucene index derrived from SKOS data. It is more or less a toy example, but somehow promising also.

Example below is a test against FAO‘s AGROVOC. Each concept becomes a “document”, with a “word” field containing the prefLabel, and a “uri” field for the concept URI. I don’t index anything else yet.

The hope here is to have a handy prototyping environment for testing different indexing regimes. The code takes about 4-5 mins to index AGROVOC on my MacBook, running under Jruby.

The data I’m using is a SKOS dump from the FAO Web site, post-processed with “grep -v” to skip the Farsi lines, due to a Unicode error. The transcript below comes from running Lucli, a handy command line tool for Lucene.

Next steps with indexing? Not sure. Probably make sure altLabel is handled. But I’m also curious about possibility of including fields that pull in labels from nearby concepts, so they can be matched in weighted searches. Would be hard to evaluate the effectiveness though.

lucli> search uri:”http://www.fao.org/aims/aos/agrovoc#c_47934″
Searching for: uri:”http www.fao.org aims aos agrovoc c_47934″
1 total matching documents
————————————–
—————- 1 score:1.0———————
word:Pteria hirundo
uri:http://www.fao.org/aims/aos/agrovoc#c_47934
#################################################
lucli> search word:”Leiocottus hirundo”
Searching for: word:”leiocottus hirundo”
1 total matching documents
————————————–
—————- 1 score:1.0———————
word:Leiocottus hirundo
uri:http://www.fao.org/aims/aos/agrovoc#c_45393
#################################################
lucli> search word:”hirundo”
Searching for: word:hirundo
2 total matching documents
————————————–
—————- 1 score:1.0———————
word:Pteria hirundo
uri:http://www.fao.org/aims/aos/agrovoc#c_47934
—————- 2 score:1.0———————
word:Leiocottus hirundo
uri:http://www.fao.org/aims/aos/agrovoc#c_45393
#################################################

Facebook problem statement

People want full ownership and control of their information so they can turn off access to it at any time. At the same time, people also want to be able to bring the information others have shared with them—like email addresses, phone numbers, photos and so on—to other services and grant those services access to those people’s information. These two positions are at odds with each other. There is no system today that enables me to share my email address with you and then simultaneously lets me control who you share it with and also lets you control what services you share it with.
“On Facebook, People Own and Control Their Information”, Mark Zuckerberg, Facebook blog.