Cross-browsing and RDF

While cross-searching has been described and demonstrated through this paper and associated work, the problem of cross-browsing a selection of subject gateways has not been addressed. Many gateway users prefer to browse, rather than search. Though browsing usually takes longer than searching, it can be more thorough, as it is not dependent on the user’s terms matching keywords in resource descriptions (even when a thesaurus is used, it is possible for resources to be “missed” if they are not described in great detail).

As a “quick fix”, a group of gateways may create a higher level menu that points to the various browsable menus amongst the gateways. However, this would not be a truly hierarchical menu system, as some gateways maintain browsable resource menus in the same atomic (or lowest level) subject area. One method of enabling cross-browsing is by the use of RDF.

The World Wide Web Consortium has recently published a preliminary draft specification for the Resource Description Framework (RDF). RDF is intended to provide a common framework for the exchange of machine-understandable information on the Web. The specification provides an abstract model for representing arbitrarily complex statements about networked resources, as well as a concrete XML-based syntax for representing these statements in textual form. RDF relies heavily on the notion of standard vocabularies, and work is in progress on a ‘schema’ mechanism that will allow user communities to express their own vocabularies and classification schemes within the RDF model.

RDF’s main contribution may be in the area of cross-browsing rather than cross-searching, which is the focus of the CIP. RDF promises to deliver a much-needed standard mechanism that will support cross-service browsing of highly-organised resources. There are many networked services available which have classified their resources using formal systems like MeSH or UDC. If these services were to each make an RDF description of their collection available, it would be possible to build hierarchical ‘views’ of the distributed services offering a user interface organised by subject-classification rather than by physical location of the resource.

From Cross-Searching Subject Gateways, The Query Routing and Forward Knowledge Approach, Kirriemuir et al., D-Lib Magazine, January 1998.

I wrote this over 11 (eleven) years ago, as something of an aside in a larger paper on metadata for distributed search. While we are making progress towards such goals, especially with regard to cross-referenced descriptions of identifiable things (ie. the advances made through linked data techniques lately), the pace of progress can be quite frustrating. Just as it seems like we’re making progress, things take a step backwards. For example, the wonderful lcsh.info site is currently offline while the relevant teams at the Library of Congress figure out how best to proceed. It’s also ten years since Charlotte Jenkins published some great work on auto-classification that used OCLC’s Dewey Decimal Classification. That work also ran into problems, since DDC wasn’t freely available for use in such applications. In the current climate, with Creative Commons, open source, Web 2.0 and suchlike all the rage, I hope we’ll finally see more thesaurus and classification systems opened up (eg. with SKOS) and fully linked into the Web. Maybe by 2019 the Web really will be properly cross-referenced…
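
The cross-browsing idea in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gateway names, URLs and records are invented, each gateway is assumed to have already published (and had parsed) a description of its collection, and the classification is assumed to use a decimal-style notation in which a shorter prefix denotes a broader class.

```python
from collections import defaultdict

# Hypothetical records, as if extracted from each gateway's RDF description:
# (gateway name, classification notation, resource URL). All values are invented.
RECORDS = [
    ("gateway-a", "61", "http://gateway-a.example/medicine/"),
    ("gateway-b", "616", "http://gateway-b.example/pathology/"),
    ("gateway-a", "616", "http://gateway-a.example/diseases/"),
    ("gateway-c", "62", "http://gateway-c.example/engineering/"),
]


def build_view(records):
    """Group resources by subject notation, regardless of which gateway holds them."""
    view = defaultdict(list)
    for gateway, notation, url in records:
        view[notation].append((gateway, url))
    return dict(view)


def narrower(view, notation):
    """Return the notations in the view that sit below `notation`.

    Assumes that a notation which extends another one is a narrower class.
    """
    return sorted(n for n in view if n.startswith(notation) and n != notation)
```

Calling `build_view(RECORDS)` gives one menu entry per subject, so the two gateways that both cover notation `616` end up under the same heading, and `narrower(view, "61")` lists what would appear one level further down.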

OpenID – a clash of expectations?

Via Dan Connolly, this from the mod_auth_openid FAQ:

Q: Is it possible to limit login to some users, like htaccess/htpasswd does?

A: No. It is possible to limit authentication to certain identity providers (by using AuthOpenIDDistrusted and AuthOpenIDTrusted, see the main page for more info). If you want to restrict to specific users that span multiple identity providers, then OpenID probably isn’t the authentication method you want. Note that you can always do whatever vetting you want using the REMOTE_USER CGI environment variable after a user authenticates.

Funny, this is just what I thought was most interesting about OpenID: it lets you build sites where you can offer varying experiences (including letting them in or not) to different users based on what you know about them. OpenID itself doesn’t do everything out of the box, but by associating public URIs with people, it’s a very useful step.

A year ago I sketched a scenario in this vein (and it seems to have survived a sanity check from Simon Willison, or at least he quotes it). It seems perhaps that OpenID is all things to all people…?
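
The FAQ’s closing suggestion, vetting users through REMOTE_USER once they have authenticated, might look roughly like the following CGI-style sketch. It assumes that REMOTE_USER carries the authenticated OpenID identity URL; the allowlist, the identity URLs and the function names are all invented for illustration.

```python
import os

# Hypothetical allowlist of OpenID identity URLs; these values are invented.
ALLOWED_IDENTITIES = {
    "https://alice.example.org/",
    "https://openid.example.net/bob",
}


def is_allowed(environ, allowed=ALLOWED_IDENTITIES):
    """Return True if the identity in REMOTE_USER is on the allowlist."""
    identity = environ.get("REMOTE_USER", "")
    return identity in allowed


def respond(environ):
    """Return a (status, body) pair for a CGI-style response."""
    if is_allowed(environ):
        return "200 OK", "Welcome, " + environ["REMOTE_USER"]
    return "403 Forbidden", "This identity is not on the access list."


if __name__ == "__main__":
    # When run as a CGI script, the web server supplies REMOTE_USER in the environment.
    status, body = respond(os.environ)
    print("Status: " + status)
    print("Content-Type: text/plain")
    print()
    print(body)
```

The same check could just as easily consult a FOAF file or a group membership list instead of a hard-coded set, which is where knowing a public URI for each person starts to pay off.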

SKOS deployment stats from Sindice

This cropped up in yesterday’s W3C Semantic Web Coordination Group telecon, as we discussed the various measures of SKOS deployment success.

I suggested drawing a distinction between the use of SKOS to publish thesauri (ie. SKOS schemes), and the use of SKOS in RDFS/OWL schemas, for example subclassing of skos:Concept or defining properties whose range or domain are skos:Concept. A full treatment would look for a variety of constructs (eg. new properties that declare themselves subPropertyOf something in SKOS).

An example of such a use of SKOS is the new sioc:Category class, recently added to the SIOC namespace.

Here are some quick experiments with Sindice.

Search results for advanced “* <http://www.w3.org/2000/01/rdf-schema#domain> <http://www.w3.org/2004/02/skos/core#Concept>”, found 10

Search results for advanced “* <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2004/02/skos/core#Concept>”, found 10

Search results for advanced “* <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://www.w3.org/2004/02/skos/core#Concept>”, found 18

Here’s a query that finds all mentions of skos:Concept in an object role within an RDF statement:

Search results for advanced “* * <http://www.w3.org/2004/02/skos/core#Concept>”, found about 432.32 thousand

This all seems quite healthy, although I’ve not clicked through to explore many of these results yet.
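
For anyone unfamiliar with the query style used above, here is a small Python sketch of the same wildcard triple-pattern matching, run over a handful of invented triples rather than Sindice’s index. The `example.org` terms and the `match` helper are hypothetical; only the RDF, RDFS and SKOS namespace URIs are real.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Invented sample triples; the example.org terms are hypothetical.
TRIPLES = [
    ("http://example.org/ns#Topic", RDFS + "subClassOf", SKOS + "Concept"),
    ("http://example.org/ns#about", RDFS + "range", SKOS + "Concept"),
    ("http://example.org/thes/cats", RDF_TYPE, SKOS + "Concept"),
    ("http://example.org/thes/cats", SKOS + "prefLabel", "Cats"),
]


def match(triples, s="*", p="*", o="*"):
    """Return the triples that fit the pattern; "*" matches any value."""
    pattern = (s, p, o)
    return [
        triple
        for triple in triples
        if all(want == "*" or want == got for want, got in zip(pattern, triple))
    ]
```

Here `match(TRIPLES, p=RDFS + "subClassOf", o=SKOS + "Concept")` picks out schema-level uses of SKOS, while `match(TRIPLES, o=SKOS + "Concept")` also sweeps up every concept typed in a published thesaurus, which is presumably why the last count above is so much larger than the others.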

BTW I also tried using the proposed (but retracted – see CR request notes) new SKOS namespace, http://www.w3.org/2008/05/skos#Concept (unless I’m mistaken). I couldn’t find any data in Sindice yet that was using this namespace.

Semantic Web – Use it or lose it

ESWC2009 Semantic Web In Use track

While papers submitted to the scientific track may provide evidence of scientific contribution through applications and evaluations (see 4. and 5. of the conference Topics of Interest), papers submitted to the Semantic Web In Use Track should be organised around some of or all of the following aspects:

  • Description of concrete problems in specific application domains, for which Semantic Web technologies can provide a solution.
  • Description of an implemented application of Semantic Web technologies in a specific domain
  • Assessment of the pros and cons of using Semantic Web technologies to solve a particular business problem in a specific domain
  • Comparison with alternative or competing approaches using conventional or competing technologies
  • Assessment of the costs and benefits of the application of Semantic Web Technologies, e.g. time spent on implementation and deployment, efforts involved, user acceptance, returns on investment
  • Evidence of deployment of the application, and assessment/evaluation of usage/uptake.

One thing I would encourage here (in the tradition of the Journal of Negative Results) is that people remember that negative experience is still experience. While the SemWeb technology stack has much to recommend it, there are also many circumstances when it isn’t quite the right fit. Or when alternative SemWeb approaches (GRDDL, SQL2SPARQL, …) can bring similar advantages with lower costs. I would like to see some thoughtful and painfully honest writeups of cases where Semantic Web technologies haven’t quite worked out as planned. Technology projects fail all the time; there’s nothing to be ashamed of. But when it’s a technology project that uses standards and tools I’ve contributed to, I really want to know more about what went wrong, if anything went wrong. ESWC2009 seems a fine place to share experiences and to learn how to better use these technologies…

OpenID, OAuth UI and tool links

A quick link roundup:

From ‘Google OAuth & Federated Login Research‘:

“The following provides some guidelines for the user interface define of becoming an OAuth service provider”

Detailed notes on UI issues, with screenshots and links to related work (opensocial etc.).

Myspace’s OAuth Testing tool:

The MySpace OAuth tool creates examples to show external developers the correct format for constructing HTTP requests signed according to OAuth specifications

Google’s OAuth playground tool (link):

… to help developers cure their OAuth woes. You can use the Playground to help debug problems, check your own implementation, or experiment with the Google Data APIs.

If anyone figures out how to post files to Blogger via their AtomPub/OAuth API, please post a writeup! We should be able to use it to post RDFa/FOAF etc hopefully…

Yahoo’s OpenID usability research. Really good to see this made public, I hope others do likewise. There’s a summary page and a full report in PDF, “Yahoo! OpenID: One Key, Many Doors“.

Finally, what looks like an excellent set of introductory posts on OAuth: a Beginner’s Guide to OAuth from Eran Hammer-Lahav.

Drupal social/data Web developments

Dries Buytaert on ‘Drupal, the Semantic Web, and search’.

Discusses RDFa, SearchMonkey and more. Great stuff! Excerpt:

This kind of technology is not limited to global search. On a social networking site built with Drupal, it opens up the possibility to do all sorts of deep social searches – searching by types and levels of relationships while simultaneously filtering by other criteria. I was talking with David Peterson the other day about this, and if Drupal core supported FOAF and SIOC out of the box, you could search within your network of friends or colleagues. This would be a fundamentally new way to take advantage of your network or significantly increase the relevance of certain searches.

Meanwhile, PHP Shindig (Apache’s OpenSocial widget container) has now been integrated into Drupal. Currently this is for version 5, but things are moving to get it working in 6.x too. See docs (this link has some issues) and the announcement/discussion from shindig-dev. Also great news…

Problem statement

A Pew Research Center survey released a few days ago found that only half of Americans correctly know that Mr. Obama is a Christian. Meanwhile, 13 percent of registered voters say that he is a Muslim, compared with 12 percent in June and 10 percent in March.

More ominously, a rising share — now 16 percent — say they aren’t sure about his religion because they’ve heard “different things” about it.

When I’ve traveled around the country, particularly to my childhood home in rural Oregon, I’ve been struck by the number of people who ask something like: That Obama — is he really a Christian? Isn’t he a Muslim or something? Didn’t he take his oath of office on the Koran?

It was in the NYTimes, so it must be true. Will the last one to leave the Web please turn off the lights.