cvs2svn migration tips

Some advice from Garrett Rooney on Subversion. I was asking about moving the historical records for the FOAF project from dusty old CVS into shiny new Subversion. In particular, from CVS where I own and control the box, to Subversion where I’m a non-root normal mortal user (i.e. a Dreamhost customer).

The document records are pretty simple (no branches etc.). In fact only the previous versions of the FOAF RDF namespace document are of any historical interest at all. But I wanted to make sure that I could get things out again easily, without owning the Svn repository or begging the Dreamhost sysadmins.

Answer: “…a slightly qualified yes. assuming the subversion server is running svn 1.4 or newer a non-root user can use the svnsync tool to copy the contents (i.e. entire history they have read access to) of one repository into a new one. with servers before 1.4 it’s still possible to extract the information, but it’s more difficult.”
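For the record, here is roughly what that looks like in practice: a minimal sketch, not something Garrett supplied. The function name, the example URL and the local path are all made up for illustration. It assumes svn 1.4 or newer on both ends, and that the local mirror needs a pre-revprop-change hook because svnsync keeps its bookkeeping in revision properties.

```shell
# Mirror a remote repository into a fresh local one with svnsync (svn 1.4+).
# mirror_repo is a hypothetical helper; both arguments are placeholders.
mirror_repo() {
    src_url=$1      # e.g. http://svn.example.org/foaf (hypothetical URL)
    dest_dir=$2     # absolute local path for the new mirror repository

    svnadmin create "$dest_dir" || return 1

    # svnsync records its state in revision properties, so the mirror must
    # permit revprop changes; a hook that always exits 0 allows them.
    printf '#!/bin/sh\nexit 0\n' > "$dest_dir/hooks/pre-revprop-change" || return 1
    chmod +x "$dest_dir/hooks/pre-revprop-change" || return 1

    # Argument order is destination first, then source.
    svnsync init "file://$dest_dir" "$src_url" || return 1
    svnsync sync "file://$dest_dir"
}
```

Called as `mirror_repo http://svn.example.org/foaf /home/me/foaf-mirror`, this should leave a local copy of whatever history the user can read; running `svnsync sync` again later picks up newer revisions.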

And finding the version number? “if they’re letting you access svn via http you can just browse a repository and by default it’ll tell you the server version in the html view of the repository”.

Easily done, we’re on 1.4.2. That’s good.
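The same check can be scripted. The sketch below assumes the default HTML listing carries a footer along the lines of "Powered by Subversion version 1.4.2 (r...)"; that wording is my assumption about the stock listing, and a server with a customised index page may not show it at all. The function name and URL are placeholders.

```shell
# Read an HTML repository listing on stdin and print the server version, if any.
# Assumes the default footer text "... Subversion version X.Y.Z ..." (an assumption).
svn_server_version() {
    # First strip the HTML tags, then pick out the version number after the footer text.
    sed -n -e 's/<[^>]*>//g' \
           -e 's/.*Subversion version \([0-9][0-9.]*[0-9]\).*/\1/p' | head -n 1
}

# Usage (placeholder URL):
#   curl -s http://svn.example.org/foaf/ | svn_server_version
```

If nothing is printed, the footer is missing or worded differently, and browsing the page by hand is the fallback.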

Q: Any recommendations for importing from CVS?
A: “Converting from cvs to svn is a hellish job, due to the amount of really necessary data that cvs doesn’t record. cvs2svn has to infer a lot of it, and it does a damn good job of it considering what a pain in the ass it is. I’m immediately skeptical of anyone who goes out and writes another conversion tool ;-)

If you don’t have the ability to load a dumpfile into the repository you can load it into a local repos and then use svnsync to copy that into a remote empty repository. svnsync unfortunately won’t let you load into a non-empty repository, if you want to do that you need to use a svnadmin load, which requires direct access to the repository. most hosting sites will give you some way to do that sort of thing though.”

Thanks Garrett!
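Putting Garrett's steps together, the route from CVS to a hosted repository might look something like this. It is a sketch under assumptions: the function name, paths and URL are invented, the remote repository must be empty and must accept revision property changes (something the hosting provider may have to enable), and `--dumpfile` on its own is how newer cvs2svn releases produce a dumpfile, while older ones may also want `--dump-only`.

```shell
# Convert a CVS module to a dumpfile, load it into a scratch repository,
# then push that history to an empty remote repository with svnsync.
# cvs_to_remote_svn is a hypothetical helper; all arguments are placeholders.
cvs_to_remote_svn() {
    cvs_module=$1   # path to the module inside the CVS repository
    work_dir=$2     # absolute path to an existing scratch directory
    remote_url=$3   # URL of the empty remote repository

    # Step 1: let cvs2svn infer the changesets and write a dumpfile.
    cvs2svn --dumpfile="$work_dir/history.dump" "$cvs_module" || return 1

    # Step 2: load the dumpfile into a local repository.
    svnadmin create "$work_dir/repo" || return 1
    svnadmin load "$work_dir/repo" < "$work_dir/history.dump" || return 1

    # Step 3: replay the local history into the remote (destination first).
    svnsync init "$remote_url" "file://$work_dir/repo" || return 1
    svnsync sync "$remote_url"
}
```

Keeping the local repository around afterwards is cheap insurance: it can be inspected with the usual svn tools before anything is pushed to the host.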

SPARQLing Protégé-OWL Jena integration

The Jena ARQ SPARQL engine has been very rapidly integrated into Protégé. Nice work from Holger Knublauch, and from Andy Seaborne who explained how Protégé’s native RDF Java structures could manifest themselves via Jena interfaces so that the ARQ query engine could work against Protégé data. He also gave a handy overview of the ARQ architecture, describing where it has dependencies on Jena, and how it could be attached to other RDF Java libraries instead.

The most amazing thing was how fast it all happened. As a protege-owl lurker, I had been following some discussions on RDF “named graphs”, and jumped in to suggest they take a look at SPARQL’s ability to query against such things.

From my original post:

I’d also encourage you to take a look at the SPARQL work on RDF querying, if you haven’t already.

…to Holger’s “This is working indeed!” in less than a day. Holger summarises:

We now have an implementation that wraps a live Protege OWL triple store as a Jena Graph (and Model). This means that arbitrary Jena query services can be executed within Protege.

The relevant call is

OWLModel owlModel = ...; // Protege model
Model model = JenaModelFactory.createModel(owlModel); // Jena model

I also added a quick-and-dirty SPARQL query tab to Protege (see screenshot). This is extremely primitive yet, but hopefully useful on the long run. All this is on CVS and part of the next beta.

Here’s a thumbnail of the screenshot, linking to the full image:
[Image: Protégé screenshot showing a SPARQL query and a tabular result set]

I don’t see Andy’s explanation in the list archives, but it is quoted in full in Holger’s post, and is worth reading for those with an interest in Jena and ARQ.

There’s now a Jena Integration of Protege-OWL page explaining the details, and providing a diagram illustrating the integration architecture.

[Image: Jena-Protégé integration architecture diagram]

The key to this integration is the fact that both systems operate on a low-level “triple” representation of the model. Protege has its native frame store mechanism, which has been wrapped in Protege-OWL with the TripleStore classes. In the Jena world, the corresponding interfaces are called Graph and Model. The Protege TripleStore has been wrapped into a Jena Graph, so that any read access from the Jena API in fact operates on the Protege triples. In order to modify these triples, the conventional Protege-OWL API must be used. However, this mechanism allows Jena methods to be used for querying while the ontology is being edited inside Protege.

The details can be explored in CVS, for example see the new SPARQLQueryResults class.