Ruby dup() and clone() with frozen strings

I’m exhuming some 5-year-old Ruby RDF code, and in the process finding a few things got broken while I was in the time capsule. Here in the shiny future, I found myself hitting an unfamiliar “can’t modify frozen string” error. It all worked just fine back in the hazy summer of 2002, I’m sure. I think. Back then SPARQL didn’t exist, Redland’s rapper utility was called rdfdump, the RDFCore syntax cleanup wasn’t finished, Ruby was more permissive about good parenthesis habits, and so on. But that’s another pile of trouble. This frozen string thing was a bit puzzling, but I got to learn about dup() and clone() at least.

On some investigation, it seems the string in question is being passed in from the command line, and apparently the contents of ARGV are frozen. My first guess was that I’d just clone the string object in question, and all would be well. After some fiddling around, I learned the difference between clone() and dup(): clone() preserves frozen-ness and dup() doesn’t, so I wanted dup(). All seems well. Back to fixing the real problems; details below for the curious, and as a memory jog next time I run into this.

Maybe next time I emerge from the time capsule, Ruby will be fully Unicode happy?

a="foo"
=> "foo"

a.frozen?
=> false

a.freeze
=> "foo"

b=a.clone
=> "foo"

a.frozen?
=> true

b=a.dup
=> "foo"

b.frozen?
=> false

c=a.clone
=> "foo"
c.frozen?
=> true
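
The same thing as a small script, for next time. This is only a sketch: mutable_copy is a name I made up for illustration, and the frozen literal stands in for a command-line argument (whether ARGV entries are actually frozen may depend on the Ruby version).

```ruby
# Hypothetical helper: return a string that is safe to modify in place.
def mutable_copy(str)
  str.frozen? ? str.dup : str
end

# Stand-in for a frozen command-line argument (an assumption for this sketch).
arg = "danbri.rdf".freeze

path = mutable_copy(arg)
path << ".bak"            # in-place append works on the unfrozen copy

begin
  arg.clone << ".bak"     # clone keeps the frozen state, so this raises
rescue => e
  error_class = e.class   # the exact error class varies between Ruby versions
end
```

The original string is left untouched; only the dup()ed copy is changed.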

Loosely joined

find . -name danbri-\*.rdf -exec rapper --count {} \;


rapper: Parsing file ./facebook/danbri-fb.rdf
rapper: Parsing returned 2155 statements
rapper: Parsing file ./orkut/danbri-orkut.rdf
rapper: Parsing returned 848 statements
rapper: Parsing file ./dopplr/danbri-dopplr.rdf
rapper: Parsing returned 346 statements
rapper: Parsing file ./tribe.net/danbri-tribe.rdf
rapper: Parsing returned 71 statements
rapper: Parsing file ./my.opera.com/danbri-opera.rdf
rapper: Parsing returned 123 statements
rapper: Parsing file ./advogato/danbri-advogato.rdf
rapper: Parsing returned 18 statements
rapper: Parsing file ./livejournal/danbri-livejournal.rdf
rapper: Parsing returned 139 statements

I can run little queries against various descriptions of me and my friends, extracted from places in the Web where we hang out.

Since we’re not yet in the shiny OpenID future, I’m matching people only on name (and setting up the myfb: etc prefixes to point to the relevant RDF files). I should probably take more care around xml:lang, to make sure things match. But this was just a rough test…


SELECT DISTINCT ?n
FROM myfb:
FROM myorkut:
FROM dopplr:
WHERE {
GRAPH myfb: {[ a :Person; :name ?n; :depiction ?img ]}
GRAPH myorkut: {[ a :Person; :name ?n; :mbox_sha1sum ?hash ]}
GRAPH dopplr: {[ a :Person; :name ?n; :img ?i2]}
}

…finds 12 names in common across Facebook, Orkut and Dopplr networks. Between Facebook and Orkut, 46 names. Facebook and Dopplr: 34. Dopplr and Orkut: 17 in common. Haven’t tried the others yet, nor generated RDF for IM and Flickr, which I probably have used more than any of these sites. The Facebook data was exported using the app I described recently; the Orkut data was done via the CSV format dumps they expose (non-mechanisable since they use a CAPTCHA), while the Dopplr list was generated with a few lines of Ruby and their draft API: I list as foaf:knows pairs of people who reciprocally share their travel plans. Tribe.net, LiveJournal, my.opera.com and Advogato expose RDF/FOAF directly. Re Orkut, I noticed that they now have the option to flood your GTalk Jabber/XMPP account roster with everyone you know on Orkut. Not sure of the wisdom of actually doing so (but I’ll try it), but it is worth noting that this quietly bridges a large ‘social networking’ site with an open standards-based toolset.

For the record, the names common to my Dopplr, Facebook and Orkut accounts were: Liz Turner, Tom Heath, Rohit Khare, Edd Dumbill, Robin Berjon, Libby Miller, Brian Kelly, Matt Biddulph, Danny Ayers, Jeff Barr, Dave Beckett, Mark Baker. If I keep adding to the query for each other site, presumably the only person in common across all accounts will be …. me.
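
Matching on name alone is fragile, so here is a rough Ruby sketch of the same intersection with some crude normalisation thrown in. The names and lists are placeholders rather than real data, and normalise is a hypothetical helper; the real values would be the ?n bindings from each site’s RDF.

```ruby
require 'set'

# Placeholder name lists, standing in for the ?n values from each site.
names_by_site = {
  "facebook" => ["Alice Example", "Bob Example", "Carol Example"],
  "orkut"    => ["alice example ", "Carol Example"],
  "dopplr"   => ["Carol  Example", "Dave Example"]
}

# Crude normalisation (an assumption): trim, collapse whitespace, ignore case.
def normalise(name)
  name.strip.gsub(/\s+/, " ").downcase
end

# One set of normalised names per site, then intersect them all.
sets = names_by_site.values.map { |names| names.map { |n| normalise(n) }.to_set }
common = sets.reduce { |acc, s| acc & s }
```

Each extra site adds one more set to the intersection, so the result can only shrink or stay the same.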