Yahoo: RDF and the Monkey

From the Yahoo developer network blog,

Besides the existing support for microformats, we have already shared our plans for supporting other standards for embedding metadata into HTML. Today we are announcing the availability of eRDF metadata for SearchMonkey applications, which will soon be followed by support for RDFa. SearchMonkey applications can make direct use of the eRDF data by choosing the com.yahoo.rdf.erdf data source, while RDFa data will appear under com.yahoo.rdf.rdfa. Nothing changes in the way applications are created: as SearchMonkey applications have already been built on a triple-based model, the same applications can work on microformat, eRDF, or RDFa data alike.
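The "triple-based model" mentioned in the announcement can be illustrated with a small, purely hypothetical Python sketch: whichever syntax the metadata arrived in, each extracted fact ends up as the same (subject, predicate, object) shape, tagged with the data source that produced it. The source identifiers follow the post; the records themselves are invented.

```python
# Hypothetical sketch of the triple model: metadata extracted from
# microformats, eRDF, or RDFa all reduces to (subject, predicate, object)
# triples, so one application can consume any of the three sources.

def triples_by_source(triples, source):
    """Filter extracted triples by the data source that produced them."""
    return [t for t in triples if t["source"] == source]

extracted = [
    {"source": "com.yahoo.rdf.erdf",
     "triple": ("http://example.org/page", "dc:title", "Example Page")},
    {"source": "com.yahoo.rdf.rdfa",
     "triple": ("http://example.org/page", "foaf:name", "Alice")},
]

erdf = triples_by_source(extracted, "com.yahoo.rdf.erdf")
print(erdf[0]["triple"])  # ('http://example.org/page', 'dc:title', 'Example Page')
```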

Very cool. Good news for microformats, good news for RDF. Now to find which spam-trap my SearchMonkey account info got lost in…

Obama Web jobs in Boston

How could this not be a fun way to spend 6 months?

Obama for America is looking for exceptionally talented web developers who want to play a key role in a historic political campaign and help elect Barack Obama as the next President of the United States.

This six-month opportunity will allow you to:

  • Create software tools which will enable an unprecedented nationwide voter contact and mobilization effort
  • Help build and run the largest online, grassroots fundraising operation in the history of American politics
  • Introduce cutting-edge social networking and online organizing to the democratic process by empowering everyday people to participate on My.BarackObama

They also have a security expert position open.

Successful candidates will join the development team in Boston, MA.

Almost makes me wish I was a US Citizen. Sorry ma’am

Job ad: The National Center for Biomedical Ontology (NCBO)

Interesting applied SemWeb job, from public-semweb-lifesci:

The National Center for Biomedical Ontology (NCBO) is one of the seven National Centers for Biomedical Computing supported by the NIH Roadmap.  The NCBO is administered at Stanford University, with partners at the Mayo Clinic, the University at Buffalo, the University of Victoria, and UCSF.  The Center provides national technological infrastructure to support the creation, dissemination, and management of biomedical information and knowledge in machine-processable form.

The laboratory of Dr. Mark Musen, principal investigator of the NCBO, is seeking a highly motivated and independent post-doctoral trainee to conduct research projects at the interface of the life sciences and the Semantic Web.  The post-doc will be involved in ongoing collaborative work that concerns archiving, querying, and reasoning about biological data over the Web.

See Mark Musen’s post for full text. I guess the majority of likely candidates will already be on public-semweb-lifesci, but I thought I’d air this more widely just in case. BTW I don’t have any further information than offered here, except to say it seems like a great project to be involved in.

Map-reduce-merge and Hadoop/Hbase RDF

Just found this interesting presentation,

Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters
by Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, D. Stott Parker; as presented by Nate Rober (PDF)

Excerpts:

Extending MapReduce
1. Change to reduce phase
2. Merge phase
3. Additional user-definable operations
a. partition selector
b. processor
c. merger
d. configurable iterators

Implementing Relational Algebra Operations
1. Projection
2. Aggregation
3. Selection
4. Set Operations: Union, Intersection, Difference
5. Cartesian Product
6. Rename
7. Join

[for more detail see full slides]

Conclusion
MapReduce & GFS represent a paradigm shift in data processing: use a simplified interface instead of overly general DBMS.
Map-Reduce-Merge adds the ability to execute arbitrary relational algebra queries.
Next steps: develop SQL-like interface and a query optimizer.
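As a rough illustration of what the slides describe (not the authors' implementation), a relational join, the operation plain MapReduce cannot express in a single job, can be written as two map/reduce passes plus an extra merge step over their keyed outputs. All data and function names here are invented:

```python
# Toy sketch of the Map-Reduce-Merge idea: two map/reduce passes produce
# keyed outputs; the merge phase then joins them on their shared keys.

from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Standard in-memory map/reduce: group mapped pairs, then reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def merge(left, right):
    """Merge phase: join two reduced outputs on their common keys."""
    return {k: (left[k], right[k]) for k in left if k in right}

employees = [("alice", "eng"), ("bob", "sales")]
bonuses = [("alice", 100), ("alice", 50), ("bob", 30)]

depts = map_reduce(employees, lambda r: [(r[0], r[1])], lambda k, vs: vs[0])
totals = map_reduce(bonuses, lambda r: [(r[0], r[1])], lambda k, vs: sum(vs))

print(merge(depts, totals))  # {'alice': ('eng', 150), 'bob': ('sales', 30)}
```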

Research paper: Map-reduce-merge: simplified relational data processing on large clusters (PDF for ACM people)

Linked from HRDF page in the Hadoop wiki, where there appears to be a proposal brewing to build an RDF store on top of the Hadoop/Hbase infrastructure.

Nearby: LargeTripleStores in ESW wiki

Not entirely unrelated: Google Social Graph API (which parses FOAF/RDF from ‘The Web’ but currently discards all but the social graph parts)

Geographic Queries on Google App Engine

Much cleverness:

In this way, I was able to put together a geographic bounding box query, on top of Google App Engine, using a Geohash-like algorithm as a storage format, and use that query to power a FeatureServer Demo App Engine application, doing geographic queries of non-point features on top of App Engine/BigTable. Simply create a Geoindex object of the bounding box of your feature, and then use lower-left/upper-right points as bounds for your Geohash when querying.
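The core Geohash trick is small enough to sketch in Python. This is a generic illustration of the algorithm family the post mentions, not the FeatureServer code: interleave one bit of longitude and one of latitude per step, then base32-encode the bit string. Nearby points share a hash prefix, so a bounding-box query turns into a string key-range scan of the kind BigTable supports natively.

```python
# Minimal Geohash encoder: alternately bisect the longitude and latitude
# ranges, emitting a 1 when the point lies in the upper half, and pack
# every 5 bits into one base32 character.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, ch, bit, even = [], 0, 0, True
    while len(code) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:
            code.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(code)

# Shorter hashes cover larger boxes; a point's long hash always starts
# with its shorter one, which is what makes prefix-range queries work.
print(geohash(0.0, 0.0))  # s00000
```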

Underground Victorian street? in Bristol?

Hidden away beneath Lawrence Hill lies a secret world – an underground Victorian street stretching from Ducie Road to the Packhouse pub.

Local historian Dave Stephenson finds out more.

For many years I had heard tales of a Victorian street abandoned beneath busy Lawrence Hill. To add substance to the legend there were people who claimed to have seen it when they were young.

They told me that this underground street stretched from Ducie Road, near the now closed Earl Russel pub, to the Packhorse pub, in tunnels under the road.

All had Victorian shop fronts, some still with their glass intact, and several street lamps still hung on the walls.

From the Underground Bristol forum. Rather vaguely attributed to ‘a Bristol Newspaper – July 2007’; perhaps the Evening Post, though it would apparently cost a search fee to find out. The quoted article goes on to say that the underground street is pretty messed up now, with rubbish, and has been looted quite comprehensively.

Bruce Schneier: Our Data, Ourselves

Via Libby; Bruce Schneier on data:

In the information age, we all have a data shadow.

We leave data everywhere we go. It’s not just our bank accounts and stock portfolios, or our itemized bills, listing every credit card purchase and telephone call we make. It’s automatic road-toll collection systems, supermarket affinity cards, ATMs and so on.

It’s also our lives. Our love letters and friendly chat. Our personal e-mails and SMS messages. Our business plans, strategies and offhand conversations. Our political leanings and positions. And this is just the data we interact with. We all have shadow selves living in the data banks of hundreds of corporations’ information brokers — information about us that is both surprisingly personal and uncannily complete — except for the errors that you can neither see nor correct.

What happens to our data happens to ourselves.

This shadow self doesn’t just sit there: It’s constantly touched. It’s examined and judged. When we apply for a bank loan, it’s our data that determines whether or not we get it. When we try to board an airplane, it’s our data that determines how thoroughly we get searched — or whether we get to board at all. If the government wants to investigate us, they’re more likely to go through our data than they are to search our homes; for a lot of that data, they don’t even need a warrant.

Who controls our data controls our lives. [...]

Increasingly, we’re going to be seeing this data flow through protocols like OAuth. SemWeb people should get their heads around how this is likely to work. It’s rather likely we’ll see SPARQL data stores with non-public personal data flowing through them; what worries me is that there’s not yet any data management discipline on top of this that’ll help us keep track of who is allowed to see what, and which graphs should be deleted or refreshed at which times.

I recently transcribed some notes from a Robert Scoble post about Facebook and data portability into the FOAF wiki. In it, Scoble reported some comments from Dave Morin of Facebook, regarding data flow. Excerpts:

For instance, what if a user wants to delete his or her info off of Facebook. Today that’s possible. But what about in a really data portable world? After all, in such a world Facebook might have sprayed your email and other data to other social networks. What if those other social networks don’t want to delete your data after you asked Facebook to?

Another case: you want your closest Facebook friends to know your birthday, but not everyone else. How do you make your social network data portable, but make sure that your privacy is secured?

Another case? Which of your data is yours? Which belongs to your friends? And, which belongs to the social network itself? For instance, we can say that my photos that I put on Facebook are mine and that they should also be shared with, say, Flickr or SmugMug, right? How about the comments under those photos? The tags? The privacy data that was entered about them? The voting data? And other stuff that other users might have put onto those photos? Is all of that stuff supposed to be portable? (I’d argue no, cause how would a comment left by a Facebook user on Facebook be good on Flickr?) So, if you argue no, where is the line? And, even if we can all agree on where the line is, how do we get both Facebook and Flickr to build the APIs needed to make that happen?

I’d like to see SPARQL stores that can police their data access behaviour, with clarity for each data graph in the store about the contexts in which that data can be re-exposed, and the schedule by which the data should be refreshed or purged. Making it easy for data to flow is only half the problem…
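To make the idea concrete, here is a toy Python sketch (all names invented, nothing standardised) of a store that records, per named graph, who may see its data and when it must be refreshed or purged:

```python
# Toy "policed" graph store: each named graph carries an audience set and
# an expiry deadline; queries outside the audience, or after expiry, get
# nothing, and expired personal data is purged rather than served.

import time

class PolicedGraphStore:
    def __init__(self):
        self.graphs = {}  # graph URI -> {"triples", "audience", "expires"}

    def put(self, uri, triples, audience, ttl_seconds):
        self.graphs[uri] = {"triples": triples,
                            "audience": set(audience),
                            "expires": time.time() + ttl_seconds}

    def query(self, uri, requester):
        g = self.graphs.get(uri)
        if g is None or time.time() > g["expires"]:
            self.graphs.pop(uri, None)   # purge stale personal data
            return None
        if requester not in g["audience"]:
            return None                  # outside the allowed context
        return g["triples"]

store = PolicedGraphStore()
store.put("http://example.org/graphs/danbri-contacts",
          [("ex:danbri", "foaf:knows", "ex:libby")],
          audience={"ex:libby"}, ttl_seconds=3600)
print(store.query("http://example.org/graphs/danbri-contacts", "ex:libby"))
print(store.query("http://example.org/graphs/danbri-contacts", "ex:mallory"))
```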

OAuth support in Google Accounts and Contacts API

From Wei on the oauth list:

We are happy to announce that the Google Contacts Data API now supports OAuth. This is our first step towards OAuth enabling all Google Data APIs. Please note that this is an alpha release and we may make changes to the protocol before the official release.

See the announcement thread for endpoint URL details, supporting tools and implementation discussion.
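For readers new to the protocol, the heart of OAuth 1.0 is the HMAC-SHA1 request signature. A generic sketch follows, based on the OAuth Core spec rather than anything Google-specific; the endpoint URL, keys and secrets are placeholders:

```python
# Illustrative OAuth 1.0 HMAC-SHA1 signing, per the OAuth Core spec:
# normalize the parameters, build the signature base string, then sign
# with a key made from the consumer and token secrets.

import base64, hashlib, hmac, urllib.parse

def oauth_signature(method, url, params, consumer_secret, token_secret=""):
    enc = lambda s: urllib.parse.quote(str(s), safe="~")
    # 1. Normalize parameters: percent-encode, sort, join as key=value pairs.
    normalized = "&".join(f"{enc(k)}={enc(v)}" for k, v in sorted(params.items()))
    # 2. Signature base string: METHOD&URL&PARAMS, each component encoded.
    base = "&".join([method.upper(), enc(url), enc(normalized)])
    # 3. HMAC-SHA1 keyed by consumer secret & token secret, base64-encoded.
    key = f"{enc(consumer_secret)}&{enc(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

sig = oauth_signature(
    "GET", "http://www.google.com/m8/feeds/contacts/default/full",
    {"oauth_consumer_key": "example.com",
     "oauth_nonce": "abc123",
     "oauth_timestamp": "1200000000",
     "oauth_signature_method": "HMAC-SHA1",
     "oauth_version": "1.0"},
    consumer_secret="anonymous")
print(sig)
```

The resulting signature is sent as the oauth_signature parameter; the server repeats the same computation and compares.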

For more on the Contacts API, see the developer’s guide (although bear in mind it may not be up to date w.r.t. oauth).