2010-01-21

E-Blob and NoSQL options

Morgan Tocker recently wrote the blog article When Should You Store Objects in the Database, in which he discusses what Josh Berkus calls the 'E-blob antipattern'. There is much disagreement on whether this is a pattern or an antipattern.

In this pattern, you take all the information about something, serialize it into a string (usually JSON), and insert that string into a very simple MySQL InnoDB table containing only a primary key column and a "data" BLOB column that holds the serialized string.
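
As a rough illustration, here is a minimal sketch of that two-column table. It assumes SQLite (from Python's standard library) as a stand-in for MySQL InnoDB so that it runs anywhere; the table name "eblob" and the put/get helpers are made up for this example.

```python
import json
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL/InnoDB.
conn = sqlite3.connect(":memory:")

# The whole schema: a primary key and one opaque "data" BLOB column.
conn.execute("CREATE TABLE eblob (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")

def put(conn, key, obj):
    """Serialize obj to JSON and store it under key (hypothetical helper)."""
    payload = json.dumps(obj).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO eblob (id, data) VALUES (?, ?)",
        (key, payload),
    )

def get(conn, key):
    """Fetch by primary key and deserialize; None if the key is absent."""
    row = conn.execute("SELECT data FROM eblob WHERE id = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None

put(conn, 1, {"name": "alice", "tags": ["admin", "ops"]})
user = get(conn, 1)
```

The database only ever sees a key and a bag of bytes; everything it knows how to do with columns and types goes unused.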

I can understand (and even make) the arguments pro and con.

What bugs me about this pattern is that you have basically created a document store or an object store, but with all the complexity and brittleness of running a SQL database that you can't run SQL queries against. The only reason to use MySQL here is that your organization's IT infrastructure already knows how to run and maintain MySQL servers.
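
To make the "can't run SQL queries against" point concrete, here is a small sketch, again assuming SQLite in place of MySQL and using invented sample data. Because the fields live inside the serialized string, a lookup on anything other than the primary key has to pull every row back to the application and deserialize it there.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for a MySQL/InnoDB E-blob table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eblob (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")

# Invented sample data for illustration only.
people = {
    1: {"name": "alice", "city": "Oakland"},
    2: {"name": "bob", "city": "Portland"},
    3: {"name": "carol", "city": "Oakland"},
}
for key, obj in people.items():
    conn.execute(
        "INSERT INTO eblob (id, data) VALUES (?, ?)",
        (key, json.dumps(obj).encode("utf-8")),
    )

def find_by_field(conn, field, value):
    """Full scan: the database cannot filter on fields inside the blob,
    so every row is fetched and decoded in the application."""
    matches = []
    for key, payload in conn.execute("SELECT id, data FROM eblob ORDER BY id"):
        if json.loads(payload).get(field) == value:
            matches.append(key)
    return matches

oakland_ids = find_by_field(conn, "city", "Oakland")
```

There is no WHERE clause or index the server could use here; the filtering work has moved into application code.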

If you are going to keep your data in semi-regular JSON format, consider running a real document store instead, such as CouchDB or MongoDB. You can keep the data in the same JSON documents and still do the fast primary key lookups, but you can also run more sophisticated queries and take advantage of the scaling solutions those systems provide.

If you don't need the document store features and just want an object store, then use an object store. Get the SQL parsing and serialization overhead out of the way. Consider using Tokyo Tyrant, or Memcached, or Redis, or a web server that handles PUT, or even just a bunch of NFS-mounted volumes. If you are running in AWS, keep your objects in S3.
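
For a sense of how little an object store needs to do, here is a minimal sketch of the "bunch of mounted volumes" option. It assumes a local temporary directory in place of an NFS mount, and the FileObjectStore class and its put/get methods are hypothetical names for this example, not any real product's API.

```python
import hashlib
import os
import tempfile

class FileObjectStore:
    """Hypothetical key -> bytes store backed by a directory tree."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        # Hash the key so any string is a safe filename, and fan out
        # into subdirectories so no single directory grows too large.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest[:2], digest)

    def put(self, key, data):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        # Rename into place so readers never see a half-written object.
        os.replace(tmp, path)

    def get(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

# Assumption: a temp directory stands in for an NFS-mounted volume.
store = FileObjectStore(tempfile.mkdtemp())
store.put("user:1", b'{"name": "alice"}')
blob = store.get("user:1")
```

The interface is just put and get by key, which is all the E-blob table was offering in the first place, without a SQL parser in the path.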

The libmemcached library has an API for bolting the memcached protocol onto any sort of object store implementation. It's brand new and pretty buggy, but it is improving quickly, and may soon be THE way to access an object store.

2010-01-20

Machines Plus Minds - Welcome

Welcome!

I write about the techne that is the amazing machine we call "The Net".

At present, I am mostly interested in Memcached, the Drizzle DB fork of MySQL, NoSQL, and in open standards, but I will be writing about other stuff as well.

I used to do my public geek blogging in my personal LiveJournal, but for various reasons it makes sense to separate my public writings about technology, my other public writings, and writings that are of interest only to my closer friends and family.

My day job title is "Director of Community Development" for Gear6, which means that my main paid interest is memcached and the memcached community. I read and write technical articles, research NoSQL databases and other web-scale open source software, and go to conferences to attend and to speak.