Friday, March 19, 2010

NoNoSQL

I'm really looking forward to seeing the "NoSQL" buzzword head over the top of the hype-cycle and start heading downwards, since I think it's really doing damage in what should be a fairly straightforward discussion of matching customer use cases to appropriate technology.

Because the term is framed ("No!") in opposition to almost the entire family of existing data persistence technology, anyone who comes to the discussion fresh assumes there's a replacement process going on, wherein NoSQL stands in opposition to SQL.

That's a shame, because the term is only one letter away from a (slightly) less polarizing buzzword: NotSQL. Even then, though, the core discussion would be lost, because the big difference isn't programmatic API versus 4GL. The big difference is use case matching, in particular the high-volume, high-availability use case which has emerged in the age of consumer web services.

People with public-facing web applications face a potentially unconstrained read/write load (in their happiest dreams) and the techniques necessary to scale a traditional RDBMS to match that load proceed from the straightforward at the low end to the increasingly byzantine at the high end.

The scaling story for traditional RDBMS technology just is not great: start by adding servers and extra technology to hook them together; get increasingly smart people to handle your increasingly complex infrastructure; finally, start hacking at your data model to allow even further partitioning and duplication.

The new breed of databases has a great scaling story: once you get set up, scaling requires plugging in new nodes and turning them on. That's it. No model changes, no extra replication and high availability technology.

There's no free lunch though. In exchange for the high-throughput/high-availability you lose the expressiveness and power of SQL. Henceforth you will write your joins and summaries yourself, at the application level. Henceforth performing an ad hoc query may require you to build and populate a whole new "table" (the terminology is highly variable at this point) in your model. And of course this technology is all pretty fresh meat, so just learning enough to get started can be a bit of a slog – kids aren't exactly coming out of school with a course in this stuff under their belt.
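To make the "write your joins and summaries yourself" point concrete, here is a minimal sketch in Python. It assumes a made-up data set: the `users` and `orders` records, and the helper names `join_orders_to_users` and `total_by_user`, are hypothetical and stand in for whatever your key-value store hands back. It is not the API of any particular database, just the kind of code that ends up living in the application once SQL is gone.

```python
from collections import defaultdict

# Hypothetical records, standing in for the results of key-value lookups.
users = {
    "u1": {"name": "Ada"},
    "u2": {"name": "Grace"},
}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u2", "total": 15},
    {"order_id": "o3", "user_id": "u1", "total": 20},
]


def join_orders_to_users(orders, users):
    """Hand-written stand-in for: orders JOIN users ON user_id."""
    return [
        {**order, "name": users[order["user_id"]]["name"]}
        for order in orders
        if order["user_id"] in users  # inner-join semantics: drop orphans
    ]


def total_by_user(joined):
    """Hand-written stand-in for: SELECT name, SUM(total) ... GROUP BY name."""
    totals = defaultdict(int)
    for row in joined:
        totals[row["name"]] += row["total"]
    return dict(totals)


joined = join_orders_to_users(orders, users)
summary = total_by_user(joined)
print(summary)  # {'Ada': 50, 'Grace': 15}
```

In an RDBMS both functions collapse into one declarative query. Here every new question about the data means another function like these, or a new precomputed "table" keyed the way the question needs.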

So, it's a whole new world, and if you are planning on serving an application where the numbers (hits, pages, requests, whatever) are heading into the 7-digits, it might be a good idea to start with this technology. (Can you tell I can't stand to use the "NoSQL" term? It is just too awful; there really needs to be a non-pejorative term for this application category.)

Of course, there's nothing new under the sun. Getting the best performance for a specialized use case requires (a) modifying your data model and (b) using technology that can leverage your specialized data model. This is exactly what OLAP databases have been doing for a generation to provide data analysis on multi-billion record historical databases (special data model, special technology).

Database guru Michael Stonebraker wrote a nice article about the brave new world of databases, called "One Size Fits All: An Idea Whose Time Has Come and Gone", in 2005, and the conclusion is that we are going to see increasing fragmentation of database technology based on use case. "NoSQL" (*shudder*) is just the latest iteration in this process.

Meanwhile, I'll put my oar in for the general purpose database: it's easy to run OLAP queries on a general purpose database, you just can't do it once your table size gets over a billion; it's easy to run a public web site on a general purpose database, you just can't do it once your load gets over a million. On the other hand it's well nigh impossible to run even a small web site on an OLAP database and pretty darn hard to build even a small OLAP system on a NoSQL foundation.

Horses for courses folks, horses for courses.

For more of this kind of geekery, see the several articles linked off of "The Case for the Bit Bucket" at the Oracle Nerd blog.
 

6 comments:

Chad B said...

It's nice to hear a sane argument for SQL. I've tried to build applications using AppEngine and BigTable and you just end up pushing a whole lot of complexity to the application level. That would be fine, except that there are no good frameworks or even books that will teach you how to deal with that.

A lot of the popular new tech for building web applications (dynamic languages, high-level frameworks, JavaScript widget libraries) is focused on trading off performance for better productivity and increasing levels of abstraction. "NoSQL" is the exact opposite.

The technology has its place, but starting a new project with it is likely premature optimization.

Sam said...

Smarter people than myself seem to be running with "structured storage". That works for me.

Sam

dm said...

@chad I think it's fair to say that some of these solutions sometimes require more coding. You might want to try MongoDB; it errs more toward the full-featured end of the spectrum (at least in this space). While it does leave out some RDBMS features (so horizontal scalability is still possible), its JSON data model and schemaless nature map easily to most programming languages. Often I would say it is easier, not harder, to write an app with it than it would have been with an RDBMS (except if you are doing something super transactional, like a banking system). For example, ad hoc queries, secondary indexes, and sorting are still there.

Tobin said...

It is a bad name for that set of technologies. When I first saw it I thought it was a group of people philosophically opposed to select statements.

Tobin said...

Ah. Wikipedia to the rescue.

The term NoSQL was first used in 1998 as the name for a lightweight open source relational database that did not expose an SQL interface. Its author, Carlo Strozzi, claims that as the NoSQL movement "departs from the relational model altogether it should therefore have been called more appropriately "NoREL", or something to that effect"

yvesm said...

"And of course this technology is all pretty fresh meat,"

ZODB (http://en.wikipedia.org/wiki/Zope_Object_Database) is more than a decade old :-).
