Friday, 31 October 2008

Application-specific database engines.

I've already written about the fact that databases are increasingly being created to support a single application, and Michael Stonebraker has written about the drivers for multiple types of database engines. However, these trends extend much deeper than that.

When packaged applications or services need a storage solution, a traditional relational database system is not necessarily the first choice. Many document management systems, workflow systems, CRM solutions, application servers, etc. ship with their own specialized storage systems. This lets the developers of these applications build a storage engine that meets their own very specific requirements, which in turn means that a lot of unnecessary overhead and complexity can be removed from the storage component so that the application delivers consistently high performance.

A great example of this on the web is Facebook and the approach that it takes to storing and serving the enormous number of photos that it holds. If you take a look at the presentation "Needle in a haystack: efficient storage of billions of photos" then you'll see how they've built a database engine for images, driven by some extreme scalability requirements. Could they have used a traditional database? ...No. They wouldn't have been able to meet their performance criteria. They needed to ensure that every I/O that was made was necessary.
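To make the "every I/O must be necessary" idea concrete, here is a toy sketch of one way a purpose-built blob store can avoid wasted reads: blobs are appended to a single large file, and an in-memory index maps each key to its offset and size, so a read is one seek and one read. The class and method names (BlobStore, put, get) are made up for this illustration, and it is not a description of Facebook's actual implementation.

```python
import os


class BlobStore:
    """Toy append-only blob store (hypothetical, for illustration only)."""

    def __init__(self, path):
        # "a+b" creates the file if needed; all writes go to the end of it.
        self._file = open(path, "a+b")
        # In-memory index: key -> (offset, size). Assumed to fit in RAM.
        self._index = {}

    def put(self, key, data):
        # Record where the blob will start, then append it.
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)
        self._file.flush()
        self._index[key] = (offset, len(data))

    def get(self, key):
        # One index lookup in memory, then a single seek and read on disk.
        offset, size = self._index[key]
        self._file.seek(offset)
        return self._file.read(size)

    def close(self):
        self._file.close()
```

A real engine would also need to persist the index, handle deletes, and cope with crashes; the sketch leaves all of that out to show only the read path.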

Even when a traditional database engine is involved, there can be database-like code sitting in the application to extend the capabilities of the underlying database engine. Database sharding is a good example of this. In this approach, data is federated over a collection of cheap servers to increase scalability and performance. Typically, the applications that use sharding contain the code that distributes the data over the shards and combines the results from the shards. I've used similar techniques myself, before most of the commercial database engines started supporting partitioning and clustering natively. (Something that MySQL - which most of the sharding practitioners seem to use - has only just started to support.)
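As a minimal sketch of what that application-side code can look like, the example below routes each row to a shard by hashing its key and answers a cross-shard query by asking every shard and merging the results. Everything here is an assumption made for illustration: the ShardedUsers class, the users table, and the use of in-memory SQLite databases as stand-ins for separate database servers.

```python
import sqlite3
import zlib


class ShardedUsers:
    """Hypothetical application-level sharding over several databases."""

    def __init__(self, shard_count):
        # Each in-memory SQLite database stands in for a separate server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute(
                "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
            )

    def _shard_for(self, user_id):
        # A stable hash of the key decides which shard owns the row.
        index = zlib.crc32(str(user_id).encode()) % len(self.shards)
        return self.shards[index]

    def insert(self, user_id, name, age):
        db = self._shard_for(user_id)
        db.execute("INSERT INTO users VALUES (?, ?, ?)", (user_id, name, age))
        db.commit()

    def get(self, user_id):
        # A lookup by key touches exactly one shard.
        cursor = self._shard_for(user_id).execute(
            "SELECT id, name, age FROM users WHERE id = ?", (user_id,)
        )
        return cursor.fetchone()

    def find_older_than(self, age):
        # A query that is not by key fans out to every shard, and the
        # application combines the partial results itself.
        rows = []
        for db in self.shards:
            cursor = db.execute(
                "SELECT id, name, age FROM users WHERE age > ?", (age,)
            )
            rows.extend(cursor.fetchall())
        return sorted(rows)
```

Note that the routing, the fan-out, and the merge all live in the application rather than in the database engine, which is exactly the kind of database-like code described above. Changing the number of shards with this simple modulo scheme would move most keys, so real deployments usually need a more careful placement strategy.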

Now, not all applications need to go to the extremes of building or augmenting a database storage engine, but for those that just aren't getting enough out of an off-the-shelf database solution, more and more brave souls seem to be taking on the challenge.