Hitting The Books: Short Review Of The "NoSQL Distilled" Book

2021-12-16

hitting the books

databases

NoSQL

books

system design

distributed systems

What Book?

Another result of my effort to keep my personal knowledge base up to date (and of a reference in another recent read, Building Microservices) is the "NoSQL Distilled" book by Martin Fowler and Pramod Sadalage.

It's a fairly short and concise read of 152 pages, and it was published almost a decade ago, in 2012, which made me somewhat cautious about how relevant and useful it would be for modern-day systems.

General Impression

I'll start with a summary: the main idea of the book is to promote polyglot persistence. In the book's own words,

"Polyglot persistence is about using different data storage technologies to handle varying data storage needs"

To achieve that goal, the authors first cover the basic principles of distributed systems, such as distribution models, consistency and data models. After that, they describe the main types of NoSQL stores: key-value, document, column-family and graph (also mentioning some others, like the filesystem). The book finishes with short guidance on storage selection, which can be summarized like this (quote from the book):

The two main reasons to use NoSQL technology are:
- To improve programmer productivity by using a database that better matches an application's needs
- To improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput.

It's essential to test your expectations about programmer productivity and/or performance before committing to use NoSQL technology.

There's definitely much more than that in the book, namely an overview of the best use cases for each storage type and of the options among specific storage systems (such as Riak, Cassandra, MongoDB, DynamoDB and so on).

There's also a part that says that most applications should still stick to relational DBs "at least until the NoSQL technology ecosystem becomes more mature". After a decade of active usage and development, the ecosystem definitely is more mature, but I think that consideration is still valid: it's totally fine to use a relational database if it fits the data model (or if the data model is not clear yet). I would definitely start with PostgreSQL for any common project, as it provides a lot of functionality in a very mature and well-supported package.

One more important thought shared on those pages is to encapsulate the database work inside an abstraction (it could be a class, it could be a service) to make switching storage, if not smooth, then at least possible in the future (and likely transparent for the system's users).
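To make the encapsulation idea a bit more concrete, here is a minimal sketch of how I would picture it in Python. It is not taken from the book: the names UserStore, InMemoryUserStore, SqliteUserStore and rename_user are all hypothetical, a plain dict stands in for a key-value store, and SQLite stands in for a relational database.

```python
import json
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """Hypothetical storage interface; application code depends only on this."""

    def save(self, user_id: str, profile: dict) -> None: ...

    def load(self, user_id: str) -> Optional[dict]: ...


class InMemoryUserStore:
    """Key-value flavour: a dict stands in for a real key-value store."""

    def __init__(self) -> None:
        self._data = {}

    def save(self, user_id: str, profile: dict) -> None:
        # Store a copy so callers cannot mutate stored state by accident.
        self._data[user_id] = dict(profile)

    def load(self, user_id: str) -> Optional[dict]:
        profile = self._data.get(user_id)
        return dict(profile) if profile is not None else None


class SqliteUserStore:
    """Relational flavour: the same interface backed by a SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id TEXT PRIMARY KEY, profile TEXT NOT NULL)"
        )

    def save(self, user_id: str, profile: dict) -> None:
        # The profile is serialized to JSON to keep the sketch short.
        self._conn.execute(
            "INSERT OR REPLACE INTO users (id, profile) VALUES (?, ?)",
            (user_id, json.dumps(profile)),
        )
        self._conn.commit()

    def load(self, user_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT profile FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def rename_user(store: UserStore, user_id: str, new_name: str) -> bool:
    """Application logic that never learns which backend it talks to."""
    profile = store.load(user_id)
    if profile is None:
        return False
    profile["name"] = new_name
    store.save(user_id, profile)
    return True
```

Swapping the backend then means constructing a different store object and passing it in, while rename_user and the rest of the application code stay untouched. A real migration would of course also have to move the existing data, which this sketch does not cover.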

Conclusion

Almost 10 years after its publication, the book is still impressively relevant and useful. And the format is really good: it's very palatable and yet rich in ideas and details. There's certainly much more to learn about specific database technologies (preferably by using them), but this book provides great guidance on the basics and gives ideas on where to go next.