SQL databases like MySQL, MariaDB, and PostgreSQL are highly performant and can scale well. However, in practice it’s not rare for people to run into performance issues with these databases and run to NoSQL solutions like DynamoDB.

Proponents of DynamoDB, like Alex DeBrie, the author of “The DynamoDB Book”, point to a few reasons for this difference: the HTTP-based APIs of NoSQL databases are more efficient than the TCP connections SQL databases use; table joins are slow; and SQL databases are designed to save disk space, while NoSQL databases take advantage of large modern disks.¹

These claims don’t make a lot of sense to me though. HTTP runs over TCP; it’s not going to be magically faster. Table joins do make queries complex, but they are a core feature that SQL engines are designed to optimize. And I don’t understand the point about SQL databases being designed to save space. While disk capacities have skyrocketed, even the fastest disks are extremely slow compared to how fast CPUs can crunch numbers. A cache miss that goes out to main memory can stall a CPU core for hundreds of cycles, and a read that has to touch disk costs millions, so it’s critical to fit data in cache. That means making your data take up as little space as possible. Perhaps Alex is talking about data normalization, which is a property of database schemas and not the database itself, but normalization is not about saving space either: it’s about keeping a single source of truth for everything. I feel like at the end of the day, these arguments just boil down to “SQL is old and ugly, NoSQL is new and fresh”.

That being said, I think it’s still undeniably true that, in practice, people hit performance issues with SQL databases far more often than with NoSQL databases like DynamoDB. And I think I know why: it’s because DynamoDB makes what is slow explicit.

Look at these two SQL queries. Can you spot the performance difference between them?

SELECT * FROM users WHERE user_id = ?;
SELECT * FROM users WHERE group_id = ?;

It’s a trick question: of course you can’t! Not without looking at the table schema to check whether there are indexes on user_id or group_id. And if the query were more complex, you’d likely have to run DESCRIBE ... to make sure the database will actually execute it the way you think it will.
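To make this concrete, here’s a minimal sketch using SQLite, which ships with Python (MySQL and PostgreSQL have their own EXPLAIN output, but the idea is the same). The schema is made up for illustration; the point is that the only way to tell the two identical-looking queries apart is to ask the engine for its plan:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, group_id INTEGER)")

for column in ("user_id", "group_id"):
    query = f"SELECT * FROM users WHERE {column} = ?"
    # EXPLAIN QUERY PLAN returns rows of (id, parent, notused, detail);
    # the detail column describes how SQLite will execute the query.
    plan = con.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchone()
    print(f"{query}\n  -> {plan[3]}")

# The user_id lookup is reported as a SEARCH using the primary key,
# while the group_id lookup is reported as a SCAN of the whole table.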

I think this makes it easy to write bad queries. Look at Jesse Skinner’s article about the time he found a web app where all the SELECT queries were using LIKE instead of =, which meant the queries were not using indexes at all! While it’s easy to think that the developer who used LIKE everywhere was just a bad developer, I think the realization we need to come to is that it is too easy to make these mistakes. The same SELECT query could be looking up a single item by its primary key, or it could be doing a slow table scan. The same syntax could return you a single result, or it could return you a million results. If you make a mistake, there is no indication of it until your application has been live for months or even years and your database has grown to a size where these queries start choking.
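You can reproduce that pitfall the same way. In SQLite, with default settings, a parameterized LIKE can’t use an ordinary index (LIKE is case-insensitive by default, so it can’t be served by the case-sensitive index), while = can. Other engines have their own rules for when LIKE can use an index, but the failure mode is similar. Again, the schema is invented:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE INDEX users_email ON users (email)")

for query in (
    "SELECT * FROM users WHERE email = ?",
    "SELECT * FROM users WHERE email LIKE ?",
):
    plan = con.execute("EXPLAIN QUERY PLAN " + query, ("a@example.com",)).fetchone()
    print(f"{query}\n  -> {plan[3]}")

# The = query is a SEARCH using the users_email index; the LIKE query is
# a SCAN of the whole table. Same shape, wildly different cost.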

On one hand, I think this speaks to how performant SQL databases are. You can write garbage queries and still get decent performance until your tables grow to hundreds of thousands of rows! But at the same time, I think this is exactly why DynamoDB ends up being more scalable in production: because bad queries are explicit.

With DynamoDB, if you want to get just one item by its unique key, you use a Get operation that makes this explicit. If you select items based on a key condition, that’s an explicit Query operation, and it will only return a limited number of results at a time, requiring you to paginate with a cursor, again making it explicit that you could be querying for many items! And a Query never falls back to scanning an entire table; for that you need a Scan operation, which makes it explicit that you are doing something wrong.
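Here is roughly what that looks like with boto3, AWS’s Python SDK. The users table, its user_id key, and the group_id-index GSI are all names I made up for illustration; treat this as a sketch, not a drop-in snippet:

import boto3

ddb = boto3.client("dynamodb")

# Get: exactly one item, addressed by its full primary key.
resp = ddb.get_item(TableName="users", Key={"user_id": {"S": "123"}})
item = resp.get("Item")

# Query: items matching a key condition, returned one page at a time.
# The cursor (LastEvaluatedKey) forces you to write the pagination loop,
# so "this could be many items" is right there in your code.
kwargs = {
    "TableName": "users",
    "IndexName": "group_id-index",
    "KeyConditionExpression": "group_id = :g",
    "ExpressionAttributeValues": {":g": {"S": "admins"}},
}
items = []
while True:
    resp = ddb.query(**kwargs)
    items.extend(resp["Items"])
    if "LastEvaluatedKey" not in resp:
        break
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]

# Scan: reads the entire table. The operation's name tells you the cost.
resp = ddb.scan(TableName="users")

Each operation’s cost profile is spelled out in its name and in the code you’re forced to write around it.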

Rather than any magic about table joins or differences in connection types, I think this is really the biggest difference in what makes DynamoDB more scalable. It’s not that DynamoDB is magic; it’s that it makes bad patterns more visible. I think it’s critical that our tools be explicit, and even painful, when we use them in bad patterns, because we will accidentally follow bad patterns if it’s easy to do so.

I want to add, though, that DynamoDB is not perfect in this regard either. I particularly see this with filters. It’s easy to see why Amazon added filters, but it’s not rare for people to use them without understanding how they work and end up making mistakes (for example, here).
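The trap is that a filter is applied after the items are read, not before. A hedged sketch, reusing the invented names from above:

import boto3

ddb = boto3.client("dynamodb")

# DynamoDB reads (and bills you for) every item the key condition
# matches, then the FilterExpression throws away the non-matching ones.
# The filter reduces what you get back, not what you pay for.
resp = ddb.query(
    TableName="users",
    IndexName="group_id-index",
    KeyConditionExpression="group_id = :g",
    FilterExpression="active = :t",
    ExpressionAttributeValues={":g": {"S": "admins"}, ":t": {"BOOL": True}},
)
# resp["Count"] (items returned) can be zero even while
# resp["ScannedCount"] (items read and paid for) is enormous.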


  1. I don’t have my copy of the book handy, so I wrote these arguments from memory. I’m confident that I remember them correctly, but apologies if I misremembered some details. ↩︎