@dvassallo, 14 tweets, 3 min read

Friday's food for thought: If all your database queries have a WHERE user_id = ? (or something equivalent), you might not need a database. You could probably get by with a flat file per user. (But you also probably shouldn't do that and just use a database for now 😁)

More: 👇

You could be storing terabytes of data, but if every query only needs to address a few megabytes, there's usually no need for all the sophisticated database stuff: indexes, table schemas, DDL, query optimizers, statistics, caching, etc. All you might need are partitions.
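A minimal sketch of the "flat file per user" idea, in Python. A plain dict stands in for an object store, and the `PartitionedStore` class, its `users/<id>.jsonl` key layout and the JSON-lines record format are hypothetical choices for illustration, not something the thread prescribes. The point is that "WHERE user_id = ?" turns into "fetch one partition", and everything else is a linear scan.

```python
import json


class PartitionedStore:
    """Hypothetical sketch: one append-only object per user_id.

    A dict stands in for an object store such as S3; each value holds the
    raw bytes of one user's partition, one JSON record per line.
    """

    def __init__(self):
        self._objects = {}  # object key -> partition bytes

    @staticmethod
    def _key(user_id):
        return f"users/{user_id}.jsonl"  # hypothetical key layout

    def append(self, user_id, record):
        # No schema, no index: serialize the record and append it.
        line = (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")
        key = self._key(user_id)
        self._objects[key] = self._objects.get(key, b"") + line

    def query(self, user_id, predicate=lambda record: True):
        # "WHERE user_id = ?" becomes "fetch one partition"; any further
        # filtering is a plain linear scan over that user's records.
        raw = self._objects.get(self._key(user_id), b"")
        records = (json.loads(line) for line in raw.decode("utf-8").splitlines())
        return [r for r in records if predicate(r)]


store = PartitionedStore()
store.append(42, {"id": 1, "amount": 10})
store.append(42, {"id": 2, "amount": 250})
store.append(7, {"id": 3, "amount": 99})
print(store.query(42, lambda r: r["amount"] > 100))
```

Note there are no indexes, statistics or query planner here; the partition key does all the work, which is only reasonable while each partition stays small (the "few megabytes" case).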

All those sophisticated data structures and heuristics not only make databases complex to work with, but they also bring with them some big drawbacks:

1. Lots of random IO & expensive HW.
2. Data overhead & reduced compression.
3. Hard/impossible end-to-end encryption.

About 1: Imagine if you could run your database off S3 at just 2.3 cents per GB-month. A terabyte would only cost you $23/mo. RDS storage on multi-AZ EBS SSDs would cost $250/mo, roughly 10X more.

About 2: Indexes add overhead, sometimes doubling (or more) the amount of space used. And because of how they are persisted, they are rarely compressed, and when they are, the compression ratios tend to be small.

But data written to flat files compresses spectacularly well, often at ~10:1.

This means that a 1 terabyte database could be stored as 100 GB in S3, at about $2.30 per month! That's less than 1% of the cost of RDS, even if indexes added zero overhead (unlikely).
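The cost arithmetic above, plus a way to check the compression claim against your own data, can be sketched in Python. The prices are the thread's figures taken as inputs, not current AWS pricing, and the sample `events` are made-up records; the ratio you measure will depend entirely on what your data looks like, so the ~10:1 figure is an input here, not a result.

```python
import json
import random
import zlib

# Figures quoted in the thread (USD), used as inputs rather than verified prices.
S3_PER_GB_MONTH = 0.023   # "2.3 cents per GB-month"
RDS_PER_TB_MONTH = 250.0  # multi-AZ EBS SSD storage for 1 TB
TB_IN_GB = 1000           # decimal TB, which is what makes 1 TB cost $23


def compression_ratio(records):
    """Uncompressed / compressed size of records written as JSON lines."""
    raw = "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")
    return len(raw) / len(zlib.compress(raw, 9))


def monthly_s3_cost(size_gb, ratio=1.0):
    """S3 storage cost for size_gb of logical data stored at the given ratio."""
    return size_gb / ratio * S3_PER_GB_MONTH


# Hypothetical event records; real ratios depend on your data.
rng = random.Random(0)
events = [
    {"user_id": 42, "event": rng.choice(["view", "click", "purchase"]),
     "ts": 1_700_000_000 + i, "amount": rng.randint(1, 500)}
    for i in range(10_000)
]
print(f"measured ratio on sample: {compression_ratio(events):.1f}:1")
print(f"1 TB raw on S3:     ${monthly_s3_cost(TB_IN_GB):.2f}/mo")
print(f"1 TB at 10:1 on S3: ${monthly_s3_cost(TB_IN_GB, 10):.2f}/mo")
print(f"1 TB on RDS:        ${RDS_PER_TB_MONTH:.2f}/mo")
```

With the thread's inputs, the two S3 lines print $23.00 and $2.30, and $2.30 / $250 is about 0.9%.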

About 3: Traditional databases can't do what they do if all the data is encrypted by the client. Some of the fields can be, but you lose capabilities on them (you can't search on them, etc.).

Flat files can be used as a dumb store. Just append whatever byte[] the client gives you.
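Here is a hypothetical sketch of such a dumb store in Python. The 4-byte big-endian length prefix is an assumed framing choice (the thread doesn't specify one); it exists only so the client can split the log back into records. The store never parses the payload, so it behaves the same whether the bytes are plaintext or client-side ciphertext.

```python
import struct


def append_blob(log: bytearray, payload: bytes) -> None:
    """Append an opaque payload prefixed by its 4-byte big-endian length.

    The store never inspects `payload`; it could be ciphertext produced by
    the client with a key the server never sees.
    """
    log.extend(struct.pack(">I", len(payload)))
    log.extend(payload)


def read_blobs(log: bytes):
    """Yield the payloads back in append order (this part runs on the client)."""
    offset = 0
    while offset < len(log):
        (length,) = struct.unpack_from(">I", log, offset)
        offset += 4
        yield bytes(log[offset:offset + length])
        offset += length


log = bytearray()
append_blob(log, b"\x8f\x02 opaque client bytes")
append_blob(log, b"another record")
print(list(read_blobs(log)))
```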

Obviously, the trade-off of this approach is that all queries have to run on the client (typically the end user's device), and a fairly large amount of data has to be transferred from storage to the client. Luckily, recent advances in consumer device performance are making this viable.

Today you could easily scan & filter data at >0.5 GB/s using one CPU core on any modern device, including web browsers on phones. And if clients have some spare storage, you could cache query results and only process data added since the last query.
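The "cache results and only work on fresh data" idea can be sketched as an incremental filter over an append-only log. `IncrementalQuery` is a hypothetical name, and the sketch assumes records are never updated or deleted after being appended; if they can be, the cached matches would need invalidating.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IncrementalQuery:
    """Hypothetical client-side query cache over an append-only log.

    Remembers how many records it has already scanned (`offset`) and the
    matches found so far, so each refresh only touches new records.
    Assumes records are never updated or deleted once appended.
    """
    predicate: Callable[[dict], bool]
    offset: int = 0
    matches: list = field(default_factory=list)

    def refresh(self, log: list) -> list:
        # Scan only the records appended since the previous refresh.
        for record in log[self.offset:]:
            if self.predicate(record):
                self.matches.append(record)
        self.offset = len(log)
        return list(self.matches)


log = [{"amount": 5}, {"amount": 200}]
big_spends = IncrementalQuery(lambda r: r["amount"] > 100)
big_spends.refresh(log)          # scans both records
log.append({"amount": 300})
print(big_spends.refresh(log))   # scans only the one new record
```

This works for filters (and for aggregates that can be updated incrementally, like counts and sums); queries that need to revisit old data would still require a full rescan.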

My bet is that for the vast majority of web, mobile, and desktop applications, the user experience would be unaffected compared to running queries server-side. In fact, I anticipate slightly better performance for many types of queries (from 100s of ms to 10s of ms).

This doesn't mean you should stop using databases. Not yet!

Databases still handle data durability, data consistency, concurrency control, access control, and other things that are not trivial to do with plain flat files partitioned by user.

But once the "database" part gets extracted out of this: github.com/encrypted-dev/… you'll have an alternative:

A low-burden, low-cost database equivalent. End-to-end encryption will be the 🍒 on the 🎂

Have a good weekend!

A few ppl rightly pointed out that filesystems are much harder to use properly than databases. I should have been clearer. I meant a logical concept of a file, rather than a real file on a filesystem. S3 objects are good candidates.

Right now I’m using DynamoDB as a first-stage append-only transaction log. Then the transaction log gets compacted periodically and moved to S3. This setup gets me immediate high durability and immediate strong consistency. 👍
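A hypothetical sketch of that two-stage shape in Python. Plain lists stand in for both stages: `hot` plays the role of the durable, strongly consistent append log (DynamoDB in the thread) and `cold` holds compressed segments (S3 in the thread). No AWS API is modeled, and the JSON-lines plus zlib segment format is an assumption.

```python
import json
import zlib


class TwoStageLog:
    """Hypothetical append-only log with periodic compaction.

    `hot` stands in for a low-latency durable log (DynamoDB in the thread);
    `cold` stands in for an object store (S3 in the thread). Both are plain
    Python lists here.
    """

    def __init__(self):
        self.hot = []   # recent, not-yet-compacted entries, in append order
        self.cold = []  # compressed segments, oldest first

    def append(self, entry: dict) -> None:
        self.hot.append(entry)

    def compact(self) -> None:
        # Move everything in the hot log into one compressed cold segment.
        if not self.hot:
            return
        raw = "".join(json.dumps(e) + "\n" for e in self.hot).encode("utf-8")
        # In a real system the segment must be durably written *before*
        # the hot entries are deleted, or a crash could lose data.
        self.cold.append(zlib.compress(raw))
        self.hot = []

    def read_all(self) -> list:
        # Cold segments first, then the hot tail, so entries stay in order.
        out = []
        for segment in self.cold:
            text = zlib.decompress(segment).decode("utf-8")
            out.extend(json.loads(line) for line in text.splitlines())
        return out + list(self.hot)


log = TwoStageLog()
log.append({"n": 1})
log.append({"n": 2})
log.compact()
log.append({"n": 3})
print(log.read_all())
```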
