v Profile picture
v
breaking databases @tursodatabase. W1 '21 @recursecenter excited about databases, storage engines and message queues
Dec 29, 2024 27 tweets 8 min read
Collection of insane and fun facts about SQLite. Let's go!

SQLite is the most deployed and most used database. There are over one trillion (1000000000000 or a million million) SQLite databases in active use.

It is maintained by three people. They don't allow outside contributions. SQLite is likely used more than all other database engines combined. Billions and billions of copies of SQLite exist in the wild. It's everywhere! Image
Dec 27, 2024 20 tweets 6 min read
Did you know that if SQLite performs better, then the lifetime of your mobile increases?

But how?

Notes on my paper: SQL Statement Logging for Making SQLite Truly Lite SQL Statement Logging for Making SQLite Truly Lite This paper appeared in VLDB, 2017 and I am reposting my old notes. I will also explain some necessary database internals.

tl;dr is instead of physical logging, they propose using logical logging to reduce write amplification

vldb.org/pvldb/vol11/p5…
Dec 21, 2024 11 tweets 3 min read
Some notes on how SQLite implements MVCC using WAL and provides Snapshot Isolation

Let's dive in SQLite DB and WAL mode First, let's understand the transaction model of SQLite, which is quite different (and more limited) than other databases.

What SQLite offers:
- Concurrent transactions
- Multiple readers AND a writer simultaneously
- But no multiple concurrent writers
Dec 15, 2024 23 tweets 7 min read
Here's a fascinating story of how researchers used Bloom filters cleverly to make SQLite 10x faster.

I'll also explain some database internals and how databases implement joins.

Let's dive in! SQLite: Past, Present, and Future paper cover - https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf I have posted about this paper earlier and also about filesystem-blob benchmarks. But let's discuss again for our new friends.

The results of this research have been applied to SQLite already and were released in v3.38.0.

This is going to be a long thread ☕️

Link to the paper: vldb.org/pvldb/vol15/p3…
Nov 1, 2024 16 tweets 5 min read
I watched @penberg 's talk "Patterns of Low Latency" at the P99 conference.

This talk is dense with practical insights and to the point. I was wondering why watch a talk on latency, but Pekka's background as a Linux Kernel contributor and his work on OSv and ScyllaDB made it worth hearing what he got to say.

Here are my notes. The highest ROI tips from the talk:

- Disable Nagle's algorithm (stops TCP from buffering small packets, reduces cold start)
- Eliminate dynamic memory allocation (use static/pool allocation instead)

Here is the link to the talk:
Oct 29, 2024 4 tweets 2 min read
if you discover a bug after shipping, would you leave it to maintain backward compatibility? how far would you go?

SQLite allows NULL values in primary key columns. This was an oversight by the developers, but since it was already shipped, they kept it to maintain backward compatibility. If you want standard behaviour, you need to explicitly add a NOT NULL constraint to the primary key column. In contrast, PostgreSQL automatically adds NOT NULL constraints to primary keys.

The bug lives on happily.NOT NULL is enforced on every column of the PRIMARY KEY in a WITHOUT ROWID table. This is in accordance with the SQL standard. Each column of a PRIMARY KEY is supposed to be individually NOT NULL. However, NOT NULL was not enforced on PRIMARY KEY columns by early versions of SQLite due to a bug. By the time that this bug was discovered, so many SQLite databases were already in circulation that the decision was made not to fix this bug for fear of breaking compatibility. So, ordinary rowid tables in SQLite violate the SQL standard and allow NULL values in PRIMARY KEY fields. But WITHOUT ROWI... read more: sqlite.org/withoutrowid.h…
Jun 15, 2024 6 tweets 2 min read
SQLite is cracked SQLite is cracked  Fun fact: SQLite is the most deployed and most used database. There are over one trillion (1000000000000 or a million million) SQLite databases in active use.  It is maintained by three people. They don’t allow outside contributions.  It is everywhere  SQLite is likely used more than all other database engines combined. Billions and billions of copies of SQLite exist in the wild. SQLite is in:  Every Mobile (Android, iOS, Windows) device Every Mac or Windows10 machine Every Web browser Every instance of Skype, iTunes, and Dropbox client PHP and Python Most TV, set-top cab... text phrases stolen from:

sqlite.org/mostdeployed.h…
dl.acm.org/doi/abs/10.147…
Mar 31, 2024 8 tweets 3 min read
fun fact: SQLite is the most deployed and most used database. There are over one trillion (1e12) SQLite databases in active use.

It is maintained by three people. They don't allow outside contributions. It's pretty much everywhere - sqlite.org/mostdeployed.h…
Image
Oct 31, 2023 4 tweets 2 min read
I always wondered how adblockers for YouTube worked, this post nicely explains it. Also, the cat-and-mouse chase between YouTube and adblockers. A fascinating read Some Background  Here’s how adblockers (used to) block YouTube ads. Before playing a video, YouTube would check its API, and the server would send back something like this:  {   "video": "something.mp4",   "ads": [ad1, ad2, ad3],   "etc": { ... } }  And the adblockers would override JSON.parse and Response.json to make it return this instead:  {   "video": "something.mp4",   "ads": [],   "etc": { ... } }  This trick worked for a few years. But earlier this year, YouTube started making fake requests to see if the... link to the post - andadinosaur.com/youtube-s-anti…
Sep 30, 2023 31 tweets 8 min read
Here is a fascinating story of how researchers teamed up with SQLite core developers to make it faster using Bloom filters!

Let's also dive into database internals and understand how databases implement joins.

📄Paper: SQLite - Past, Present, and Future (2022) link -

I presented this paper earlier at Papers We Love, Bangalore and I had been promising people to post the summary. So, here it is!

This is going to be a long thread ☕️☕️vldb.org/pvldb/vol15/p3…
May 9, 2022 4 tweets 1 min read
For the past few months, I have been learning about internals of databases. I found many excellent articles on writing compilers, but I could not find many practical resources for databases. So I wrote one. CaskDB is the project I wish I had started with.

github.com/avinassh/py-ca… CaskDB is based on Riak's Bitcask paper. The idea of Bitcask is brilliant yet straightforward, which makes it attractive for newbies to learn about key-value store internals and implement one.