@ethereum is the World Computer; born a rudimentary number cruncher, on a journey to (inevitably) becoming the dominant global settlement layer. And soon, Ethereum will outgrow the Merkle tree.
Tomorrow's solution: Verkle Trees
(2/25) We begin with a hash function, which transforms an arbitrary dataset into a unique, fixed-length string. A good hash function is irreversible and creates outputs that are indistinguishable from random data.
Think of a hash like a serial number for data of any kind/size.
(3/25) Merkle Trees are made by successive rounds of hashing.
All the data begins on the bottom row (leaf nodes). Then they are grouped together and fed into a hash function. These hashes are grouped (same size) and hashed again, continuing until there is a single (root) node.
(4/25) The purpose a Merkle tree is enable Merkle proofs, which allow verification that a piece of data exists in the underlying dataset (leaf nodes) without transmitting the entire dataset.
This has profound implications for both privacy and bandwidth requirements.
(5/25) The privacy aspects aren't useful for this discussion, but they are worth nothing.
Merkle proofs can verify a piece of data without having to articulate the entire dataset. You do need component pieces of the Merkle tree, but these are just (non-sensitive) hashes.
(6/25) We are here for the bandwidth implications.
Let's say you want to post a ton of data. No one is interested in all the data, and every one has access to Merkle proofs generation (big assumption).
Instead of posting the entire dataset, you can just post the Merkle root.
(7/25) Don't get me wrong, a Merkle proof is significantly more efficient than naïve approach... but it has its limits.
Tl;dr Merkle proofs scale much less quickly than dataset size, but they still scale proportionally. Eventually proofs will outgrow bandwidth capacity.
(8/25) What happens if we have a dataset too big for a Merkle tree?
IRL Example: the internal state of @ethereum is stored in a Merkle tree. This tree grows as more accounts & smart contracts are created. Eventually proofs are going to grow too big to push through the network.
(9/25) Let's start by asking the question "what's the purpose of a Merkle tree?"
A Merkle root is a unique string that is generated from a large dataset. With the root, anyone can verify the original dataset and/or an individual piece of data.
(12/25) Let's simplify how we think of Merkle proofs. The point of sharing all the intermediate branches is to generate the unique path from the Merkle root to your data point.
Let's think about expressing this value as a vector.
(13/25) This will form the foundation of Verkle tree. Here's more than you want to know on the name.
V = vector
erkle Tree = Merkle Tree
We will also be interested in Verkle tries which are trees organized by their keys (in example above, you can trace keys through inner nodes)
(14/25) Again, we've already built a scheme around a vector. The challenge is to build it so that the vector can be (trustlessly) transmitted with a constant size.
Fortunately, modern cryptography has developed lots of tools to solve this exact problem.
(15/25) For example, we could deploy KZG commitments for this purpose.
Tl;dr the KZG commitment scheme uses elliptic curve cryptography to commit to a polynomial.
A polynomial commitment is actually much more powerful than a vector commitment, but can also be used as one.
(16/25) KZG commitments require too much computation for the Verkle trees we are interested in. We will use Pedersen commitments to commit to a vector.
I haven't gotten to that thread, but it's worth reading ahead. We will see Pedersen commitments soon.
(17/25) We will pick the particular scheme based on the application, but the point is that we directly prove the vector. Unlike Merkle trees, we do not need to first rebuild the vector before we prove it.
Individual vector components are of constant size.
(18/25) Remember, Merkle proofs grew in size as the underlying dataset grew (albeit more slowly). When we tried to decrease the number of vertical levels, we got even more horizontal components.
(19/25) Verkle trees do not have this problem. Shortening the tree decreases proof size.
Here's a rough idea, but don't dwell too long on it. In fact, the cryptographic schemes we are looking are so powerful that we can simplify even further.
(20/25) KZG commitments, Pedersen commitments and all the other schemes we are considering can condense an arbitrary number of proofs into a single value.
Put another way, we can use the magic of elliptic curve cryptographic to create a single vector that serves for all proofs.
(21/25) @dankrad lays out the technique here in his blog. It's pretty in depth and technical, not for the feint of heart.
Verkle proofs don't scale with tree width and therefore can be incredibly wide. The current proposal for the @ethereum state is looking at a width of 256, but some are pushing for 1024.
(23/25) One might ask "can we make it infinitely wide?" We can, we can create a "Verkle tree" that is just a single level, making proofs incredibly lightweight.
In practice, this will cause a huge burden during the commitment generation phase.
(24/25) Turns out this might be an inherent property of commitment schemes, maybe like "in order to create a lightweight proof, a commitment needs to touch every piece of data."
If so, the commitment scheme design ends up becoming a trade-off between prover and verifier work.
(25/25) Verkle trees are an advanced improvement on an already complicated topic. But here's all you need to know:
- Verkle trees, like Merkle trees, allow verification of data within a dataset
- Verkle proofs are of constant size, regardless of dataset
As the World Computer attracts more use, the amount of resources needed to keep itself running increases. Today, that burden will increase to infinity, eventually crushing the network.
A roadmap to a sustainable Ethereum.
(2/20) @ethereum is the World Computer, a single, globally shared computing platform that exists in the space between a network of 1,000s of computers (nodes).
The nodes provide the hardware, the EVM provides the virtual computer and the blockchain records Ethereum's history.
(3/20) The EVM sits at the center of @ethereum, providing a decentralized computing platform to the world.
Everything is designed to construct this virtual machine and expose it to the world, so that anyone who is interested can permissionlessly interact with it.
(1/22) As the World Computer gets more use, the size of the EVM's state grows. Unceasing, unrelenting, unending growth. Eventually, the individual computers that make up the network wont be able to keep up.
(2/22) @ethereum is the World Computer, a single, globally shared computing platform that exists in the space between a network of 1,000s of computers (nodes).
The nodes provide the hardware, the EVM provides the virtual computer and the blockchain records Ethereum's history.
(3/22) The EVM sits at the center of @ethereum, providing a decentralized computing platform to the world.
Everything else is all designed to construct this virtual machine and expose it to the world, so that anyone who is interested can permissionlessly interact with it.
A commitment scheme is a primitive that allows one party to generate a piece of data anchored to a specific dataset.
This anchor (commitment), cannot leak information and yet can unequivocally verify the dataset.
(2/20) Let's say you have a large amount of data that, for whatever reason, is private.
How can you provide a public audience assurance that you will not alter the data without allowing them to see it?
(3/20) The naive approach breaks our problem statement's basic privacy assumptions. Revealing the data allows public verification... at the cost of 100% of privacy.
We need to find a way to provide a unique fingerprint of that specific set of data to act as a signature.
Vectors are a mathematical primitive that turn out to be particularly useful in computer science. In order to understand modern cryptography (and other advanced computational applications), you need a good grasp on this foundational topic.
(2/8) A vector is a concept used to convey quantities that cannot be expressed by a single number.
Think about velocity, which is a mathematician's way of saying "speed plus direction."
Speed is 10 m/s. Velocity is 10 m/s in a north-west direction.
(3/8) Vectors are incredibly versatile and show up again and again across mathematics.
There are many different ways to express a vector, each having their own benefits and drawbacks. Here are just a few of the ways we can express the same information.
At the heart of @Bitcoin, @ethereum and many blockchain computers is the Merkle Tree. While this data structure has served us well, it is not perfect... and if you look ahead, you can see impending problems.
Let's talk Merkle proof scaling.
(2/18) A Merkle Tree is a data structure that allows the efficient verification of the contents of a large dataset.
Check out this thread for a deeper dive, but we'll quickly review the basics.
(3/18) Merkle Trees are made by successive rounds of hashing (applying a hash function to a set of data).
A hash function transforms data of arbitrary contents (and length) into a unique, compact string. Think of a hash like a unique serial number for a set of data.
A story from 2010 of entrepreneurship, and some of the things I was up to before I found the World Computer.
It's funny, when life is happening it feels so chaotic; when you look back, things tend to make a lot of sense.
(2/25) Summer 2009 was the summer before university. John, one of my best friends, had gotten into the same school.
He KNEW he was going to study computer science. I had no idea, but if you'd asked I would have non-confidently said I was going to study math or physics.
(3/25) At the end of the first trimester, John MADE me take intro to comp-sci. It was the first time that I wanted MORE than what a class required; I looked forward to getting homework!
I never really decided to major in comp-sci... I just kept signing up for more classes.