Joachim Neu

Aug 26 • 27 tweets • 8 min read

Data availability (DA) is critical for blockchains and rollups.
DA sampling is a deceptively simple and elegant proposed solution.
But actually, a lot of R&D challenges still need to be figured out! 🧵👇
paradigm.xyz/2022/08/das

Data availability (DA) is crucial for blockchains and rollups. An oft-discussed technique is random sampling for data availability verification (DAS = DA sampling), popularized by a paper by @musalbas @alberto_sonnino @VitalikButerin.

arxiv.org/abs/1809.09044

DAS is at the core of @CelestiaOrg and proposed for inclusion in proof-of-stake (PoS) @ethereum with “Danksharding.”

The DA Problem:

Somebody has produced a block of data. They claim to have made it “available” to the “public.” Your goal is to check: would you be able to obtain the data if you needed to?

A “naive” test: just download the entire block.

But we want to test for DA _without downloading too much data_, e.g., because the data is larger than we can handle, or because it seems wasteful to spend much bandwidth on data we aren’t actually interested in, “only” to check DA.

At this point we need a model to clarify what it “means” to download or withhold only “part of the data.”

The “Bulletin Board In A Dark Room” Model:

1) First, the block producer enters the room and gets the opportunity to write some information on the bulletin board.

2) As the block producer exits, it can give you, the validator, a tiny hint.

3) You enter the room with a flashlight that has a very narrow light beam and is low on battery, so you can only read the writing on very few distinct locations of the bulletin board.

4) Your goal is to convince yourself that indeed the block producer has left enough information on the bulletin board so that if you were to turn on the light and read the complete bulletin board, you would be able to recover the file.

This seems tricky: We can ask the block producer to write down the complete file. But by inspecting the board at only a few locations, it’s hard to catch if a tiny piece is missing.

Thus, you cannot check for DA reliably. We need a new approach!
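To put a number on why the naive approach fails, here is a minimal sketch. It assumes the board has n locations and that you sample r of them uniformly at random with replacement; the function name and the example numbers are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def miss_probability(n: int, missing: int, r: int) -> Fraction:
    """Probability that r uniform samples (with replacement) out of n
    board locations all land on written locations, when `missing`
    locations were left empty. Hypothetical helper for illustration."""
    return Fraction(n - missing, n) ** r

# Unencoded file: one chunk withheld out of 1000, 30 samples.
# The withheld chunk goes unnoticed in the vast majority of cases.
print(float(miss_probability(1000, 1, 30)))
```

With a single missing location, the chance of noticing anything grows only roughly like r/n, so a few samples tell you almost nothing.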

The (Theoretical) Solution:

This is where erasure correcting Reed-Solomon (RS) codes come into play.

An erasure correcting code “works” like this:

A vector of k information chunks gets encoded into a (longer!) vector of n coded chunks. The information chunks can be recovered from any k of the n coded chunks.

RS codes are built from the insight that low-degree polynomials are uniquely determined by their evaluations at a few distinct locations.

If you haven't seen RS codes before, check out the blog post for a gentle explainer using a simple intuitive example: paradigm.xyz/2022/08/das
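The polynomial insight can be sketched in a few lines. This is a toy illustration, not how any production system implements RS codes: it assumes chunks are integers in the small prime field GF(257), treats the k information chunks as evaluations of a polynomial of degree less than k at positions 0..k-1, and uses plain Lagrange interpolation. The names `interpolate_at`, `encode`, and `decode` are hypothetical.

```python
P = 257  # small prime modulus, an assumption for this toy example

def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) pairs, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(info, n):
    """Encode k information chunks into n coded chunks (n <= P).
    Position x holds the polynomial's value at x, so the first k
    coded chunks equal the information chunks."""
    points = list(enumerate(info))
    return [interpolate_at(points, x) for x in range(n)]

def decode(known, k):
    """Recover the k information chunks from a dict mapping
    position -> coded chunk that has at least k entries."""
    points = list(known.items())[:k]
    return [interpolate_at(points, x) for x in range(k)]

info = [10, 20, 30, 40]      # k = 4 information chunks
coded = encode(info, 8)      # n = 8 coded chunks, rate 1/2
# Erase half of the chunks; any 4 survivors are enough to recover.
survivors = {x: coded[x] for x in (1, 3, 4, 6)}
print(decode(survivors, 4) == info)
```

Any other choice of four surviving positions works equally well, which is exactly the "any k out of n" property used in the rest of the thread.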

Going back to DA: We now ask the block producer to cut the file into k chunks, encode them using a Reed-Solomon code (say of rate k/n=1/2), and write the n=2k coded chunks to the bulletin board.

Either the producer is honest and writes all the chunks, or the producer misbehaves and wants to keep the file unavailable. Recall, we can recover the file from any k out of n=2k coded chunks. So to keep the file unavailable, the block producer can write at most k-1 chunks.

In other words, for the file to be unavailable, at least k+1 of the n=2k coded chunks, i.e., more than half, must now be missing!

These two scenarios, a 100% full bulletin board and a >50% empty board, are easily distinguishable: Inspect the board at a small number r of randomly sampled locations. File = available, if each sampled location has its chunk; file = unavailable, if any sampled location is empty.

If the file is unavailable, and thus more than half of the board is empty, the probability that you erroneously consider the file available is less than 2^{-r}, i.e., exponentially small in r.
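The 2^{-r} bound can be checked with exact arithmetic. The sketch below assumes the worst case for the validator (the producer writes exactly k-1 of the n=2k coded chunks) and assumes the r sampled locations are distinct, i.e., drawn without replacement; the thread does not say which sampling variant is meant, and the function name is hypothetical.

```python
from fractions import Fraction

def false_accept_probability(k: int, r: int) -> Fraction:
    """Probability that r distinct uniformly random locations out of
    n = 2k are all filled when only k - 1 chunks were written
    (assumed worst case: file just barely unavailable)."""
    n = 2 * k
    present = k - 1
    p = Fraction(1)
    for i in range(r):
        # i-th draw: (present - i) filled locations remain among n - i
        p *= Fraction(max(present - i, 0), n - i)
    return p

k = 128
for r in (1, 10, 20, 30):
    p = false_accept_probability(k, r)
    # Each factor is below 1/2, so the product stays below 2^{-r}.
    print(r, float(p), p < Fraction(1, 2 ** r))
```

Sampling with replacement gives ((k-1)/(2k))^r instead, which is also below 2^{-r}.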

The (Practical) Challenges

This is beautifully simple—within the given “bulletin board in a dark room” model.

Let’s think about the model: What do the components represent? Can we realize them in a real computer system, and how? What "is" the bulletin board really? Where is it stored? How is it written to, read, and sampled?

As long as you are left with pieces of the model that you haven’t translated into a computer/network/protocol equivalent, you know there is something left to be done—which might be either gaps in your understanding, or open research problems! ;)

In fact, there are still many R&D challenges. The blog post goes through a non-exhaustive collection of six challenges, but there are likely more. For example, should we use flooding-based gossip protocols or distributed hash tables (or something else?) to implement the board?

If you got curious, head over to the blog post for all the details: paradigm.xyz/2022/08/das

Special thanks for input to the post: @musalbas @dannyryan @dankrad @sreeramkannan @ssrivatsan97 @yangl1996 @danrobinson @gakonst @sourdzl @achalvs
