Joachim Neu Profile picture
Aug 26 27 tweets 8 min read
Data availability (DA) is critical for blockchains and rollups.
DA sampling is a deceptively simple and elegant proposed solution.
But actually, a lot of R&D challenges still need to be figured out! 🧵👇
paradigm.xyz/2022/08/das
Data availability (DA) is crucial for blockchains and rollups. An oft-discussed technique is random sampling for data availability verification (DAS = DA sampling), popularized by a paper by @musalbas @alberto_sonnino @VitalikButerin.

arxiv.org/abs/1809.09044
DAS is at the core of @CelestiaOrg and proposed for inclusion in proof-of-stake (PoS) @ethereum with “Danksharding.”
The DA Problem:

Somebody has produced a block of data. They claim to have made it “available” to the “public.” Your goal is to check: would you be able to obtain the data if you needed to?
A “naive” test: just download the entire block.

But we want to test for DA _without downloading too much data_, eg, because the data is larger than we can handle, or because it seems wasteful to spend much bandwidth on data we aren’t actually interested in, “only” to check DA.
At this point we need a model to clarify what it “means” to download or withhold only “part of the data.”
The “Bulletin Board In A Dark Room” Model:
1) First, the block producer enters the room and gets the opportunity to write some information on the bulletin board.
2) As the block producer exits, it can give you, the validator, a tiny piece of a hint.
3) You enter the room with a flashlight that has a very narrow light beam and is low on battery, so you can only read the writing on very few distinct locations of the bulletin board.
4) Your goal is to convince yourself that indeed the block producer has left enough information on the bulletin board so that if you were to turn on the light and read the complete bulletin board, you would be able to recover the file.
This seems tricky: We can ask the block producer to write down the complete file. But by inspecting the board at only a few locations, it’s hard to catch if a tiny piece is missing.

Thus, you cannot check for DA reliably. We need a new approach!
The (Theoretical) Solution:

This is where erasure correcting Reed-Solomon (RS) codes come into play.

An erasure correcting code “works” like this:
A vector of k information chunks gets encoded into a (longer!) vector of n coded chunks. The information chunks can be recovered from any subset of size k of coded chunks.
RS codes are built from the insight that low-degree polynomials are uniquely determined by their evaluations at a few distinct locations:
If you haven't seen RS codes before, check out the blog post for a gentle explainer using a simple intuitive example: paradigm.xyz/2022/08/das
Going back to DA: We now ask the block proposer to cut the file in k chunks, encode them using a Reed-Solomon code (say of rate k/n=1/2), and write the n=2k coded chunks to the bulletin board.
Either the producer is honest and writes all the chunks, or the producer misbehaves and wants to keep the file unavailable. Recall, we can recover the file from any k out of n=2k coded chunks. So to keep the file unavailable, the block producer can write at most k-1 chunks.
In other words, now at least k+1, more than half of the n=2k coded chunks, must be missing for an unavailable file!
These two scenarios, a 100% full bulletin board and a >50% empty board, are easily distinguishable: Inspect the board at a few r randomly sampled locations. File = available, if each sampled location has its chunk; file = unavailable, if any sampled locations is empty.
If the file is unavailable, and thus (more than) half of the board is empty, the probability that you erroneously consider the file available is less than 2^{-r}, ie, exponentially small in r.
The (Practical) Challenges

This is beautifully simple—within the given “bulletin board in a dark room” model.
Let’s think about the model: What do the components represent? Can we realize them in a real computer system, and how? What "is" the bulletin board really? Where is it stored? How is it read/write/sample accessed?
As long as you are left with pieces of the model that you haven’t translated into a computer/network/protocol equivalent, you know there is something left to be done—which might be either gaps in your understanding, or open research problems! ;)
In fact, there are still many R&D challenges. The blog post goes through a non-exhaustive collection of six challenges, but there are likely more. For example, should we use flooding-based gossip protocols or distributed hash tables (or something else?) to implement the board?
If you got curious, head over to the blog post for all the details: paradigm.xyz/2022/08/das

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Joachim Neu

Joachim Neu Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(