22 Nov, 23 tweets, 4 min read
Ethereum clients currently store 275 GB of historical data that is unnecessary to validate the chain. That number is growing at a rate of around 140 GB per year. EIP-4444 proposes clients prune data older than 1 y/o.

So why don't we just prune the data already?
To understand why the data has yet to be pruned and why this is even a discussion, it's important to understand how historical data is used today. There are two main categories of usage: syncing and user requests over JSON-RPC.
In the world of syncing, there are two main approaches:
- Full sync - from genesis, download every block & execute it until tip of chain is reached.
- State sync - many schemes here, but essentially header sync using PoW checks and then download the state for the latest block.
In both cases, clients request historical data over the p2p network to develop their view of the chain. The trust model has generally been to trust the genesis state and verify everything else - either fully verifying or light verifying with PoW checks.
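To make the difference between the two approaches concrete, here is a toy sketch. Everything in it is an assumption for illustration: the "state" is a single counter, `Block`, `build_chain`, `full_sync` and `state_sync` are made-up names, and a parent-hash check stands in for the PoW header checks a real client performs.

```python
import hashlib
from dataclasses import dataclass


def h(text: str) -> str:
    """Hex SHA-256 of a string, used as a stand-in for real block hashing."""
    return hashlib.sha256(text.encode()).hexdigest()


GENESIS_HASH = h("genesis")  # the one value both sync modes trust up front


@dataclass(frozen=True)
class Block:
    number: int
    parent_hash: str
    delta: int       # toy "transaction": add delta to a single counter
    state_root: str  # commitment to the state after this block

    @property
    def hash(self) -> str:
        return h(f"{self.number}|{self.parent_hash}|{self.delta}|{self.state_root}")


def build_chain(deltas):
    """Build a valid toy chain on top of genesis (initial state is 0)."""
    chain, parent, state = [], GENESIS_HASH, 0
    for number, delta in enumerate(deltas, start=1):
        state += delta
        block = Block(number, parent, delta, h(str(state)))
        chain.append(block)
        parent = block.hash
    return chain


def check_links(chain):
    """Header-only check: each block must point at the previous block's hash."""
    parent = GENESIS_HASH
    for block in chain:
        if block.parent_hash != parent:
            raise ValueError(f"block {block.number} does not extend its parent")
        parent = block.hash


def full_sync(chain):
    """Execute every block from genesis and check each claimed state root."""
    check_links(chain)
    state = 0
    for block in chain:
        state += block.delta
        if h(str(state)) != block.state_root:
            raise ValueError(f"bad state root at block {block.number}")
    return state


def state_sync(chain, downloaded_state):
    """Verify headers only, then accept a downloaded state that matches the tip."""
    check_links(chain)
    if h(str(downloaded_state)) != chain[-1].state_root:
        raise ValueError("downloaded state does not match the tip's state root")
    return downloaded_state
```

Both functions end at the same state, but `full_sync` needs every block body while `state_sync` only needs the headers plus one state download. Either way, the data is fetched from peers.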
Proof-of-stake changes this. Because it's susceptible to long range attacks, we must rely on a "Weak Subjectivity (WS) Checkpoint". This is essentially a block in the canonical chain that we trust as much as we trust the genesis block in PoW.
The WS checkpoint allows clients to skip the bootstrapping step of requesting historical data over the p2p network. Of course, they'll still need to sync the historical data after the checkpoint - therefore the checkpoint must always be more recent than the pruning boundary, so the blocks after it are still available from peers.
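One way to read that constraint: a checkpoint is only useful if every block after it is still being served, which means the checkpoint can be no older than the pruning cutoff. A minimal sketch, assuming timestamps in seconds and "1 y/o" read as 365 days (the function names are hypothetical):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: "1 y/o" read as 365 days


def pruning_cutoff(now: int, retention: int = SECONDS_PER_YEAR) -> int:
    """Timestamp before which peers are assumed to have dropped block data."""
    return now - retention


def checkpoint_is_usable(checkpoint_timestamp: int, now: int,
                         retention: int = SECONDS_PER_YEAR) -> bool:
    """True if all blocks after the checkpoint are still inside the retained window."""
    return checkpoint_timestamp >= pruning_cutoff(now, retention)
```

A checkpoint from last month passes the check. One from two years ago fails, because the blocks right after it would already be gone from the p2p network.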
This sounds like a regression in security. Before, we had one hash from 13 July 2015 to verify. Now, we have moving WS checkpoints. But in reality, we've been relying on weak subjectivity all along.
When was the last time you verified the code diff between client releases? Most people don't have the technical background to do this. So, every time you update your client, you're relying on the client team to faithfully implement the Ethereum protocol.
Fortunately there are a lot of eyes on software like go-ethereum. It takes just one whistleblower to show a malicious commit in the code. Similarly, it only takes one whistleblower to point out that a client is shipping with a malicious WS checkpoint.
In fact, it's much easier to verify a client ships with a correct WS checkpoint than it is to ensure the code correctly executes the protocol.
So, from a security perspective there is really no regression.

That covers syncing - the other main use of historical data is serving user requests.
There are two types of data that users can request:

- current data, e.g. the value of a storage slot, an account balance, the latest block number, etc
- historical data, e.g. the value of a storage slot at block N, the header of block N, a transaction receipt, etc
Current data will continue to be accessible. With EIP-4444, however, historical data may not be, depending on how old it is.
The main consumers of historical data are dapp devs. Many dapps populate their databases with historical information to serve to their users via their frontends. For them, it's important to be able to iterate through all txs and logs.
There are various ways to support this use case - currently the favored approach is a client release multiplexer that executes certain block ranges on releases that support the range. For example, geth vA may support up to block 10m and geth vB supports 10m+.
The multiplexer would execute blocks 0-10m with geth vA, output the state db and import it into geth vB, then continue with blocks 10m+. JSON-RPC requests would be directed to the client that has the appropriate information to respond.
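The routing part of such a multiplexer could be sketched roughly like this. This is not an existing tool: `Release` and `Multiplexer` are made-up names, and the sketch assumes each release is described by the first block it can execute, with geth vB taking over at exactly block 10,000,000. Handing the state db from one release to the next is left out.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    name: str
    first_block: int  # first block this release is able to execute


class Multiplexer:
    def __init__(self, releases):
        self.releases = sorted(releases, key=lambda r: r.first_block)
        self.starts = [r.first_block for r in self.releases]
        if not self.starts or self.starts[0] != 0:
            raise ValueError("some release must cover the chain from block 0")

    def release_for(self, block_number: int) -> Release:
        """Pick the release that should execute or answer for this block."""
        if block_number < 0:
            raise ValueError("block numbers are non-negative")
        # Last release whose first_block is <= block_number.
        return self.releases[bisect_right(self.starts, block_number) - 1]

    def sync_plan(self, tip: int):
        """List (release name, first block, last block) segments up to tip."""
        plan = []
        for i, release in enumerate(self.releases):
            if release.first_block > tip:
                break
            next_start = self.starts[i + 1] if i + 1 < len(self.starts) else tip + 1
            plan.append((release.name, release.first_block, min(next_start - 1, tip)))
        return plan
```

With releases `geth-vA` (from block 0) and `geth-vB` (from block 10,000,000), a JSON-RPC request about block 9,999,999 would go to vA and one about block 10,000,000 to vB. `sync_plan` gives the order in which the block ranges would be executed.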
However, if the historical blocks are no longer available on the p2p network - where do they come from?

It's expected that many large, trusted institutions will provide mirrors to this data. Since the data is static, it's easy to agree on their hash and verify. 1-of-N trust.
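A rough sketch of what 1-of-N trust means in practice: as long as the expected hash is known and at least one mirror is honest, it doesn't matter which mirror the bytes come from. The mirrors here are plain callables standing in for downloads, and SHA-256 over the whole archive is an assumption for the example, not a specified format.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_verified(mirrors, expected_hash: str):
    """Try mirrors in order and return (name, data) for the first matching payload.

    `mirrors` is a list of (name, fetch) pairs, where fetch() returns bytes
    or raises OSError if the mirror is unreachable.
    """
    for name, fetch in mirrors:
        try:
            data = fetch()
        except OSError:
            continue  # mirror is down, try the next one
        if sha256_hex(data) == expected_hash:
            return name, data
        # Hash mismatch: the mirror served wrong or tampered data, skip it.
    raise LookupError("no mirror returned data matching the expected hash")
```

A mirror that lies is simply skipped, so a single honest mirror is enough to recover the data.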
The new standard will be to *not* store historical data and run a client multiplexer. This means the standard footprint of Ethereum clients will be reduced by 275 GB - however there is a final issue to mention.
Currently the Ethereum JSON-RPC responds with an empty response when a requested piece of data does not exist. Assuming the client is not syncing, this can be accepted as "this piece of data does not exist in the canonical chain or a recent fork".
Once clients begin pruning old data, this invariant breaks. When a user requests a certain tx receipt, the client won't know if the receipt was pruned or just never existed.

Currently, the expectation is that the RPC will return an empty response in both cases.
I'm curious to get feedback on this approach. How do consumers of the JSON-RPC feel about this? How often do you access historical data over 1 y/o? An alternative (albeit heavier) would be to maintain an index of pruned hashes so that more context could be returned to the user.
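To illustrate the ambiguity and the heavier alternative, here is a toy in-memory sketch. `ReceiptStore`, `Lookup` and the method names are hypothetical and don't correspond to any client's code. `get_receipt` mimics the empty response, while `lookup` uses an index of pruned hashes to tell the two cases apart.

```python
from enum import Enum


class Lookup(Enum):
    FOUND = "found"
    PRUNED = "pruned"    # existed once, data has since been dropped
    UNKNOWN = "unknown"  # never seen by this node


class ReceiptStore:
    def __init__(self):
        self._receipts = {}
        self._pruned = set()  # the extra index: hashes of pruned receipts

    def add(self, tx_hash: str, receipt: dict) -> None:
        self._receipts[tx_hash] = receipt

    def prune(self, tx_hash: str) -> None:
        """Drop the receipt but remember that it existed."""
        if self._receipts.pop(tx_hash, None) is not None:
            self._pruned.add(tx_hash)

    def get_receipt(self, tx_hash: str):
        """Empty-response behaviour: None for both pruned and never-existed."""
        return self._receipts.get(tx_hash)

    def lookup(self, tx_hash: str):
        """Alternative: return a status alongside the receipt (if any)."""
        if tx_hash in self._receipts:
            return Lookup.FOUND, self._receipts[tx_hash]
        if tx_hash in self._pruned:
            return Lookup.PRUNED, None
        return Lookup.UNKNOWN, None
```

The cost is that the index of pruned hashes only ever grows, which is why this option is the heavier one.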
The 275 GB number was ascertained from the output of `geth db inspect`.
The official EIP-4444 (pronounced EIP four 4s btw) specification can be found here: eips.ethereum.org/EIPS/eip-4444
