12 Nov, 25 tweets, 7 min read
Yesterday's chain split will be cemented in Ethereum's history as an inadvertent hard fork. Let's look at exactly what the vulnerability was & how this transaction exploited it (thread): etherscan.io/tx/0x57f7f9ec3…
I also want to preface that this is completely an outsider's account of the incident & therefore everything posted here is only to the best of my knowledge. I look forward to the go-ethereum team's full disclosure of the vulnerability!
On 2019-11-07, Geth v1.9.7 was released. It contained a few fixes and optimizations, including PR #20177 titled "core/evm: avoid copying memory for input in calls".
This PR was a 102-line diff. The main change was modifying how *CALL arguments are passed to child frames. Instead of copying the memory, a reference to the arguments in the current frame's memory would be passed to the child frame to populate CALLDATA* opcodes.
In the normal case, this is fine. Each frame instantiates a fresh memory space and CALLDATA* opcodes *copy* the call's arguments into the frame's memory. This explicit copy breaks the reference into the parent's memory.
To understand how the vulnerability was exploited, we must understand the lesser-known precompile at 0x04. In the Yellow Paper, it is referred to as the "identity function". Its sole purpose is to return the data it is given as input.
In the early days, it appears to have acted like a memcpy function. Solidity actually used it in 2016 to copy memory. Before EIP-150, copying memory in this way was rather economical at about 3 gas per word.
Post EIP-150, the base cost of a `*CALL` increased dramatically from 40 to 700 gas. This led @ethchris & @alexberegszaszi to replace its use entirely: github.com/ethereum/solid…
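To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the Yellow Paper's identity pricing of 15 gas flat plus 3 gas per 32-byte word, and it ignores memory expansion and the opcodes needed to set up the call. The function and constant names are made up for illustration.

```go
package main

import "fmt"

// Constants as quoted in the thread and the Yellow Paper. This is an
// illustrative estimate, not a complete gas model.
const (
	callBasePre150  = 40  // base *CALL cost before EIP-150
	callBasePost150 = 700 // base *CALL cost after EIP-150
	identityBase    = 15  // flat cost of the identity precompile
	identityPerWord = 3   // cost per 32-byte word of input
)

// identityCopyGas estimates the gas for copying n bytes through 0x04,
// ignoring memory expansion and stack-setup opcodes.
func identityCopyGas(n, callBase uint64) uint64 {
	words := (n + 31) / 32 // round up to whole words
	return callBase + identityBase + identityPerWord*words
}

func main() {
	for _, n := range []uint64{32, 320, 3200} {
		fmt.Printf("%5d bytes: pre-150=%d post-150=%d\n", n,
			identityCopyGas(n, callBasePre150),
			identityCopyGas(n, callBasePost150))
	}
}
```

For small copies the fixed call cost dominates, which is why the price jump made this trick unattractive.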
Returning to 2020, the identity precompile is alive and well at 0x04 -- and as it turns out, is actually at the center of the chain split.
We discussed above how child frames have a pointer to their parent's memory. You might now notice that we have a special precompile which returns the data it is given. There is nothing inherently wrong with this. However, this is where the implementation of EIP-211 becomes critical.
Until EIP-211, a *CALL had to specify up front how much data its child would return. Adding RETURNDATACOPY & RETURNDATASIZE allowed child frames to return an arbitrary amount of data. This is useful if it's impossible to know how much data your call will return before you execute it.
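A rough sketch of what those two opcodes do, on a toy frame. The `frame` type and its methods are hypothetical and only model the semantics described in EIP-211. This is not Geth's interpreter code.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is a hypothetical, stripped-down call frame.
type frame struct {
	memory     []byte
	returnData []byte // full output of the most recent call (EIP-211)
}

var errOutOfBounds = errors.New("return data out of bounds")

// returnDataSize models RETURNDATASIZE.
func (f *frame) returnDataSize() int { return len(f.returnData) }

// returnDataCopy models RETURNDATACOPY: copy size bytes of the return
// buffer, starting at offset, into memory at destOffset.
func (f *frame) returnDataCopy(destOffset, offset, size int) error {
	if destOffset < 0 || offset < 0 || size < 0 || offset+size > len(f.returnData) {
		return errOutOfBounds // EIP-211: reading past the buffer aborts the frame
	}
	if need := destOffset + size; need > len(f.memory) {
		// Grow the toy memory so the destination range exists.
		f.memory = append(f.memory, make([]byte, need-len(f.memory))...)
	}
	copy(f.memory[destOffset:], f.returnData[offset:offset+size])
	return nil
}

func main() {
	// Pretend a child call returned three bytes; the caller did not need
	// to know that length before making the call.
	f := &frame{returnData: []byte{0xaa, 0xbb, 0xcc}}
	if err := f.returnDataCopy(0, 0, f.returnDataSize()); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("memory=%x\n", f.memory)
}
```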
You'll see that the Run(..) function which implements the identity precompile is very straightforward. It simply returns the input byte slice, unmodified.
This return data is copied into the parent's memory and (as specified by EIP-211) saved in its entirety via a reference. In the screenshot, you'll see the call's return data is plugged directly into the current frame's RETURNDATA* opcodes via returnData.
To further emphasize that slices in Go are just headers describing a contiguous sequence of bytes in memory, see how this playground copies the reference to the data, not the data itself: play.golang.org/p/pMAdD69dk9z
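In case the playground link goes stale, here is a small self-contained sketch of the same point. `identity` and `identityCopy` are names invented for this example. The first hands back the slice it was given, the second makes its own copy.

```go
package main

import "fmt"

// identity returns its input unmodified. Only the slice header is
// returned, so the result shares its backing array with the input.
func identity(in []byte) []byte { return in }

// identityCopy returns a fresh copy, breaking the link to the caller's buffer.
func identityCopy(in []byte) []byte {
	out := make([]byte, len(in))
	copy(out, in)
	return out
}

func main() {
	memory := []byte{0x00, 0x69}
	shared := identity(memory)
	private := identityCopy(memory)

	memory[1] = 0x42 // write through the original buffer

	fmt.Printf("shared=%x private=%x\n", shared, private)
	// prints: shared=0042 private=0069
}
```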
Now, there is a pointer to the current frame's memory sitting in the current frame's returnData field. Since EVM memory is writable, it's possible to overwrite the return data by writing to the same location.
Okay! We should have a solid understanding of what the vulnerability was. Let's move on to how it was exploited on mainnet. The exploit transaction was included in the canonical chain here: etherscan.io/tx/0x57f7f9ec3…
As an interesting aside, if you haven't seen, it appears the Optimism team has disclosed that they accidentally caused the fork. If this thread is correct, that would imply they submitted the above transaction.
The exploit transaction called a contract which deployed a new contract with init code specified in the transaction's calldata. It's not clear why this CREATE step was taken; AFAICT the vulnerability could've been triggered in the root frame with a call to 0x04.
This is the disassembled payload to CREATE. I made it a bit more readable -- note that 32 byte words are separated into half words and contiguous 0 bytes are omitted with ".." as a shorthand.
The payload called the identity contract with 0x00..69 and was therefore returned the same. In the vulnerable version of Geth, this return data is just a reference to the current frame's memory.
Next, 0x00..420,0x00..69 is stored into mem[0..32]. Because the return data was pointing to mem[0..16], the subsequent RETURNDATACOPY would place 0x00..420 (i.e. the current mem[0..16]) into mem[16..32], even though the *proper* behaviour is to place 0x00..69 into mem[16..32].
The value at mem[0..32] is then stored at 0x0 in the contract and the create frame returns. Again, in the canonical case, 0x0 would be set to 0x00..420,0x00..69. In the invalid case, 0x0 is set to 0x00..420,0x00..420.
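The steps above can be replayed on a toy 32-byte memory. This is only a sketch of the aliasing described in this thread, not Geth's actual interpreter. `replay`, `halfWord` and the `shallowCopy` flag are hypothetical names, and the step numbering is mine.

```go
package main

import "fmt"

// halfWord builds a 16-byte big-endian value, e.g. halfWord(0x420) = 0x00..0420.
func halfWord(v uint16) []byte {
	b := make([]byte, 16)
	b[14], b[15] = byte(v>>8), byte(v)
	return b
}

// replay walks through the payload's steps and returns the word that would
// be written to storage slot 0x0. With shallowCopy set, the return data
// aliases memory, as described for the vulnerable releases.
func replay(shallowCopy bool) [32]byte {
	memory := make([]byte, 32)

	// 1. Call the identity precompile with mem[0..16] = 0x00..69.
	copy(memory[0:16], halfWord(0x69))
	args := memory[0:16] // a reference into memory, not a copy
	returnData := args   // identity: the output is the input slice itself
	if !shallowCopy {
		returnData = append([]byte(nil), args...) // correct behaviour: own copy
	}

	// 2. Overwrite mem[0..32] with 0x00..420,0x00..69.
	copy(memory[0:16], halfWord(0x420))
	copy(memory[16:32], halfWord(0x69))

	// 3. RETURNDATACOPY the 16 returned bytes into mem[16..32].
	copy(memory[16:32], returnData)

	// 4. SSTORE(0x0, mem[0..32]).
	var slot [32]byte
	copy(slot[:], memory)
	return slot
}

func main() {
	fmt.Printf("canonical:  %x\n", replay(false))
	fmt.Printf("vulnerable: %x\n", replay(true))
}
```

The two runs differ only in the second half word: 0x00..69 when the return data is a private copy, 0x00..420 when it aliases memory.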
You can see that the state diff on etherscan does show that the canonical 0x00..420,0x00..69 was saved to 0x0: etherscan.io/tx/0x57f7f9ec3…
This vulnerability was first reported by John Youngseok Yang on 2020-07-15. It was fixed one day later with PR #21222 by ensuring that return data is always copied into the current frame's return data value. The fix was released in Geth v1.9.17 just 5 days later.
That's the story as far as I can discern! I didn't think it needed to be said, but if you find yourself in a situation where you believe you've run into a consensus issue, it's best to report it via the Ethereum bug bounty program. bounty.ethereum.org
