Liam Zebedee Profile picture
engineer / distributed systems, ml, protocols, organiser @ausbuildooors

Apr 18, 2022, 101 tweets

Deployed my first starknet contract tonight goerli.voyager.online/contract/0x064…

cairo doesn't have strings jfc it's early

I've recently gotten a grant from @StarkWareLtd to do some R&D into building a system on Cairo. Going to be using this thread to document my learnings :)

Starknet is probably the single highest signal community I've encountered so far. Lots of incredibly smart engineers in these group chats.

The guy who wrote the Cairo string library also designed a neural net that runs on-chain. It's the standard MNIST digit detector. Very cool github.com/guiltygyoza/ti…

One issue of Starkware is that their tooling and terms are very confusing. So I'm going to attempt to explain it.

Blockchains are cool because it's a programmable worldwide computer (aka state machine).

A state machine is a function which transitions between two states according to some rules. In Ethereum, the state machine is the EVM.

You send a transaction to a contract, the EVM loads the contract bytecode, and starts executing with the payload of msg.data.

When the EVM finishes executing the tx (which probably included updating storage, balances), we get the new state of the blockchain.

How do we know nodes actually run the computations? Well, they have the incentive to do so - the block reward from producing the next block.

How do we know if the computation is run correctly? Every node verifies the transactions.

A common misunderstanding - a 51% attack does NOT give you the ability to arbitrarily change balances.

Nodes always verify the state transition. A consensus-level attack only changes the facts on which transactions are included in blocks (including past blocks).

So now we understand - consensus and verifiable state machines are two separate aspects of a blockchain.

The verifiability is useful, but it's really expensive. It's O(n) to verify a transaction, since we have to run it on our machine. Is there a shortcut?

ZK STARKS are a different model of computation, where there is a prover that tries to convince a verifier of a fact.

This is a more general model of computation than a state machine, and can be cheaper too.

In EVM-land, prover and verifier both do the same thing. The prover runs the tx in the EVM, the verifier runs the tx in the EVM. It's the same O(N) cost.

In STARK-land, the prover actually convinces the verifier using a different technique to running the whole computation.

The prover translates the computation into an equation using a technique called arithmetization.

medium.com/starkware/arit…

This equation can be evaluated much more cheaply than running the whole computation. A ZK-STARK proof costs O(log^2 n) to verify.

EVM - O(n)
ZK-STARK - O(log^2 n)

Some notes here I wrote on ZK proofs back in 2018, before starkware was even out. Might be useful for beginner's intuition -
gist.github.com/liamzebedee/6c…

STARK's are a vastly better model of computation in blockchains. Optimistic Rollups only scale us linearly - they just outsource computation to a second layer, and inevitably fraud proofs do cost O(n) on layer 1.

Okay, now I'm gonna start covering how Starkware works-

So Starkware have built a VM on top of ZK-STARK proofs. Instead of writing computation in terms of equations, we can write them just like regular programs - line-by-line, w/ variables, and if statements.

That VM is called Cairo. It's also the language.

eprint.iacr.org/2021/1063.pdf

This is where it starts to get confusing.

Cairo (language) compiles down to the Cairo machine bytecode, which is called AIR.

AIR stands for Algebraic Intermediate Representation - aka the arithmetization technique I was mentioning before.

The Cairo machine is an implementation of a verifier. It does not generate proofs, it merely verifies a computational trace is valid.

What is a trace?

When you run a program in the Cairo CPU machine, every read/write to the CPU's registers generates a trace.

It's useful to think of a trace as simply a table of the CPU registers over time, where every column is a register, and every row are the values at time t.

Some example on an execution trace for encoding the computation of the Fibonacci sequence-
medium.com/starkware/arit…

Coming to memory - how does the Cairo machine's memory work?

There is a single immutable stretch of memory. Cairo instructions operate on a single cell in this memory. That cell is specified by the value in the ap/fp registers (more on this...)

So there are three registers in Cairo's machine:

pc - program counter
ap - allocation pointer (aka free memory pointer in evm)
fp - frame pointer (used to reference the current call frame)

One thing that surprised me early on was that Cairo/Starkware allows for composable smart contracts.

This is how cross-contract calls look in Cairo VM memory-

Interesting...so I'm just learning how addition/multiplication work in Cairo.

In EVM, you might push two values to the stack, and then call ADD, which puts the sum in stack position 0.

In Cairo, the word size is a felt (field element), which is 252 bits. cairo-lang.org/docs/hello_cai…

A Cairo instruction can span 1-2 words.
Unlike other CPU architectures, Cairo instructions are packed very differently.
The 1st word contains the instruction, flags and values.
The 2nd word contains the input (imm - immediate value).

I'll be honest it's literally easier to read the state machine code than try explain this. pg. 33

eprint.iacr.org/2021/1063.pdf

As I understand:

op1, op2 - operands
imm - immediate value
res - result
dst - destination

res/dst are used for jumps
res/dst also used for assertions

Notably missing - Cairo machine doesn't have mstore/sstore! The machine's memory is immutable read-only. How do they implement read/write then?

Like a simple example - an ERC20.transferFrom function. How do we update the balance of receiver?

Cairo doesn't have the concept of storage. But StarkNet does. And what's confusing is that you use the same language, but a different compiler (cairo-compile vs. starknet-compile)

When we think about writing to memory, we think about it happening while the program is executing. But that's not really how our computation is running, since it's a proof.

Instead, we use assertions to write to memory.

When we assert a memory cell is a certain value, the Cairo prover generates a trace which says that cell has a value.

That functions as an assignment. The traces are in fact a history of a program's memory.

Assertions can literally be for any number in the field, e.g. 123123123 = 123123123

Interpreting an assertion as an assignment, where the left-hand side is the address and right-hand side is the value, we can view it as a massive address space for our program's memory.

And just like EVM has precompiles, we can define certain areas of that address space to use for communicating with the outside world! (e.g. SSTORE)

Cairo calls these builtins.

The memory model is a little more sophisticated. Basically builtins exist at predefined memory segments. The memory segments are handled by the Cairo Runner, which is the runner for your Cairo program on top of the Cairo machine (but before the Cairo bootloader)

An example of how invoking a builtin works-

Coming to some actual Cairo code to understand how this compiles down, this is an example of a token contract in .cairo-

The storage_var annotation is actually just some syntactic sugar that generates a getter/setter. It generates something like this-

sourcegraph.com/github.com/sta…

The way you write to storage looks like this-

balances.write(caller, new_balance)

But what does .write actually compile down to?

Basically .write is an autogenerated function, which writes data word-by-word into memory by calling the `storage_write` builtin.

Ok so how does `storage_write` work?

sourcegraph.com/github.com/sta…

Fun fact - all of the Cairo builtins are actually written in Cairo itself!

This is actually a really good signal - decoupling/separation of concerns in the Starkware stack makes me think that they've designed these primitives to do one thing *really* well.

Here's the storage_write builtin. It's called a syscall here.

sourcegraph.com/github.com/sta…

Breaking down this function:

1. It allocates a new struct, the StorageWrite struct, which encodes the call into memory (which will generate a trace of that syscall)
2. It writes that StorageWrite to the syscall_ptr, using an assertion

What's the syscall_ptr? This is what is known as an "implicit" argument that you will see EVERYWHERE in Cairo. It's one of the ugliest parts of the language, since it needs to be mentioned anywhere you use a syscall/builtin.

see cairo-lang.org/docs/how_cairo…

It makes sense why we need it though. The Cairo Runner presumably defines the memory segments where syscalls are to be written to (syscall_ptr), and this is passed into the user's Cairo program when it is run.

ie. syscall_ptr is dynamically injected at runtime

Okay, what about this weird line with the {% syntax? What's that do?

%{ syscall_handler.storage_write(segments=segments, syscall_ptr=ids.syscall_ptr) %}

That is one of the weirdest parts of Cairo - Hints.

cairo-lang.org/docs/how_cairo…

Recall back to the paradigm shift around "memory".

When we transact with a Cairo program, we actually are generating a proof of that computation being run correctly.

The execution trace stores the history of the program's memory during exec.

What I didn't mention is that the execution of the Cairo machine can be NON DETERMINISTIC.

The prover may do additional work that is not part of the proven computation.

They refer to this as "nondeterministic programming".
The example they use is really terse and I love it.

The verifier doesn't need to know how you compute the square root - only that it satisfies some relation. This opens up Cairo for some very interesting applications

How this works - you can write anything to memory during the proof generation, and you can even do it in languages other than Cairo!

The {% %} syntax lets you write to memory using Python.

It will generate a trace of any modifications, just like a regular Cairo assignemnt

And when syscall_handler.storage_write is called, it routes to the StarkNet OS backend.

What I don't understand yet is how the whole thing fits together--

Do proofs get generated locally or server side? I imagine they get generated on the StarkWare server.

But if they're generated on Starkware's side, where are hints evaluated?

My working model - in production, the user invokes functions on a StarkNet contract, by sending a signed transaction to the gateway. The tx includes the selector, and inputs.

So far, I'm reading the entire StarkNet blockchain is actually implemented in Cairo - sourcegraph.com/github.com/sta…

Looks like the StarkNet Sequencer has a whitelist of hints. I'm going to guess that when you deploy a StarkNet contract, you're uploading the list of hints in the compiled contract .json. Somehow it figures out where hints are invoked during the program?

sourcegraph.com/github.com/sta…

Money quote - sourcegraph.com/github.com/sta…

hints : a dictionary from memory addresses to an executable object

226 is presumably the mem addr, and the exec obj is presumably the .code

StarkNet FAQ has a good glossary - starknet.io/faq/

Constraints I've found on StarkNet so far:

Max Cairo machine steps - 1_000_000. [1]
Storage read/write is 40 steps [2]
So that's max 25k r/w per tx.

[1] sourcegraph.com/github.com/sta…
[2] sourcegraph.com/github.com/sta…

So for transferring 1M tokens at once, that would be 40 transactions worth. Not too bad. Not sure about gas costs yet.

Compared to Polygon:

Gas throughput per second (not per tx) - 9.2M
SLOAD is 20k gas
9_200_000 / 20_000 = 460

StarkNet's looking like a 54x improvement in throughput.

So I just realised something quite dumb. But since I’m dumb I imagine many people might think this.

STARKS don’t have to be ZK.

Starkware/StarkNet/Cairo are all just STARK-based. There is no private element to them (like Circom from memory)

Honestly if I were wanting to get a young student into maths, comsci and software, I’d get them to read the Cairo paper. It’s 22 pages of some of the most interesting and concise explanations of computer science topics I’ve ever read. It’s beautiful.

It goes from “computation can be represented as y=mx+c” to “let’s design a CPU, instruction set and high-level programming language”

And it’s just so freaking interesting. A paper where they build a CPU, instruction set, memory model, program runtime, and programming language.

All justified within the constraints of arithmetic on prime numbers (field elements)

Open-source STARK prover. You can implement computations that compile down to STARK proofs using Rust traits!
github.com/novifinancial/…

The documentation + featureset is incredible. Emphasis on "configurable fields" - you can fine-tune proof generation for specific security targets (e.g. mobile vs. desktop class compute)

AIR = assembly for STARK's
Cairo = C for STARK's
Prover = gcc/clang

(easy mental shortcuts to explain this)

Okay Cairo is maybe a bit more than just "C for STARK's".

It's more like Java

.cairo -> cairo VM bytecode
.java -> JVM bytecode

thanks @GuthL for clarifying this for me!

@GuthL JVM bytecode is interpreted into assembly.
Cairo bytecode is interpreted into AIR.

@GuthL so like:

java -> JVM bytecode -> assembly -> x86
cairo -> Cairo VM bytecode -> AIR

x86 runs on Intel CPU's
AIR runs inside a STARK prover like Winterfell

@GuthL Still working on this mental model. I'll abbreviate Cairo VM to CVM to make things easier.

CVM inteprets CVM IR
JVM interprets JVM IR

but where the JVM executes on an x86 processor (e.g. java.exe), the CVM executes on a STARK prover

@GuthL This is actually a lot more useful, to think of the STARK prover as a processor.

STARK provers will eventually be embedded in hardware, just like other cryptographic coprocessors (TPM's, enclaves)

@dystopiabreaker tweeted abt this the other day, go chuck him a follow

@GuthL @dystopiabreaker tbh building ASIC's for STARK's is probably the single biggest alpha play rn

@GuthL @dystopiabreaker it's much more defensible than proprietary provers/verifiers. STARK architectures and VM's are complex but they're not impossible to understand

@GuthL @dystopiabreaker 😍😍😍 An open-source Cairo VM on top of Winterfell.
github.com/maxgillett/giza

@GuthL @dystopiabreaker In the next 6wks I'm going to challenge myself and build a horizontally scalable decentralized database based on STARK/Cairo.

WIP spec here -
ethresear.ch/t/horizontally…

@GuthL @dystopiabreaker Hot take: L2 sucks. There's not a single decentralized system that can send 1M tokens in a single transaction.

What's the constraint? Storage is O(N) and compute is O(N) for a blockchain.

Every node must execute the tx, every node must store all data.

Here's the idea-

@GuthL @dystopiabreaker 1/ Async smart contracts w/ cross-chain verifiable message passing using STARK proofs

Database cluster. Every node is a "blockchain". All communication is via cross-chain txs. Verifiable without state transfer using STARK proofs.

@GuthL @dystopiabreaker 2/ Database state (rows, index) is sharded among nodes, we don't prove it's state, we only prove updates to the state.

e.g. prove after inserting row R that the index is correctly constructed (e.g. leaf to left/right follow some relation)

@GuthL @dystopiabreaker 3/

Storage - O(1)
Compute - O(log^2 n) via STARK proofs
Networking - O(log^2 n)

Storage cost is so low since nodes only store a shard of state. State sharing not necessary due to STARK proofs.

If it works, it'll be better than rollups/bridges and better than yanking in ETH 2.0.

Application-specific blockchains that communicate via STARK proofs without having to share state or use cryptoeconomic incentives.

That's my shit

My second project on #starkware #cairo.

A blockchain where the proof-of-work function is generating STARK proofs for the chain's own state transition.

Cheaper than hashcash, no need for sybil control since all computation is verifiable.

Every machine has a clock. A blockchain is a state machine, so the clock is naturally T, the idx of current state. When we do cross-chain comms, we're relying on a clock to establish causality. You could use STARK proofs of a blockchain's state trnsitns as lightweight clock sync

If you read the whitepaper, you know that bitcoin was originally described as a "distributed timestamp server". If you're a nerd, you know in the code it was actually called "timechain".

VDF's are cool for clock sync. Winterfell is basically a useful VDF.

Cairo syscalls seem similar to Linux interrupts (int 80h)? Write the arguments and syscall identifier to memory, and then call an interrupt opcode?

Haven’t found the interrupt opcode yet. LOL

Hot damn. I just realised. The execution trace of storage_write calls is the equivalent of Ethereum’s accessed_storage_keys (EIP-2930) eips.ethereum.org/EIPS/eip-2930

Some more notes on Cairo's prover architecture, the fact registry and the Cairo Runner-

Cairo proofs are modeled after a fact registry.
A Cairo program is executed with inputs, outputting an execution trace (showing the full state of memory over time), which is reduced to a STARK proof.

The Cairo STARK proof verifies that running a program with those inputs will generate this output - this is the *fact* we are submitting to the fact registry.

What's the fact registry?

The fact registry is a smart contract on Ethereum Goerli, which verifies STARK proofs and stores facts that are of the form `keccak(program_hash, output_hash)`.

The current fact registry is located at 0xAB43bA48c9edF4C2C4bB01237348D1D7B28ef168

goerli.etherscan.io/address/0xAB43…

This is a proxy pointing to a proprietary contract with the interface IFactRegistry.sol.

Doing some sleuthing, you can find a couple of implementations across Starkware's repos though that glean some more information-

goerli.etherscan.io/address/0x5f42…

The fact registry isn't a complex contract - it's just a KV store where you can call registerFact(fact).

The juice is in the Solidity STARK verifier.

Here's GpsOutputParser, which processes the output of the SHARP prover service (prev. called GPS)

sourcegraph.com/github.com/sta…

The external entrypoint to proving STARK proofs on-chain is in GpsStatementVerifier.verifyProofAndRegister

sourcegraph.com/github.com/sta…

Again, I keep coming back to the Cairo paper, just so succinct.

In all ZK proof systems, we have a verify(proof) function. We can verify a proof is correct, but how do we know what program it is proving?

For this, there is the Cairo runner. What the runner does, is write the hash of the program and its inputs to public memory. ie. something like this-

run(program, inputs):
write(public_mem, hash(program, inputs))
program(inputs)

What's seemingly beautiful about the invention of SNARK's is that it was truly also a process of discovery

I didn't realise this, but they had to literally search for the parameters of two elliptic curves - MNT4 and MNT6 - which took over 610,000 compute hours

Share this Scrolly Tale with your friends.

A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.

Keep scrolling