Liam Zebedee Profile picture
Apr 18, 2022 101 tweets 26 min read Read on X
Deployed my first starknet contract tonight goerli.voyager.online/contract/0x064…
cairo doesn't have strings jfc it's early
I've recently gotten a grant from @StarkWareLtd to do some R&D into building a system on Cairo. Going to be using this thread to document my learnings :)
Starknet is probably the single highest signal community I've encountered so far. Lots of incredibly smart engineers in these group chats.
The guy who wrote the Cairo string library also designed a neural net that runs on-chain. It's the standard MNIST digit detector. Very cool github.com/guiltygyoza/ti…
One issue of Starkware is that their tooling and terms are very confusing. So I'm going to attempt to explain it.

Blockchains are cool because it's a programmable worldwide computer (aka state machine).
A state machine is a function which transitions between two states according to some rules. In Ethereum, the state machine is the EVM.

You send a transaction to a contract, the EVM loads the contract bytecode, and starts executing with the payload of msg.data.
When the EVM finishes executing the tx (which probably included updating storage, balances), we get the new state of the blockchain.
How do we know nodes actually run the computations? Well, they have the incentive to do so - the block reward from producing the next block.

How do we know if the computation is run correctly? Every node verifies the transactions.
A common misunderstanding - a 51% attack does NOT give you the ability to arbitrarily change balances.

Nodes always verify the state transition. A consensus-level attack only changes the facts on which transactions are included in blocks (including past blocks).
So now we understand - consensus and verifiable state machines are two separate aspects of a blockchain.

The verifiability is useful, but it's really expensive. It's O(n) to verify a transaction, since we have to run it on our machine. Is there a shortcut?
ZK STARKS are a different model of computation, where there is a prover that tries to convince a verifier of a fact.

This is a more general model of computation than a state machine, and can be cheaper too.
In EVM-land, prover and verifier both do the same thing. The prover runs the tx in the EVM, the verifier runs the tx in the EVM. It's the same O(N) cost.
In STARK-land, the prover actually convinces the verifier using a different technique to running the whole computation.

The prover translates the computation into an equation using a technique called arithmetization.

medium.com/starkware/arit… Image
This equation can be evaluated much more cheaply than running the whole computation. A ZK-STARK proof costs O(log^2 n) to verify.

EVM - O(n)
ZK-STARK - O(log^2 n)

Some notes here I wrote on ZK proofs back in 2018, before starkware was even out. Might be useful for beginner's intuition -
gist.github.com/liamzebedee/6c…
STARK's are a vastly better model of computation in blockchains. Optimistic Rollups only scale us linearly - they just outsource computation to a second layer, and inevitably fraud proofs do cost O(n) on layer 1.
Okay, now I'm gonna start covering how Starkware works-
So Starkware have built a VM on top of ZK-STARK proofs. Instead of writing computation in terms of equations, we can write them just like regular programs - line-by-line, w/ variables, and if statements.

That VM is called Cairo. It's also the language.

eprint.iacr.org/2021/1063.pdf
This is where it starts to get confusing.

Cairo (language) compiles down to the Cairo machine bytecode, which is called AIR.

AIR stands for Algebraic Intermediate Representation - aka the arithmetization technique I was mentioning before.
The Cairo machine is an implementation of a verifier. It does not generate proofs, it merely verifies a computational trace is valid.

What is a trace? Image
When you run a program in the Cairo CPU machine, every read/write to the CPU's registers generates a trace. Image
It's useful to think of a trace as simply a table of the CPU registers over time, where every column is a register, and every row are the values at time t.

Some example on an execution trace for encoding the computation of the Fibonacci sequence-
medium.com/starkware/arit…
Coming to memory - how does the Cairo machine's memory work?

There is a single immutable stretch of memory. Cairo instructions operate on a single cell in this memory. That cell is specified by the value in the ap/fp registers (more on this...) Image
So there are three registers in Cairo's machine:

pc - program counter
ap - allocation pointer (aka free memory pointer in evm)
fp - frame pointer (used to reference the current call frame)
One thing that surprised me early on was that Cairo/Starkware allows for composable smart contracts.

This is how cross-contract calls look in Cairo VM memory- Image
Interesting...so I'm just learning how addition/multiplication work in Cairo.

In EVM, you might push two values to the stack, and then call ADD, which puts the sum in stack position 0. Image
In Cairo, the word size is a felt (field element), which is 252 bits. cairo-lang.org/docs/hello_cai…
A Cairo instruction can span 1-2 words.
Unlike other CPU architectures, Cairo instructions are packed very differently.
The 1st word contains the instruction, flags and values.
The 2nd word contains the input (imm - immediate value). Image
I'll be honest it's literally easier to read the state machine code than try explain this. pg. 33

eprint.iacr.org/2021/1063.pdf
As I understand:

op1, op2 - operands
imm - immediate value
res - result
dst - destination

res/dst are used for jumps
res/dst also used for assertions
Notably missing - Cairo machine doesn't have mstore/sstore! The machine's memory is immutable read-only. How do they implement read/write then? Image
Like a simple example - an ERC20.transferFrom function. How do we update the balance of receiver?

Cairo doesn't have the concept of storage. But StarkNet does. And what's confusing is that you use the same language, but a different compiler (cairo-compile vs. starknet-compile)
When we think about writing to memory, we think about it happening while the program is executing. But that's not really how our computation is running, since it's a proof.

Instead, we use assertions to write to memory.
When we assert a memory cell is a certain value, the Cairo prover generates a trace which says that cell has a value.

That functions as an assignment. The traces are in fact a history of a program's memory.
Assertions can literally be for any number in the field, e.g. 123123123 = 123123123

Interpreting an assertion as an assignment, where the left-hand side is the address and right-hand side is the value, we can view it as a massive address space for our program's memory.
And just like EVM has precompiles, we can define certain areas of that address space to use for communicating with the outside world! (e.g. SSTORE)

Cairo calls these builtins. Image
The memory model is a little more sophisticated. Basically builtins exist at predefined memory segments. The memory segments are handled by the Cairo Runner, which is the runner for your Cairo program on top of the Cairo machine (but before the Cairo bootloader)
An example of how invoking a builtin works- Image
Coming to some actual Cairo code to understand how this compiles down, this is an example of a token contract in .cairo- Image
The storage_var annotation is actually just some syntactic sugar that generates a getter/setter. It generates something like this-

sourcegraph.com/github.com/sta…
The way you write to storage looks like this-

balances.write(caller, new_balance)

But what does .write actually compile down to?
Basically .write is an autogenerated function, which writes data word-by-word into memory by calling the `storage_write` builtin.

Ok so how does `storage_write` work?

sourcegraph.com/github.com/sta…
Fun fact - all of the Cairo builtins are actually written in Cairo itself!

This is actually a really good signal - decoupling/separation of concerns in the Starkware stack makes me think that they've designed these primitives to do one thing *really* well.
Here's the storage_write builtin. It's called a syscall here.

sourcegraph.com/github.com/sta…
Breaking down this function:

1. It allocates a new struct, the StorageWrite struct, which encodes the call into memory (which will generate a trace of that syscall)
2. It writes that StorageWrite to the syscall_ptr, using an assertion
What's the syscall_ptr? This is what is known as an "implicit" argument that you will see EVERYWHERE in Cairo. It's one of the ugliest parts of the language, since it needs to be mentioned anywhere you use a syscall/builtin.

see cairo-lang.org/docs/how_cairo…
It makes sense why we need it though. The Cairo Runner presumably defines the memory segments where syscalls are to be written to (syscall_ptr), and this is passed into the user's Cairo program when it is run.

ie. syscall_ptr is dynamically injected at runtime
Okay, what about this weird line with the {% syntax? What's that do?

%{ syscall_handler.storage_write(segments=segments, syscall_ptr=ids.syscall_ptr) %}

That is one of the weirdest parts of Cairo - Hints.

cairo-lang.org/docs/how_cairo…
Recall back to the paradigm shift around "memory".

When we transact with a Cairo program, we actually are generating a proof of that computation being run correctly.

The execution trace stores the history of the program's memory during exec.

What I didn't mention is that the execution of the Cairo machine can be NON DETERMINISTIC.

The prover may do additional work that is not part of the proven computation.
They refer to this as "nondeterministic programming".
The example they use is really terse and I love it.

The verifier doesn't need to know how you compute the square root - only that it satisfies some relation. This opens up Cairo for some very interesting applications Image
How this works - you can write anything to memory during the proof generation, and you can even do it in languages other than Cairo!

The {% %} syntax lets you write to memory using Python.

It will generate a trace of any modifications, just like a regular Cairo assignemnt
And when syscall_handler.storage_write is called, it routes to the StarkNet OS backend.

What I don't understand yet is how the whole thing fits together--

Do proofs get generated locally or server side? I imagine they get generated on the StarkWare server.
But if they're generated on Starkware's side, where are hints evaluated?
My working model - in production, the user invokes functions on a StarkNet contract, by sending a signed transaction to the gateway. The tx includes the selector, and inputs.
So far, I'm reading the entire StarkNet blockchain is actually implemented in Cairo - sourcegraph.com/github.com/sta…
Looks like the StarkNet Sequencer has a whitelist of hints. I'm going to guess that when you deploy a StarkNet contract, you're uploading the list of hints in the compiled contract .json. Somehow it figures out where hints are invoked during the program?

sourcegraph.com/github.com/sta…
Money quote - sourcegraph.com/github.com/sta…

hints : a dictionary from memory addresses to an executable object

226 is presumably the mem addr, and the exec obj is presumably the .code Image
StarkNet FAQ has a good glossary - starknet.io/faq/
Constraints I've found on StarkNet so far:

Max Cairo machine steps - 1_000_000. [1]
Storage read/write is 40 steps [2]
So that's max 25k r/w per tx.

[1] sourcegraph.com/github.com/sta…
[2] sourcegraph.com/github.com/sta…
So for transferring 1M tokens at once, that would be 40 transactions worth. Not too bad. Not sure about gas costs yet.
Compared to Polygon:

Gas throughput per second (not per tx) - 9.2M
SLOAD is 20k gas
9_200_000 / 20_000 = 460

StarkNet's looking like a 54x improvement in throughput.
So I just realised something quite dumb. But since I’m dumb I imagine many people might think this.

STARKS don’t have to be ZK.

Starkware/StarkNet/Cairo are all just STARK-based. There is no private element to them (like Circom from memory)
Honestly if I were wanting to get a young student into maths, comsci and software, I’d get them to read the Cairo paper. It’s 22 pages of some of the most interesting and concise explanations of computer science topics I’ve ever read. It’s beautiful.
It goes from “computation can be represented as y=mx+c” to “let’s design a CPU, instruction set and high-level programming language”
And it’s just so freaking interesting. A paper where they build a CPU, instruction set, memory model, program runtime, and programming language.

All justified within the constraints of arithmetic on prime numbers (field elements)
Open-source STARK prover. You can implement computations that compile down to STARK proofs using Rust traits!
github.com/novifinancial/…
The documentation + featureset is incredible. Emphasis on "configurable fields" - you can fine-tune proof generation for specific security targets (e.g. mobile vs. desktop class compute) Image
AIR = assembly for STARK's
Cairo = C for STARK's
Prover = gcc/clang

(easy mental shortcuts to explain this)
Okay Cairo is maybe a bit more than just "C for STARK's".

It's more like Java

.cairo -> cairo VM bytecode
.java -> JVM bytecode

thanks @GuthL for clarifying this for me!
@GuthL JVM bytecode is interpreted into assembly.
Cairo bytecode is interpreted into AIR.
@GuthL so like:

java -> JVM bytecode -> assembly -> x86
cairo -> Cairo VM bytecode -> AIR

x86 runs on Intel CPU's
AIR runs inside a STARK prover like Winterfell
@GuthL Still working on this mental model. I'll abbreviate Cairo VM to CVM to make things easier.

CVM inteprets CVM IR
JVM interprets JVM IR

but where the JVM executes on an x86 processor (e.g. java.exe), the CVM executes on a STARK prover
@GuthL This is actually a lot more useful, to think of the STARK prover as a processor.

STARK provers will eventually be embedded in hardware, just like other cryptographic coprocessors (TPM's, enclaves)

@dystopiabreaker tweeted abt this the other day, go chuck him a follow
@GuthL @dystopiabreaker tbh building ASIC's for STARK's is probably the single biggest alpha play rn
@GuthL @dystopiabreaker it's much more defensible than proprietary provers/verifiers. STARK architectures and VM's are complex but they're not impossible to understand
@GuthL @dystopiabreaker 😍😍😍 An open-source Cairo VM on top of Winterfell.
github.com/maxgillett/giza
@GuthL @dystopiabreaker In the next 6wks I'm going to challenge myself and build a horizontally scalable decentralized database based on STARK/Cairo.

WIP spec here -
ethresear.ch/t/horizontally…
@GuthL @dystopiabreaker Hot take: L2 sucks. There's not a single decentralized system that can send 1M tokens in a single transaction.

What's the constraint? Storage is O(N) and compute is O(N) for a blockchain.

Every node must execute the tx, every node must store all data.

Here's the idea-
@GuthL @dystopiabreaker 1/ Async smart contracts w/ cross-chain verifiable message passing using STARK proofs

Database cluster. Every node is a "blockchain". All communication is via cross-chain txs. Verifiable without state transfer using STARK proofs.
@GuthL @dystopiabreaker 2/ Database state (rows, index) is sharded among nodes, we don't prove it's state, we only prove updates to the state.

e.g. prove after inserting row R that the index is correctly constructed (e.g. leaf to left/right follow some relation)
@GuthL @dystopiabreaker 3/

Storage - O(1)
Compute - O(log^2 n) via STARK proofs
Networking - O(log^2 n)

Storage cost is so low since nodes only store a shard of state. State sharing not necessary due to STARK proofs.
If it works, it'll be better than rollups/bridges and better than yanking in ETH 2.0.

Application-specific blockchains that communicate via STARK proofs without having to share state or use cryptoeconomic incentives.

That's my shit
My second project on #starkware #cairo.

A blockchain where the proof-of-work function is generating STARK proofs for the chain's own state transition.

Cheaper than hashcash, no need for sybil control since all computation is verifiable.
Every machine has a clock. A blockchain is a state machine, so the clock is naturally T, the idx of current state. When we do cross-chain comms, we're relying on a clock to establish causality. You could use STARK proofs of a blockchain's state trnsitns as lightweight clock sync
If you read the whitepaper, you know that bitcoin was originally described as a "distributed timestamp server". If you're a nerd, you know in the code it was actually called "timechain".

VDF's are cool for clock sync. Winterfell is basically a useful VDF.
Cairo syscalls seem similar to Linux interrupts (int 80h)? Write the arguments and syscall identifier to memory, and then call an interrupt opcode?

Haven’t found the interrupt opcode yet. LOL
Hot damn. I just realised. The execution trace of storage_write calls is the equivalent of Ethereum’s accessed_storage_keys (EIP-2930) eips.ethereum.org/EIPS/eip-2930
Some more notes on Cairo's prover architecture, the fact registry and the Cairo Runner-
Cairo proofs are modeled after a fact registry.
A Cairo program is executed with inputs, outputting an execution trace (showing the full state of memory over time), which is reduced to a STARK proof.
The Cairo STARK proof verifies that running a program with those inputs will generate this output - this is the *fact* we are submitting to the fact registry.

What's the fact registry?
The fact registry is a smart contract on Ethereum Goerli, which verifies STARK proofs and stores facts that are of the form `keccak(program_hash, output_hash)`.

The current fact registry is located at 0xAB43bA48c9edF4C2C4bB01237348D1D7B28ef168

goerli.etherscan.io/address/0xAB43…
This is a proxy pointing to a proprietary contract with the interface IFactRegistry.sol.

Doing some sleuthing, you can find a couple of implementations across Starkware's repos though that glean some more information-

goerli.etherscan.io/address/0x5f42…
The fact registry isn't a complex contract - it's just a KV store where you can call registerFact(fact).

The juice is in the Solidity STARK verifier.

Here's GpsOutputParser, which processes the output of the SHARP prover service (prev. called GPS)

sourcegraph.com/github.com/sta…
The external entrypoint to proving STARK proofs on-chain is in GpsStatementVerifier.verifyProofAndRegister

sourcegraph.com/github.com/sta…
Again, I keep coming back to the Cairo paper, just so succinct.

In all ZK proof systems, we have a verify(proof) function. We can verify a proof is correct, but how do we know what program it is proving?
For this, there is the Cairo runner. What the runner does, is write the hash of the program and its inputs to public memory. ie. something like this-

run(program, inputs):
write(public_mem, hash(program, inputs))
program(inputs)
What's seemingly beautiful about the invention of SNARK's is that it was truly also a process of discovery Image
I didn't realise this, but they had to literally search for the parameters of two elliptic curves - MNT4 and MNT6 - which took over 610,000 compute hours Image

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Liam Zebedee

Liam Zebedee Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @liamzebedee

Jan 19, 2023
so @the_ethernaut has come up with a very interesting upgradeable contract pattern

- single contract
- delegatecalls to every other contract in system using an autogenerated proxy
- storage is thus shared between all contracts

github.com/Synthetixio/sy…
@the_ethernaut with a really ergonomic way to isolate storage from collisions
@the_ethernaut basically the base contract has the calltable for all selectors on all implementation contracts, binary searches the right one and delegatecalls it

24kb max contract size

can anyone tell me how many selectors this could support?
Read 4 tweets
Jan 18, 2023
Haha so something wild just happened. Just found out that R&D I did into a P2P reputation protocol way back in 2018 is now being used in a European grant project.
This is the project, it uses a framework called Evidence Based Subjective Logic, which is similar to eigenvector centrality / pagerank / eigentrust if you're familiar with that.

github.com/liamzebedee/re…
and then I stumbled upon this by accident while googling it. Surely this qualifies for a visa of some sort??

cyber-trust.eu/wp-content/upl…
Read 6 tweets
Dec 4, 2022
for something which is literally just signing a blob of data using a private key, it was an insanely complicated process

app is finally signed + notarized
shoutouts to whoever wrote this guide, which is like, the most sane thing I've found in a forest of insane hacks and winding github discussions
kilianvalkhof.com/2019/electron/…
the hardest part of programming is never the maths, data structures, etc. It's always the step where you have to setup the build system.

couple things that are better than 10yrs ago in this regard:
- buying ssl certs -> letsencrypt
- cmake -> docker, cargo
Read 4 tweets
Nov 21, 2022
content-addressable routing (find content by hash) beats location-based routing (ask the IP) because anyone can become a server

in essence, it's a permissionless hosting network
IPFS is useful only for key-value lookups - ie. find the value for key
ipfs://bafybeia5tevogbog7fyzsaobhc7oont6fgqry5nqdtqoq4dsyanqv4nhjy

but what if it was useful for an index?
In my little analogical brain, I imagine ZK proofs as being a sort of "generalised" hash.

ipfs://bafybeia5tevogbog7fyzsaobhc7oont6fgqry5nqdtqoq4dsyanqv4nhjy

this could represent a Google search for "vitalik"
Read 6 tweets
Nov 13, 2022
Haha so I've actually written another note on STARK's (for mere mortals).

Basically I wanted to figure out - how hard is it to implement this fantasy alien technology, recursive proofing?

To do this, I had to understand a couple of areas.

Link: hackmd.io/@liamzebedee/H…
What does the data structure of a STARK proof actually look like?

I went and reverse-engineered/read the code of a Circom verifier and Starkware's codebase.
From this, I was able to tell I had no idea what was going on. Why is there a separate trace and constraint merkle tree, AND the FRI? So I tried to research how STARK's are actually generated, to understand the domain.

How are STARK proofs actually generated?
Read 13 tweets
Oct 6, 2022
Has OFAC violated your rights?
You may be entitled to decentralization

..............

Introducing Dappnet 🍸
Dappnet resists capture
It's a permissionless application network
built on @ipfs and @ensdomains
The problem - we access dapps right now through DNS and servers, but they can be taken down, censored, captured, and seized from us at any time
Read 29 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(