There's a lot of buzz about code sandboxes. Which makes sense – coding agents are very useful! However I suspect building a business around it is quite hard. Some thoughts in 🧵
First of all, the code execution itself is definitely hard, but not crazy hard – projects like Firecracker and gVisor solve the underlying hard technical problem of isolation.
So what's going to matter long term? Some thoughts: 1. Doing this at very large scale. This is a hard systems problem, and it's also necessary to make the economics work. You need to run at least 100k or ideally 1M sandboxes on avg to have healthy revenue.
2. The SDK itself will be a strong differentiator. Developer ergonomics matter: make it easy to get started and build powerful things. 3. The technical primitives, like network tunnels and storage. 4. Performance. Starting containers in 100ms or less is hard.
5. State snapshotting. This makes it possible to suspend/resume/clone sandboxes, which imo will make coding agents way more powerful (what if you could branch and run a whole search tree?)
6. Adjacent products. In particular LLM inference (to output code), but also training – especially doing cool things with reinforcement learning (rollouts). This makes the economics work and also creates an end-to-end product.
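Point 5 is the one that's easiest to sketch. Here's a toy stand-in for a sandbox with snapshot/clone support – everything here is hypothetical, not any real SDK – showing how an agent could branch one snapshot into several candidate attempts and keep the best:

```python
import copy

# Hypothetical Sandbox with snapshot/clone semantics. State is just a
# dict here; a real sandbox would snapshot filesystem + memory.
class Sandbox:
    def __init__(self, state=None):
        self.state = state if state is not None else {"files": {}}

    def run(self, patch):
        # Pretend the agent ran and edited some files.
        self.state["files"].update(patch)
        return self

    def clone(self):
        # Suspend + snapshot + resume, modeled as a deep copy.
        return Sandbox(copy.deepcopy(self.state))

# Branch one snapshot into candidate edits, keep the best attempt,
# scoring (arbitrarily, for the demo) by fewest files touched.
base = Sandbox().run({"app.py": "v1"})
candidates = [{"app.py": "v2"}, {"app.py": "v3", "util.py": "v1"}]
branches = [base.clone().run(p) for p in candidates]
best = min(branches, key=lambda s: len(s.state["files"]))
print(sorted(best.state["files"]))  # ['app.py']
```

The key property is that cloning never mutates the parent snapshot, so the search tree can fan out cheaply from any node.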
End of 🧵 – but I should also mention that we're working on all these things at @modal! Very excited about code sandboxes as a market, but it's still very early.
An exciting technical project we've been working on at modal.com allows us to build a fresh container image, and boot up a lot of containers running that image, on many different nodes, all in a couple of seconds. Some notes on how we do it:
1. You can't use a traditional container registry. Pushing and pulling takes eternities (like 10s or more). Need to do something faster.
2. When you look at what you need to boot a container (strace is your friend), it turns out a very small % of the files in that image is actually read. E.g. the Python interpreter reads a couple of hundred files to start.
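That startup footprint is easy to check yourself. A rough proxy (without strace) is to ask a fresh interpreter how many modules it loaded just to start – the file count from `strace -e trace=openat` is higher, since it also catches shared libraries, encodings files, and probes of paths that don't exist, but the point is the same: it's tiny compared to the image.

```python
import subprocess, sys

# Count modules a bare interpreter loads at startup. This undercounts
# actual file reads (strace would also show .so files and failed path
# probes), but shows the same thing: startup touches very few files.
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.modules))"],
    capture_output=True, text=True, check=True,
)
n_modules = int(out.stdout)
print(n_modules)
```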
I’m deep in the rabbit hole of optimizing Python imports today. Some notes:
1. Python precompiles and caches bytecode in .pyc files. By default it validates these against the .py file's modified time: the .pyc header stores the source's mtime (and size), and the cached .pyc is used only if they still match.
2. However! You can make it use an alternative method and use a hash of the .py file instead by setting the env var SOURCE_DATE_EPOCH! This is completely undocumented afaict but I found it in the source code: github.com/python/cpython…
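You can see this with just the stdlib: compile the same file with and without SOURCE_DATE_EPOCH set, then read the 4-byte flags word that follows the magic number in the .pyc header (PEP 552 layout: 0 = timestamp-validated, 3 = hash-based and checked against the source hash):

```python
import os, py_compile, struct, tempfile

# Compile a trivial module two ways and inspect the flags word in the
# .pyc header (bytes 4-8, little-endian, per PEP 552).
src = os.path.join(tempfile.mkdtemp(), "demo.py")
with open(src, "w") as f:
    f.write("x = 1\n")

def pyc_flags(path):
    with open(path, "rb") as f:
        return struct.unpack("<I", f.read(8)[4:8])[0]

# Default: timestamp-based validation.
os.environ.pop("SOURCE_DATE_EPOCH", None)
f_default = pyc_flags(py_compile.compile(src, cfile=src + "c"))

# With SOURCE_DATE_EPOCH set, py_compile defaults to checked-hash pycs.
os.environ["SOURCE_DATE_EPOCH"] = "315532800"
f_hashed = pyc_flags(py_compile.compile(src, cfile=src + "c"))

print(f_default, f_hashed)  # 0 3
```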
Some will misinterpret this article as "masks are not effective" so it's worth pointing out that if you're a Bayesian then this should still move your posterior in favor of "masks work". The study would have had to see a -40% reduction to be stat sig. nytimes.com/2020/11/18/hea…
And to add to that, the way the study was designed, it will always underestimate the effect in the first place, since they only capture the effect on the people wearing the masks getting infected, not other people around them who also benefit.
I'm generally highly skeptical of poor statistical significance, but when you translate this into policy, you need to look at the whole payoff matrix, not throw the baby out with the bathwater by saying "we have no evidence that masks work" (looking at you, Sweden).
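To make the Bayesian point concrete with made-up numbers (these are illustrative, NOT the study's actual counts): a reduction that fails to reach significance is still evidence, because the observed data is more likely under "masks work" than under "no effect", so the likelihood ratio pushes the posterior up.

```python
from math import lgamma, log, exp

# Illustrative counts only -- not the study's data: 2000 per arm,
# 42 infections with masks vs 53 without (not a significant gap).
n, k_mask, k_ctrl = 2000, 42, 53

def binom_loglik(k, n, p):
    # log Binomial(n, p) pmf at k, via log-gamma for numerical stability
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

p0 = k_ctrl / n   # H0: masks do nothing
p1 = 0.8 * p0     # H1: masks cut the wearer's risk by 20% (assumed)
lik_ratio = exp(binom_loglik(k_mask, n, p1) - binom_loglik(k_mask, n, p0))

prior = 0.5
posterior = lik_ratio * prior / (lik_ratio * prior + (1 - prior))
print(lik_ratio > 1, posterior > prior)  # True True
```

Even a modest, non-significant difference like this can move a 50/50 prior meaningfully toward "masks work".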