Predictions for the future of software engineering:
1/ Models will be extraordinarily good at coding, very soon. Research labs are investing more in coding + reasoning improvements than in any other domain for the next model generation. Their efforts will bear fruit.
2/ Why? Besides general AI progress, coding specifically has a unique advantage: potential for superhuman data scaling via “self play”. Models can write code, and then run it. Or write code, write a test, and check for self-consistency.
3/ This type of automatic supervision is not possible in most domains, which are facing data walls in post-training as we approach the limits of human expertise. Code is different—it can be tested empirically & automatically.
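To make that “self play” loop concrete, here is a minimal sketch (my illustration, not something from the thread) of checking model-written code against model-written tests. It assumes pytest is installed, and skips the sandboxing a real pipeline would need:

```python
import subprocess
import tempfile
from pathlib import Path

def self_consistent(solution_src: str, test_src: str) -> bool:
    """Run a model-written test suite against model-written code.

    The exit code is the supervision signal: no human labels required.
    """
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "solution.py").write_text(solution_src)
        Path(workdir, "test_solution.py").write_text(test_src)
        result = subprocess.run(
            ["python", "-m", "pytest", "test_solution.py", "-q"],
            cwd=workdir, capture_output=True, timeout=60,
        )
    return result.returncode == 0
```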
4/ As a result, software engineering will look radically different in a few years. True coding agents, which do tasks end to end, will complement today’s AI copilots. The experience will look something like giving every engineer an army of interns.
5/ In this new world, every engineer becomes an engineering manager. You will delegate basic tasks to coding agents, and spend more time on the higher level parts of coding: understanding the requirements, architecting systems, and deciding what to build.
6/ This will lead to an era of unprecedented Software Abundance. Software has historically been difficult and expensive to create. It will soon be 10x more accessible. We will see a proliferation of “single-use software”: one-off apps and websites that are only now viable.
7/ There will be way more software engineers in the future than the present. The job will just be very different: more English, less boilerplate coding. Engineers will adjust, like they did for the transition from assembly to Python.
8/ There will also be substantial second-order effects for startups, beyond the immediate productivity gains.
9/ For one, companies that market to developers will soon start “marketing” to coding agents as well. After all, your agent might decide what cloud you use and which database you choose. Agent-friendly UI/UX (often: a good CLI) will be prioritized.
10/ The bar for product quality will also rise. Half-baked or feature-incomplete MVPs are less acceptable in a world where developers can ship so much faster.
11/ Testing infrastructure will be much more important & prevalent with the rise of coding agents, both because the agents will write more tests and because they will depend on those tests to check their work.
12/ Switching costs will decline as a moat for tech companies, as agents make migrations easier. Companies will even start bundling migration-assistant coding agents when you buy their products, to streamline your adoption.
13/ Regardless of the specifics, the macro is clear: there’s never been a better or more productive time to be a builder.
14/ Coda: I’m excited to share that (in no small part due to these predictions), I’ve joined @cognition_labs to help build Devin. I’ve been here >3 months, and Devin, while still early, is the first true glimpse I’ve seen of what the Software Abundance era could look like.
🔥 Thread of cool things hackers are building at Scale’s generative AI hackathon today:
The @krea_ai team is building the Game of Life, where each live cell is a whimsical, happy Stable Diffusion image and each dead cell is an eerie, dark one, all evolving over time. It’s built on a generative-AI version of Canva that they made.
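The update rule underneath is plain Conway’s Game of Life. A minimal numpy sketch of one generation (my own, with the Stable Diffusion call reduced to prompt selection) might look like:

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life generation; `grid` is a 2D boolean array."""
    # Count the 8 neighbors of every cell, with toroidal wrap-around.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    # Live cells survive with 2-3 neighbors; dead cells with exactly 3 are born.
    return (neighbors == 3) | (grid & (neighbors == 2))

HAPPY = "a whimsical, happy illustration"  # prompt for live cells
EERIE = "an eerie, dark scene"             # prompt for dead cells

grid = np.random.default_rng(0).random((16, 16)) < 0.3
grid = life_step(grid)
# The actual image generation (one Stable Diffusion call per cell) is omitted.
prompts = [[HAPPY if cell else EERIE for cell in row] for row in grid]
```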
@sjwhitmore and team are building a key-value store to enable long-term memory in language model conversations
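The thread doesn’t show their design, but the basic shape of such a store is simple. A minimal sketch (my own), persisting memory to a JSON file so facts survive across sessions:

```python
import json
from pathlib import Path

class ConversationMemory:
    """A minimal key-value memory that persists across chat sessions."""

    def __init__(self, path: str = "memory.json"):
        self.path = Path(path)
        self.store = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.store[key] = value
        self.path.write_text(json.dumps(self.store, indent=2))

    def recall(self, key: str, default=None):
        return self.store.get(key, default)

# Facts saved in one conversation can be prepended to prompts in later ones.
memory = ConversationMemory()
memory.remember("user_name", "Sam")
preamble = "Known facts about the user: " + json.dumps(memory.store)
```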
We evaluate davinci-003 across a range of classification, summarization, and generation tasks using Scale Spellbook🪄, the platform for LLM apps. Some highlights: 🧵
Davinci-003 appears to be significantly better at zero-shot classification. On a sample Yelp review sentiment dataset, davinci-003 reached 92% classification accuracy zero-shot, versus less than 70% for davinci-002.
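As a rough illustration of that setup (not the actual Spellbook harness), a zero-shot sentiment call against the legacy OpenAI Completions API (openai-python < 1.0) looked roughly like this; the prompt wording is my assumption:

```python
import openai  # legacy openai-python (< 1.0) interface

openai.api_key = "YOUR_API_KEY"

def classify_sentiment(review: str, model: str = "text-davinci-003") -> str:
    """Zero-shot: the prompt gives instructions but no labeled examples."""
    prompt = (
        "Classify the sentiment of this Yelp review as Positive or Negative.\n\n"
        f"Review: {review}\n"
        "Sentiment:"
    )
    response = openai.Completion.create(
        model=model, prompt=prompt, max_tokens=3, temperature=0
    )
    return response["choices"][0]["text"].strip()

print(classify_sentiment("The pasta was cold and the waiter ignored us."))
```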
Davinci-003 is an incredible poet. It is significantly more adept at particular styles of writing, like this poem about horses in iambic pentameter.
Second-order effects of the rise of large language models:
1/ Soon, all products for creators will have embedded intelligence from massive language models (think Copilot in VSCode, DALL-E 2 in Photoshop, GPT-3 in GDocs). Companies making these products will need to roll their own massive language models or pay a tax to OpenAI/Google/etc.
2/ Over time, companies will become stratified into Compute Rich and Compute Poor. Many Compute Poor companies will become existentially dependent on the ML models of the Compute Rich.
What I’ve learned about making synthetic data work for training ML models:
1/ Context: synthetic data has matured dramatically in the past 1-2 years. It’s gone from a research niche to a production dependency of many large-scale ML pipelines, especially in computer vision.
2/ Historically, the #1 obstacle to adopting synthetic data has been the reality gap — the small differences between real and synthetic data that models may fixate on incorrectly, harming generalization.
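One standard mitigation is domain randomization (my example here, not something the thread prescribes): vary rendering parameters aggressively so no single synthetic artifact stays constant enough to fixate on. A sketch:

```python
import numpy as np

def randomize(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Jitter lighting, color, and noise in a synthetic (H, W, 3) uint8 image."""
    out = img.astype(np.float32)
    out *= rng.uniform(0.7, 1.3)                   # global brightness
    out *= rng.uniform(0.9, 1.1, size=(1, 1, 3))   # per-channel color cast
    out += rng.normal(0.0, 5.0, size=out.shape)    # sensor-style noise
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
synthetic = np.full((64, 64, 3), 128, dtype=np.uint8)  # placeholder render
augmented = randomize(synthetic, rng)
```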
Today I saw firsthand the impact AlphaFold is having on speeding up drug discovery:
1/ A friend runs a biotech startup designing drugs to fight cancer. In prior work, they found that tumor cells make a protein that binds to two receptors in the body. Binding to just one of them would inhibit the tumor’s growth, but binding to both makes the tumor grow faster.
2/ If they could design a new protein that binds to only one receptor and not the other, this mutant protein might be a potent cancer drug.
1/ It pays to be paranoid. Bugs can take so long to find that it’s best to be really careful as you go. Add breakpoints to sanity check numpy tensors while you're coding; add visualizations just before your forward pass (it must be right before! otherwise errors will slip in).
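A concrete version of that paranoia, as a sketch (shapes and names are placeholders):

```python
import numpy as np

def sanity_check(name: str, x, expect_shape=None) -> None:
    """Fail fast on NaNs, infs, or unexpected tensor shapes."""
    x = np.asarray(x)
    assert np.isfinite(x).all(), f"{name} contains NaN/inf"
    if expect_shape is not None:
        assert x.shape == expect_shape, f"{name}: {x.shape} != {expect_shape}"

# Immediately before the forward pass -- not earlier, or a later transform
# can re-introduce an error unchecked.
images = np.random.rand(32, 3, 224, 224).astype(np.float32)  # placeholder batch
sanity_check("images", images, expect_shape=(32, 3, 224, 224))
# A quick visualization at the same point catches what asserts can't,
# e.g. channel-order or normalization mistakes:
#   plt.imshow(images[0].transpose(1, 2, 0)); plt.show()
```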
2/ It's not enough to be paranoid about code. The majority of issues are actually with the dataset. If you're lucky, the issue is so flagrant that you know something must be wrong after model training or evaluation. But most of the time you won't even notice.
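A few cheap checks catch a surprising fraction of these silent dataset issues; a minimal sketch:

```python
import collections
import random

def audit(samples, labels, n_spot_checks=5):
    """Cheap checks that surface many silent dataset bugs before training."""
    assert len(samples) == len(labels), "sample/label length mismatch"
    # 1. Skewed or missing classes show up immediately in the label counts.
    print("label counts:", collections.Counter(labels).most_common())
    # 2. Exact duplicates often mean a join or shuffle step went wrong.
    print("duplicates:", len(samples) - len({repr(s) for s in samples}))
    # 3. Nothing replaces eyeballing random examples next to their labels.
    for i in random.sample(range(len(samples)), min(n_spot_checks, len(samples))):
        print(i, labels[i], repr(samples[i])[:80])

audit(["great food!", "terrible service", "great food!"], [1, 0, 1])
```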