AI Pub
Technical AI, explained. Get hired by the best AI companies:
Jan 25 · 22 tweets · 15 min read
Best of AI Twitter - January 11-25 (megathread):

Andrej Karpathy builds "GPT from Scratch",
A class-action lawsuit against Stability AI,
Track human pose using WiFi alone,
An LLM-native IDE,
InstructPix2Pix & AdA,
Training a world-class Rocket-league RL bot,

... and more:

1/22 Best of this edition: "GPT from Scratch" with @karpathy!

@karpathy continues his epic "Neural Networks: Zero to Hero" series, this time building a 10M-parameter GPT model from scratch and training it on Shakespeare.

Jan 11 · 17 tweets · 9 min read
Best of AI Twitter (Jan 2 - 11):

- Microsoft investing $10B to own 49% of OpenAI,
- MedPaLM matches human doctors in medical accuracy,
- The "LLM uncanny valley",
- SOTA Muse and VALL-E models,
- Stability AI's "DeepFloyd": a mysterious new research lab...

... and more:

1/17 Best of this week:

Rumors on Microsoft's new $10B investment in OpenAI at a $29B valuation.

If my math is right and $29B is post-money, this would increase Microsoft's stake in OpenAI from ~15% ownership to 49%!

(correct me if I'm wrong.)

Jan 7 · 17 tweets · 7 min read
// Algorithm Distillation, Explained //

Using Algorithm Distillation (AD), DeepMind scientists trained transformers to "learn how to learn":

- in arbitrary RL environments,
- without updating their weights!

How does AD work? The major ideas of AD, below:

1/17 (Figures: AD data generation and training process; how AD generalizes to unseen tasks.)

TL;DR: A team from DeepMind trained transformers to:

- imitate RL algorithms in a variety of environments,
- in-context and with fixed weights,
- by treating RL policy-learning as a sequence modeling problem.
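The sequence-modeling framing above can be sketched in a few lines. This is an illustrative data-layout sketch only (function and token names are mine, not from DeepMind's code): the key idea is that the training sequence spans an entire learning run, so predicting the next token amounts to imitating the learning algorithm itself.

```python
# Hypothetical sketch of Algorithm Distillation's data format
# (names are illustrative, not from the paper's code).

def build_ad_sequence(learning_history):
    """Flatten a logged RL learning history - a list of episodes,
    each a list of (state, action, reward) steps - into one long
    token sequence spanning many episodes.

    Because the sequence covers the *whole* learning run, it
    encodes policy improvement over time; a causal transformer
    trained to predict the next action token therefore imitates
    the learning algorithm, not a single fixed policy."""
    tokens = []
    for episode in learning_history:
        for state, action, reward in episode:
            tokens.extend([("s", state), ("a", action), ("r", reward)])
    return tokens

# Toy history: two episodes of a 1-step task, with the logged
# RL algorithm improving between them.
history = [
    [(0, 1, 0.0)],   # early episode: wrong action, no reward
    [(0, 2, 1.0)],   # later episode: better action, reward 1
]
seq = build_ad_sequence(history)
# At inference time, the transformer conditions on all earlier
# tokens in-context - with fixed weights - to pick its next action.
```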

Dec 17, 2022 · 4 tweets · 2 min read
What are the coolest computer vision startups you know of?

Bonus if they're recent/early-stage! Lots of CV startups launched 5-10 years ago when deep learning + CV was taking off.

Context: I've met several computer vision engineers & researchers looking for their next role via the AI Pub Talent Network.

Looking to find great vision companies for them!
Dec 11, 2022 · 8 tweets · 3 min read
// Founding Research Scientist Opening //

Harvey is hiring a head of research.

My jaw dropped during their entire product demo. By *far* the startup I'm most excited about.

Job posting:…

More info below. Share with friends looking for RS positions!

1/8 What is Harvey?

- Harvey is an AI lawyer.
- Harvey makes lawyers more efficient by providing them with a natural language interface for all their workflows: research, drafting, editing, and everything else.

Dec 6, 2022 · 15 tweets · 10 min read
Best of AI Twitter (Nov 28 - Dec 6):

- ChatGPT,
- Design proteins with a text prompt (!)
- @roon & @noahpinion on the future of work with AI,
- NeurIPS 2022,
- Stable Diffusion sped-up 20x,
- @characterAI is building AGI,
- The "Forward-Forward Algorithm",

... and more:

1/15 Best of this week: OpenAI releases ChatGPT!

So hot right now.

Try it out at

Nov 28, 2022 · 14 tweets · 8 min read
Last two weeks in AI (Nov 13-27):

- MinD-Vis: reading mental imagery from brain recordings,
- Meta AI's CICERO plays human-level Diplomacy,
- Stable Diffusion 2.0,
- Galactica: science-generation text model,
- Action-driven LLMs,
- The future of data moats,

... and more:

1/14 Best of this week:

Meta AI builds CICERO, an AI agent that plays Diplomacy better than 90% of human players.

A favorite game of JFK, Kissinger, and DeepMind founder Demis Hassabis. Extremely challenging, requires multiplayer collaboration, persuasion, and backstabbing!

Nov 23, 2022 · 10 tweets · 6 min read
// Diplomacy AI thread-of-threads //

Yesterday, Meta AI revealed CICERO, an AI system that beat 90% of human players at Diplomacy.

Unlike Chess, Go, etc., Diplomacy requires multiplayer collaboration - and backstabbing!

A CICERO thread-of-threads from Meta AI researchers:

1/9 Announcement and video from Meta AI:

Nov 17, 2022 · 8 tweets · 3 min read
// Next-gen LLMs //

Just co-wrote a piece on next-gen LLM capabilities, and implications for startups - with Swift VC, a B2B AI VC firm in the Bay Area.

Next-gen LLMs will be:
- Multi-prompt
- Agentic
- Vastly knowledgeable

More info:…

Next-gen LLMs will be *multi-prompt*.

Next-gen language models will revise their own work and solve complex problems iteratively, over several steps - using their prior "work" as a prompt for the next step.
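A minimal sketch of that multi-prompt loop, assuming a hypothetical `complete(prompt)` LLM call (stubbed out here so the loop runs without a model; the function name is mine, not a real API):

```python
# Sketch of a "multi-prompt" LLM loop: each step's output becomes
# the next step's prompt. `complete` is a hypothetical stand-in
# for an LLM completion call.

def complete(prompt):
    # Stub so the loop is runnable without a model; a real system
    # would call an LLM here to revise or extend the prior work.
    return prompt + " [revised]"

def iterative_solve(task, steps=3):
    """Work on a task over several steps, feeding the model's
    prior "work" back in as the prompt for the next step."""
    work = task
    for _ in range(steps):
        work = complete(work)  # prior output -> next prompt
    return work

print(iterative_solve("Write and fix a sorting function."))
```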

See, e.g., OpenAI's recent Codex demo:

Nov 14, 2022 · 18 tweets · 9 min read
Best of AI Twitter (Nov 7-14):

- AI-generated pull requests (!),
- Sam Altman drops ominous GPT-4 memes 😅,
- LLM-based search engines,
- Scale AI releases LLM + product ad art platforms,
- Molecules designed from text descriptions alone,

... and more:

1/18 A slower last week for AI Twitter. Perhaps we've been distracted.

I heard something happened in the Bahamas?

Nov 10, 2022 · 9 tweets · 5 min read
// "Emergence" in ESMFold //

Emergence is the phenomenon of large ML models "learning" to do much more than they were trained to do, after training on a simple task.

Just came across the coolest example of emergence in ESMFold, Meta AI's new SOTA protein-folding model:

1/9 What is emergence?

Emergence is when an ML model is trained on a simple task, and learns to do *much more* than that simple task.

The simplest example is GPT-3: it's trained merely to predict the next word given the preceding text, like your iPhone autocomplete.
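The objective really is that simple. A toy bigram "autocomplete" makes the point - this is a crude illustrative stand-in for next-word prediction, nothing like GPT-3's transformer:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: the same objective GPT-3 is trained on
# (predict the next word from preceding text), reduced to counting
# which word most often follows which.

def train_bigram(text):
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Most frequent continuation, like a crude autocomplete.
    return counts[word].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # "cat" follows "the" most often
```

Emergence is the observation that when the model is a huge transformer and the corpus is the internet, this one objective yields arithmetic, translation, Q&A, and more.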

Nov 8, 2022 · 4 tweets · 2 min read
Just released: AI Pub Job Board!

Software engineer, ML engineer, and ML researcher roles at some of the best AI companies.

Check it out here:

1/4 All job listings have to come from *excellent* AI companies.

Our criteria for companies in this thread:

Nov 6, 2022 · 22 tweets · 14 min read
Best of AI Twitter (Oct 30-Nov 6):

- Terrifying OpenAI Codex demo 😅,
- Meta AI teaches LLMs to fold proteins and solve math olympiad problems,
- George Hotz leaves Comma for "tinycorp",
- Broken Neural Scaling Laws,
- E-book search + fashion-gen code demos,

... and more:

1/22 Best of this week: terrifying OpenAI Codex demo. 😅

Give Codex an open-ended, challenging math problem to solve via simulation.

Codex writes several iterations of code, explains itself along the way, checks and corrects itself, gets the right answer - with one prompt.

Oct 31, 2022 · 24 tweets · 13 min read
Best of AI Twitter (Oct 16-30) - megathread!

- DeepMind trains RL transformers to "learn how to learn",
- Holodeck & text-to-music demos,
- Stability AI gets the $101M bag 💰,
- Neural audio compression + Hokkien translation,
- "Bleeding edge" research feed,

... and more:

1/24 Best of this week:

Algorithm Distillation (AD) RL breakthrough from DeepMind.

They train transformers to imitate the trajectory of RL algorithms.

Instead of training them to learn a policy that stays fixed after training, the transformers "learn how to learn" policies...

Oct 23, 2022 · 17 tweets · 8 min read
// Neural Radiance Fields (NeRFs), Explained //

You've seen NeRFs all over Twitter - they generate 3D views from a few 2D photos.

But what exactly are NeRFs? How do they work?

We'll explain:
- What NeRFs do
- How NeRFs make images
- Architecture + training

... and more:

1/17 First, what do NeRFs do?

- Generate a 3D representation of an object or scene
- from several 2D photos of that object or scene.

Example: take some photos of your backyard, then list the 3D walkthrough on Zillow:
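The rendering side can be sketched conceptually: a NeRF is a function from a 3D point (and viewing direction) to a color and a density, and a pixel is rendered by accumulating those values along a camera ray. In the sketch below the field is hand-coded (a fuzzy gray sphere) rather than a learned MLP, and all names are illustrative:

```python
import math

# Conceptual NeRF sketch: a radiance field maps (3D point, view
# direction) -> (color, density); a pixel is rendered by marching
# a camera ray through the field and accumulating color.

def radiance_field(point, direction):
    """Real NeRFs learn this function with an MLP; here density is
    high inside a unit sphere and zero outside, with one gray
    color channel for simplicity."""
    dist = math.sqrt(sum(c * c for c in point))
    density = 5.0 if dist < 1.0 else 0.0
    color = 0.8
    return color, density

def render_ray(origin, direction, n_samples=64, t_far=4.0):
    """Classic volume-rendering quadrature: sample points along the
    ray, accumulating color weighted by opacity and transmittance."""
    dt = t_far / n_samples
    transmittance, pixel = 1.0, 0.0
    for i in range(n_samples):
        t = (i + 0.5) * dt
        point = [o + t * d for o, d in zip(origin, direction)]
        color, density = radiance_field(point, direction)
        alpha = 1.0 - math.exp(-density * dt)   # opacity of this slab
        pixel += transmittance * alpha * color  # light reaching the camera
        transmittance *= 1.0 - alpha            # light blocked so far
    return pixel

# A ray through the sphere is bright; one that misses it is dark.
hit = render_ray([0, 0, -2], [0, 0, 1])
miss = render_ray([0, 3, -2], [0, 0, 1])
```

Training then consists of rendering rays from the known camera poses of the 2D photos and adjusting the MLP so the rendered pixels match the photographed ones.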

Oct 21, 2022 · 7 tweets · 4 min read
Spent a few hours learning about Neural Radiance Fields (NeRFs) yesterday!

Publishing a NeRF explainer thread tomorrow.

In the meantime, here are the best resources I found to learn about NeRFs:

1/6 .@ykilcher's paper walkthrough of "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis".

Excellent explainer + walkthrough, really the only thing on this list you need to understand NeRFs at a high level!


2/6
Oct 9, 2022 · 18 tweets · 8 min read
Best of AI Twitter (Oct. 2 - Oct. 9):

- Whisper-powered Twitter video translation bot,
- AlphaTensor discovers SOTA matmul algorithms,
- Imagen + Phenaki text-to-video generation,
- Scaling laws for RL agents,
- Zero-shot encoder-decoder stitching,

... and more:

1/17 Coolest thing from AI Twitter this week:


@altryne set up a Twitter bot with OpenAI's Whisper to translate any Twitter video from any language.

Respond to any video on Twitter with "@vidtranslator translate" and get a translation in minutes.

Oct 2, 2022 · 18 tweets · 9 min read
Best of AI Twitter (Sept. 25 - Oct. 2):

- Tesla AI Day + Optimus reveal,
- Noninvasive fMRI brain-reading,
- State of the art text-to-video AND text-to-3D,
- iPhone stable diffusion + GPT-3 CLI demos,

... and more:

1/17 Tesla AI Day.

- Prototype of Optimus, a general-purpose humanoid robot
- Walkthrough of Dojo supercomputer
- Self-driving architecture, incl. using modified language models to make lane predictions

Elon has said that "[Optimus] will be worth more than the car business"

Oct 2, 2022 · 5 tweets · 2 min read
Tesla AI day was incredible!

- Optimus bot revealed
- Dojo supercomputer
- Self-driving architecture upgrades

There have been some great threads on AI Twitter + Robotics Twitter recapping the event and walking through major breakthroughs.

Here are a few I enjoyed:

1/5 Whole-event walkthrough:

Sep 28, 2022 · 19 tweets · 7 min read
// Git Re-Basin, Explained (Part I) //

Two weeks ago, researchers discovered a way to "merge" ML models, trained on different datasets, at *no* cost to loss!

They also found that NN loss landscapes effectively contain a single basin.

Why, and how?

Read below:

1/19 The Git Re-Basin paper has two parts:

Part I is about symmetries of neural networks, and how to "align" the weights of two NNs with these symmetries.

Part II shows how to "merge" two models once the weights are aligned, and the limits and implications of merging.
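Part I's key symmetry is easy to verify directly: permuting a hidden layer's units (rows of W1, entries of b1, columns of W2) leaves the network's function unchanged, so two trained networks can differ only by such a permutation. A minimal sketch with an illustrative 2-layer MLP (not the paper's code):

```python
import random

# Weight symmetry behind Git Re-Basin: permuting a hidden layer's
# units leaves a 2-layer MLP's function unchanged. "Aligning" two
# networks means finding the permutation that best matches their
# hidden units before merging.

def relu(v):
    return [max(0.0, x) for x in v]

def mlp(x, W1, b1, W2):
    h = relu([sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W1, b1)])
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: permute W1's rows and b1's entries,
    and apply the same permutation to W2's columns."""
    W1p = [W1[p] for p in perm]
    b1p = [b1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, b1p, W2p

random.seed(0)
W1 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
b1 = [random.gauss(0, 1) for _ in range(4)]
W2 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]
x = [0.5, -1.0, 2.0]

out_a = mlp(x, W1, b1, W2)
out_b = mlp(x, *permute_hidden(W1, b1, W2, [2, 0, 3, 1]))
# out_a == out_b up to float rounding: different weights, same function.
```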

Sep 28, 2022 · 10 tweets · 3 min read
// LeCun's 3-Layer 🍰, AI Outmoding Humans 🤔 //

A useful metaphor here is Yann LeCun's "3-layer cake". (…)

The former, e.g. self-driving, is a product that replaces all layers of the cake, including the "cherry" RL agent at the top: the human driver.

1/10 The latter, e.g. Copilot, is a product that only replaces the first two layers of the cake (foundation model + fine-tuning), while leaving the RL agent on the top - the human programmer - intact.

Replacing the whole cake is much harder than just replacing a lower layer.