AI Pub
AI papers and AI research explained, for technical people. Get hired by the best AI companies: https://t.co/MySVjUGOQ3
Apr 11, 2023 4 tweets 1 min read
// AI Recruiting: Survey //

I help ~25 AI startups recruit top-notch engineers, via the AI Pub Talent Network:

Now helping some with their hiring processes.

ML and software engineers: when you're invited to interview, why do you *not* start the hiring process with a company?

1/2
Some reasons that come to mind:

- Not ready / not the right time to leave current role
- Hiring process is long / a PITA
- Cash or equity comp not transparent
- Comp not high enough
- Product, company, or team isn't compelling

Any others?

2/2
Apr 8, 2023 10 tweets 8 min read
// Harvey: Legal AGI //

Harvey is an OpenAI-backed GPT-4 startup building AI knowledge workers.

They've signed deals with the largest law firms on earth, and are the fastest-growing LLM startup by revenue I know of.

Everything you need to know about Harvey:

1/10

Harvey's first product is a GPT-4 powered AI knowledge worker.

Harvey can:
- Generate long-form legal documents, with niche knowledge of the law
- Answer complex legal questions, leveraging millions of documents
- Create firm-specific models

2/10
Mar 21, 2023 8 tweets 5 min read
// Deep Papers #3: Toolformer //

LLMs like Bing and ChatGPT use external tools like calculators and web search to answer questions.

How do you teach LLMs to *use* these external tools?

Toolformer shows how!

We interviewed the authors :)

Spotify: open.spotify.com/episode/6uXohG…

LLMs can only spit out the next token, given the context.

How then does an LLM even *use* external tools?

In Toolformer, the authors teach LLMs to output:
- an <API> token,
- followed by a request body,
- followed by a <Call API> token.
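A toy sketch of how such decoding could be wired up, assuming the <API>/<Call API> token format described above and an illustrative `Calculator` tool (Toolformer's actual call format and toolset differ in detail):

```python
import re

# Toy tools the "model" is allowed to call (illustrative, not Toolformer's actual API)
TOOLS = {
    "Calculator": lambda expr: str(eval(expr)),  # demo only; eval is unsafe in production
}

def execute_tool_calls(text: str) -> str:
    """Find <API>Tool(args)<Call API> spans in generated text and
    replace each with the tool's output, as a decoding loop might."""
    pattern = re.compile(r"<API>(\w+)\((.*?)\)<Call API>")

    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        return TOOLS[tool](args)

    return pattern.sub(run, text)

print(execute_tool_calls("The answer is <API>Calculator(400/1400)<Call API>."))
```

In the paper itself the model learns *when* to emit these tokens via self-supervised fine-tuning; this sketch only shows the execution side.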
Mar 10, 2023 5 tweets 3 min read
// Toolformer Podcast: Preview //

Today I'm interviewing the Toolformer authors!

LLMs like Bing (and soon, ChatGPT) can use external tools like calculators or internet search to answer questions.

But how do language models *learn to use* these tools?

1/5

I'll publish a thread this weekend explaining how, but for now:

The most interesting question (& hardest part of the problem) is creating the dataset.

2/5
Feb 16, 2023 7 tweets 5 min read
Today: the 7th largest law firm on Earth announced a 3,500-lawyer deal with Harvey, an OpenAI-backed AI Lawyer startup:

See below for:
- Deal details
- Harvey's capabilities (❗)
- Harvey's open roles (I refer talent to them!)

1/6

Allen & Overy, the 2nd-largest law firm in the UK and 7th-largest on Earth, is partnering with Harvey after a 3-month trial of its AI lawyer product.

It is now rolling out Harvey to 3,500+ lawyers in its offices.

Announcement link: allenovery.com/en-gb/global/n…

2/6
Feb 14, 2023 6 tweets 5 min read
// Podcast #2: Hungry Hungry Hippos (H3) //

Stanford researchers just released a new architecture that:

- Beats Transformers at ~1B param scale
- Admits *much* longer context than Transformers

Is H3 the Transformer-killer? More below!

Spotify: open.spotify.com/episode/45eXtV…

1/5

Hungry Hungry Hippos, aka "H3", functions like a linear RNN, or a long convolution.

The key idea: thanks to the fast Fourier transform, an H3 layer:

- can be computed in O(n log n) time, where n is the context length
- unlike Transformers, which require O(n^2)!

2/5
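The O(n log n) claim comes from computing the layer's long convolution with the FFT. A minimal NumPy sketch of that trick (not the actual H3 implementation):

```python
import numpy as np

def fft_conv(u: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Causal long convolution via FFT in O(n log n) --
    the trick that lets an H3 layer avoid attention's O(n^2) cost."""
    n = len(u)
    # Zero-pad to length 2n so circular convolution acts like linear convolution
    U = np.fft.rfft(u, n=2 * n)
    K = np.fft.rfft(k, n=2 * n)
    return np.fft.irfft(U * K, n=2 * n)[:n]

rng = np.random.default_rng(0)
u, k = rng.standard_normal(1024), rng.standard_normal(1024)
direct = np.convolve(u, k)[:1024]           # O(n^2) reference
assert np.allclose(fft_conv(u, k), direct)  # same result, n log n cost
```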
Jan 25, 2023 22 tweets 15 min read
Best of AI Twitter - January 11-25 (megathread):

Andrej Karpathy builds "GPT from Scratch",
A class-action lawsuit against Stability AI,
Track human pose using WiFi alone,
An LLM-native IDE,
InstructPix2Pix & AdA,
Training a world-class Rocket-league RL bot,

... and more:

1/22

Best of this edition: "GPT from Scratch" with @karpathy!

@karpathy continues his epic "Neural Networks: Zero to Hero" series, this time building a 10M-parameter GPT model from scratch and training it on Shakespeare.

2/22
Jan 11, 2023 17 tweets 9 min read
Best of AI Twitter (Jan 2 - 11):

- Microsoft investing $10B to own 49% of OpenAI,
- MedPaLM matches human doctors in medical accuracy,
- The "LLM uncanny valley",
- SOTA Muse and VALL-E models,
- Stability AI's "DeepFloyd": a mysterious new research lab...

... and more:

1/17

Best of this week:

Rumors on Microsoft's new $10B investment in OpenAI at a $29B valuation.

If my math is right and $29B is post-money, this would increase Microsoft's stake in OpenAI from ~15% ownership to 49%!

(correct me if I'm wrong.)

2/17
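A quick sanity check of that arithmetic, under the simplest reading (the prior ~15% stake is measured against the post-money value, so dilution is ignored; all figures here are rumors):

```python
# Back-of-envelope check of the rumored ownership math.
investment = 10e9          # Microsoft's reported new investment
post_money = 29e9          # rumored post-money valuation
prior_stake = 0.15         # Microsoft's rough pre-existing stake

new_fraction = investment / post_money   # ~34.5% bought in this round
total = prior_stake + new_fraction
print(f"{total:.1%}")                    # ~49.5%, matching the tweet's 49% figure
```

If the prior 15% were instead diluted by the new shares, the total would come out a few points lower, so the 49% figure depends on how the round is structured.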
Jan 7, 2023 17 tweets 7 min read
// Algorithm Distillation, Explained //

Using Algorithm Distillation (AD), DeepMind scientists trained transformers to "learn how to learn":

- in arbitrary RL environments,
- without updating their weights!

How does AD work? The major ideas of AD, below:

1/17

[Figures: AD data generation and training process; how AD generalizes to unseen tasks]

TL;DR: A team from DeepMind trained transformers to:

- imitate RL algorithms in a variety of environments,
- in-context and with fixed weights,
- by treating RL policy-learning as a sequence modeling problem.

2/17
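A toy illustration of the data-generation side, using an epsilon-greedy bandit learner as a stand-in for the RL algorithm (the paper uses full RL environments, then trains a transformer to predict the next action in these histories):

```python
import numpy as np

def generate_learning_history(p_arms, steps=50, eps=0.2, rng=None):
    """Roll out a simple epsilon-greedy bandit learner and record its whole
    improvement trajectory as (action, reward) pairs. In AD, sequences like
    this -- across many tasks -- become the transformer's training data, so
    'how the policy improves' is itself a sequence to model."""
    rng = rng if rng is not None else np.random.default_rng()
    counts = np.zeros(len(p_arms))
    values = np.zeros(len(p_arms))
    history = []
    for _ in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(len(p_arms)))   # explore
        else:
            a = int(np.argmax(values))           # exploit current estimates
        r = float(rng.random() < p_arms[a])      # Bernoulli reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # running-mean value update
        history.append((a, r))
    return history

# Many tasks -> many learning histories; a transformer trained on these
# sequences imitates the learning algorithm itself, with fixed weights.
histories = []
for task_id in range(10):
    rng = np.random.default_rng(task_id)
    p_arms = rng.random(5)   # a fresh bandit task
    histories.append(generate_learning_history(p_arms, rng=rng))
```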
Dec 17, 2022 4 tweets 2 min read
What are the coolest computer vision startups you know of?

Bonus if they're recent/early stage! Lots of CV startups launched 5-10 years ago, when deep learning + CV was taking off.

Context: I've met several computer vision engineers & researchers looking for their next role, via the AI Pub Talent Network.

Looking to find great vision companies for them!
Dec 11, 2022 8 tweets 3 min read
// Founding Research Scientist Opening //

Harvey is hiring a head of research.

My jaw dropped during their entire product demo. By *far* the startup I'm most excited about.

Job posting: jobs.ashbyhq.com/harvey/6dd0998…

More info below. Share with friends looking for RS positions!

1/8

What is Harvey?

- Harvey is an AI lawyer.
- Harvey makes lawyers more efficient by providing them with a natural language interface for all their workflows: research, drafting, editing, and everything else
...

2/8
Dec 6, 2022 15 tweets 10 min read
Best of AI Twitter (Nov 28 - Dec 6):

- ChatGPT,
- Design proteins with a text prompt (!)
- @roon & @noahpinion on the future of work with AI,
- NeurIPS 2022,
- Stable Diffusion sped-up 20x,
- @characterAI is building AGI,
- The "Forward-Forward Algorithm",

... and more:

1/15

Best of this week: OpenAI releases ChatGPT!

So hot right now.

Try it out at chat.openai.com.

2/15
Nov 28, 2022 14 tweets 8 min read
Last two weeks in AI (Nov 13-27):

- MinD-Vis: reading mental imagery from brain recordings,
- Meta AI's CICERO plays human-level Diplomacy,
- Stable Diffusion 2.0,
- Galactica: science-generation text model,
- Action-driven LLMs,
- The future of data moats,

... and more:

1/14

Best of this week:

Meta AI builds CICERO, an AI agent that plays Diplomacy better than 90% of human players.

A favorite game of JFK, Kissinger, and DeepMind founder Demis Hassabis. Extremely challenging: it requires multiplayer collaboration, persuasion, and backstabbing!

2/14
Nov 23, 2022 10 tweets 6 min read
// Diplomacy AI thread-of-threads //

Yesterday, Meta AI revealed CICERO, an AI system that beat 90% of human players at Diplomacy.

Unlike Chess, Go, etc., Diplomacy requires multiplayer collaboration - and backstabbing!

A CICERO thread-of-threads from Meta AI researchers:

1/9

Announcement and video from Meta AI:

2/9
Nov 17, 2022 8 tweets 3 min read
// Next-gen LLMs //

Just co-wrote a piece on next-gen LLM capabilities, and implications for startups - with Swift VC, a B2B AI VC firm in the Bay Area.

Next-gen LLMs will be:
- Multi-prompt
- Agentic
- Vastly knowledgeable

More info:

blog.swift.vc/the-next-gener…

1/7
Next-gen LLMs will be *multi-prompt*.

Next-gen language models will revise their own work and solve complex problems iteratively, over several steps - using their prior "work" as a prompt for the next step.

See, e.g., OpenAI's recent Codex demo:

2/7
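A minimal sketch of that revise-and-resubmit loop, with `call_llm` as a hypothetical stand-in for any LLM completion API:

```python
# Minimal sketch of the "multi-prompt" pattern: the model's prior output
# becomes part of the next prompt. `call_llm` is a hypothetical placeholder,
# not a real API; a real implementation would call a model here.
def call_llm(prompt: str) -> str:
    return prompt + " [revised]"

def iterative_solve(task: str, steps: int = 3) -> str:
    draft = call_llm(f"Solve: {task}")
    for _ in range(steps):
        # Feed the previous draft back in and ask for an improvement
        draft = call_llm(f"Task: {task}\nPrevious attempt: {draft}\nRevise and improve:")
    return draft
```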
Nov 14, 2022 18 tweets 9 min read
Best of AI Twitter (Nov 7-14):

- AI-generated pull requests (!),
- Sam Altman drops ominous GPT-4 memes 😅,
- LLM-based search engines,
- Scale AI releases LLM + product ad art platforms,
- Molecules designed from text descriptions alone,

... and more:

1/18

A slower last week for AI Twitter. Perhaps we've been distracted.

I heard something happened in the Bahamas?

2/18
Nov 10, 2022 9 tweets 5 min read
// "Emergence" in ESMFold //

Emergence is the phenomenon of large ML models "learning" to do much more than they were trained to do, after training on a simple task.

Just came across the coolest example of emergence in ESMFold, Meta AI's new SOTA protein-folding model:

1/9

What is emergence?

Emergence is when an ML model is trained on a simple task, and learns to do *much more* than that simple task.

The simplest example is GPT-3: it's trained merely to predict the next word given the preceding text, like your iPhone autocomplete.

2/9
Nov 8, 2022 4 tweets 2 min read
Just released: AI Pub Job Board!

Software engineer, ML engineer, and ML researcher roles at some of the best AI companies.

Check it out here: aipub.pallet.com/jobs

1/4

All job listings have to come from *excellent* AI companies.

Our criteria for companies in this thread:

2/4
Nov 6, 2022 22 tweets 14 min read
Best of AI Twitter (Oct 30-Nov 6):

- Terrifying OpenAI Codex demo 😅,
- Meta AI teaches LLMs to fold proteins and solve math olympiad problems,
- George Hotz leaves Comma for "tinycorp",
- Broken Neural Scaling Laws,
- E-book search + fashion-gen code demos,

... and more:

1/22

Best of this week: terrifying OpenAI Codex demo. 😅

Give Codex an open-ended, challenging math problem to solve via simulation.

Codex writes several iterations of code, explains itself along the way, checks and corrects itself, gets the right answer - with one prompt.

2/22
Oct 31, 2022 24 tweets 13 min read
Best of AI Twitter (Oct 16-30) - megathread!

- DeepMind trains RL transformers to "learn how to learn",
- Holodeck & text-to-music demos,
- Stability AI gets the $101M bag 💰,
- Neural audio compression + Hokkien translation,
- "Bleeding edge" research feed,

... and more:

1/24

Best of this week:

Algorithm Distillation (AD) RL breakthrough from DeepMind.

They train transformers to imitate the trajectory of RL algorithms.

Instead of training them to learn a policy that stays fixed after training, the transformers "learn how to learn" policies...

2/24
Oct 23, 2022 17 tweets 8 min read
// Neural Radiance Fields (NeRFs), Explained //

You've seen NeRFs all over Twitter - they generate 3D views from a few 2D photos.

But what exactly are NeRFs? How do they work?

We'll explain:
- What NeRFs do
- How NeRFs make images
- Architecture + training

... and more:

1/17

First, what do NeRFs do?

NeRFs:
- Generate a 3D representation of an object or scene
- from several 2D photos of that object or scene.

Example: take some photos of your backyard, then list the 3D walkthrough on Zillow:

2/17
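The "how NeRFs make images" step can be made concrete with the standard discrete volume-rendering equation, which blends per-sample densities and colors along a camera ray into one pixel. The MLP that produces `sigmas` and `colors` from 3D position and view direction is omitted here; this sketch only shows the rendering step:

```python
import numpy as np

def render_ray(sigmas: np.ndarray, colors: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """Discrete NeRF volume rendering along one ray.
    sigmas: densities at the sampled points, colors: RGB at those points,
    deltas: distances between consecutive samples. Returns the pixel color."""
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity contributed by each segment
    # Transmittance: fraction of light surviving up to each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                 # how much each sample contributes
    return (weights[:, None] * colors).sum(axis=0)  # weighted blend -> RGB
```

Intuition: a very dense sample early on the ray "occludes" everything behind it, so its color dominates the pixel, which is exactly how opaque surfaces emerge from the learned density field.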