Latest Twitter Threads by @charles_irl on Thread Reader App

Jul 2 • 13 tweets • 4 min read

Two years ago, I built my first Modal app -- a diffusion-based QR code generator.

The results were sometimes good, sometimes terrible.

It's a common story: a cool AI demo that's not robust enough to be useful.

Here's how we engineered our way from the left image to the right.

FYI this thread is a summary of a blog post -- head there for a lot more detail!

The title gives away the game. We built solid evals and then we used those evals to unlock inference-time compute scaling.

modal.com/blog/qart-code…
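A minimal sketch of the pattern described above, where an eval is used to spend more compute at inference time: sample several candidates, score each one, keep the best, and stop early once a score clears a bar. The `best_of_n`, `toy_generate`, and `toy_score` names, the threshold, and the toy stand-ins are all hypothetical; the actual pipeline in the blog post may differ.

```python
import random
from typing import Callable, Optional, Tuple


def best_of_n(
    generate: Callable[[random.Random], str],
    score: Callable[[str], float],
    n: int,
    threshold: float,
    seed: int = 0,
) -> Tuple[Optional[str], float]:
    """Sample up to n candidates and return the best one with its score.

    Stops early as soon as the best score reaches `threshold`.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n):
        candidate = generate(rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        if best_score >= threshold:
            break
    return best, best_score


# Hypothetical stand-ins: a "generator" that emits a random number as text
# and an "eval" that reads it back as a quality score in [0, 1].
def toy_generate(rng: random.Random) -> str:
    return f"{rng.random():.3f}"


def toy_score(candidate: str) -> float:
    return float(candidate)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        # threshold=2.0 is unreachable here, so all n samples are drawn
        print(n, best_of_n(toy_generate, toy_score, n=n, threshold=2.0))
```

With a fixed seed, a larger `n` can only match or improve the best score, which is the whole appeal: a trustworthy eval turns extra samples into extra quality.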

Dec 12, 2024 • 13 tweets • 5 min read

I think programming GPUs is too hard. Part of the problem is sprawling, scattered documentation & best practices.

Over the past few months, we’ve been working to solve that problem, putting together a “Rosetta Stone” GPU Glossary.

And now it’s live!

My take-aways in thread.

The heart of the CUDA stack, IMO, is not anything named CUDA: it’s the humble Parallel Thread eXecution instruction set architecture, the compilation target of the CUDA compiler and the only stable interface to GPU hardware.

modal.com/gpu-glossary/d…

Aug 5, 2024 • 19 tweets • 4 min read

Last week @brad19brown, @jordanjuravsky, & co-authors released a paper on inference-time scaling laws that enable small LMs to beat the big boys.

So this weekend, @HowardHalim & I dropped everything to run their analysis on a new model + new data.

Success 😎

Why this matters:

Details of our work and repro code on the Modal blog.

All you need are @modal_labs and @huggingface credentials! And it's free: it fits in the $30/month in Modal's free tier.

modal.com/blog/llama-hum…
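One common way to quantify gains from repeated sampling is coverage, often reported as pass@k: the chance that at least one of k samples solves a problem. The sketch below uses the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n samples of which c are correct. Whether the repro computes it exactly this way is an assumption, and the numbers in the usage lines are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given n total samples of which c were correct."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # comb(n - c, k) is 0 when k > n - c, so the estimate becomes 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative only: 3 correct answers out of 100 samples
    for k in (1, 10, 50, 100):
        print(k, round(pass_at_k(n=100, c=3, k=k), 4))
```

Even a low per-sample success rate can translate into high coverage once k is large, which is how a small model with many samples can compete with a larger model sampled once.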

Nov 30, 2022 • 4 tweets • 2 min read

a lot more fun to use than the classic playground interface, which makes interactions like this one more delightful 😎

https://twitter.com/npew/status/1598016510588354560

(please do not park your car on a volcano, even if you have an e-brake)

Nov 22, 2022 • 6 tweets • 3 min read

I had a delightful session talking through the paper "In-Context Learning and Induction Heads" with author @NeelNanda5.

It's part of a long research thread, one of my favorites over the last five years, on "reverse engineering" DNNs.

The core claim of the paper is that a large fraction of the in-context learning behavior that makes contemporary transformer LLMs so effective comes from a surprisingly simple type of circuit they call an _induction head_.
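For intuition only: the behavior usually attributed to an induction head is pattern completion of the form [A][B] ... [A] → [B]. The function below is a plain-Python caricature of that behavior, not the attention circuit analyzed in the paper; `induction_guess` is a hypothetical name.

```python
from typing import Optional, Sequence


def induction_guess(tokens: Sequence[str]) -> Optional[str]:
    """Guess the next token by copying what followed the most recent
    earlier occurrence of the current (last) token."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the last token itself
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # the current token has not appeared before


if __name__ == "__main__":
    print(induction_guess(["Mr", "Dursley", "said", "Mr"]))  # Dursley
```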

Nov 21, 2022 • 10 tweets • 5 min read

last week @modal_labs made A100 GPUs available

so on Friday i dropped everything to play with them

in hours i had a CLI tool that could make @StabilityAI art of the new puppy in my life, Qwerty

by Sunday i had multiple autoscaling pet-art-generating web apps -- and so can you!

context: A100s are beefy GPUs, and they have enough VRAM to comfortably train models, like Stable Diffusion, that generate images from text

if you can train the models, you can "teach" them proper nouns -- here "Qwerty", the name of my roommate @gottapatchemall's puppy (below)

Mar 21, 2022 • 25 tweets • 13 min read

I recently presented a series of four reports on system failure, spanning nearly 40 years, ranging from a 1985 typewritten white paper on mainframe database crashes to a 2021 Zoom talk on outages in one of Google's ML-based ranking systems.

Here's a summary, with connections to reliable ML.

Each report was a post-hoc meta-analysis of post-mortem analyses: which "root causes" come up most often? Which take the most time to resolve?

Each captures 100 or more outages from a system using best practices of its era & modality at the largest scale.

Mar 3, 2022 • 5 tweets • 5 min read

last week i attended MLcon2.0 by @cnvrg_io and saw some great talks from all across the ML development stack

all of them are now available on-demand!

i'll call out some of my favorites here

cnvrg.io/mlcon-2

from @DivitaVohra, an overview of @Spotify's ML platform. super cool to hear how a product manager thinks about the problem of supporting ML systems

Mar 3, 2022 • 4 tweets • 3 min read

im looking to start an interest group crossing over @full_stack_dl + @ml_collective!

we'll work through long-form content (h/t @chipro + @sh_reya) first, w sync discussions weekly to keep us on track

async folks can chat on discord, contribute to a wiki, + catch the recordings

this follows the format of really successful MLC interest groups in e.g. NLP (notion.so/MLC-NLP-Paper-…) and Computer Vision (notion.so/MLC-Computer-V…)

this group will focus on problems in production ML, like building datasets, monitoring models, and designing robust systems

Mar 1, 2022 • 33 tweets • 10 min read

really cool new #AISTATS2022 paper presenting 1) a particular setting for model monitoring and 2) a provably optimal strategy for requesting ground truth labels in that setting.

plus a bonus example, and theorem, on why you shouldn't just do anomaly detection on logits!

https://twitter.com/james_y_zou/status/1498677901897654280

scene: data in real life is non-stationary, meaning P(X,Y) changes over time.

our model performance is based on that joint distribution, so model performance changes over time, mostly downwards.

this is bad.

it's the ML equivalent of dependency changes breaking downstream code

Feb 26, 2022 • 24 tweets • 7 min read

Read through these awesome notes by @chipro and noticed something interesting about distribution shifts: they form a lattice, so you can represent them like you do sets, ie using a Venn diagram!

I find this view super helpful for understanding shifts, so let's walk through it.

https://twitter.com/chipro/status/1490924046350909442

(inb4 pedantry: the above diagram is an Euler diagram, not a Venn diagram, meaning not all possible joins are represented. that is good, actually, for reasons to be revealed!)

Feb 25, 2022 • 10 tweets • 4 min read

There's been some back-and-forth about this paper on getting gradients without doing backpropagation, so I took a minute to write up an analysis on what breaks and how it might be fixed.

tl;dr: the estimated gradients are _really_ noisy! like wow

charlesfrye.github.io/pdfs/SNR-Forwa…

https://twitter.com/arankomatsuzaki/status/1494488254304989228

The main result I claim is an extension of Thm 1 in the paper. They prove that the _expected value_ of the gradient estimate is the true gradient, and I worked out the _variance_ of the estimate.

It's big! Each entry has variance at least as large as the entire true gradient's squared norm😬
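A quick numerical check of both claims, as a sketch under one assumption: the estimate is (∇f · v) v with perturbations v drawn from a standard normal. In that case a direct calculation gives a per-entry variance of ‖∇f‖² + (∂f/∂θ_i)², so every entry's variance is at least the squared norm of the whole gradient. The gradient vector, the sample count, and the `forward_gradient_stats` name are made up for illustration.

```python
import random
from typing import List, Tuple


def forward_gradient_stats(
    grad: List[float], n_samples: int, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Monte Carlo mean and variance of the estimate (grad . v) * v, v ~ N(0, I)."""
    rng = random.Random(seed)
    d = len(grad)
    sums = [0.0] * d
    sq_sums = [0.0] * d
    for _ in range(n_samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # The directional derivative is what one forward-mode pass provides
        directional = sum(g * vi for g, vi in zip(grad, v))
        for i in range(d):
            estimate_i = directional * v[i]
            sums[i] += estimate_i
            sq_sums[i] += estimate_i * estimate_i
    means = [s / n_samples for s in sums]
    variances = [sq / n_samples - m * m for sq, m in zip(sq_sums, means)]
    return means, variances


grad = [1.0, -2.0, 0.5, 3.0]  # hypothetical "true" gradient
means, variances = forward_gradient_stats(grad, n_samples=200_000)
sq_norm = sum(g * g for g in grad)
predicted = [sq_norm + g * g for g in grad]

if __name__ == "__main__":
    print("true gradient :", grad)
    print("mean estimate :", [round(m, 2) for m in means])
    print("variance      :", [round(v, 1) for v in variances])
    print("predicted var :", predicted)
```

The mean lands on the true gradient, but even the entry whose true value is 0.5 carries a variance above 14, so a single sample says very little about any one coordinate.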

Nov 18, 2021 • 7 tweets • 3 min read

the final video for the @weights_biases Math4ML series, on probability, is now up on YouTube!

@_ScottCondron and I talk entropies, divergence, and loss functions

🔗:

this is the final video in a four-part series of "exercise" videos, where Scott and I work through a collection of Jupyter notebooks with automatically-graded Python coding exercises on math concepts

read more in this 🧵

https://twitter.com/charles_irl/status/1457840021772259332?s=20

Nov 8, 2021 • 8 tweets • 5 min read

New video series out this week (and into next!) on the @weights_biases YouTube channel.

They're Socratic livecoding sessions where @_ScottCondron and I work through the exercise notebooks for the Math4ML class.

Details in 🧵⤵️

Socratic: following an ancient academic tradition, I try to trick @_ScottCondron into being wrong, so that students can learn from mistakes and see their learning process reflected in the content.

Aug 24, 2021 • 9 tweets • 4 min read

If you're like me, you've written a lot of PyTorch code without ever being entirely sure what's _really_ happening under the hood.

Over the last few weeks, I've been dissecting some training runs using @PyTorch's trace viewer in @weights_biases.

Read on to learn what I learned!

I really like the "dissection" metaphor

a trace viewer is like a microscope, but for looking at executed code instead of living cells

its powerful lens allows you to see the intricate details of what elsewise appears a formless unity

kinda like this, but with GPU kernels:

Jul 31, 2020 • 8 tweets • 4 min read

another great regular online talk series! they're talking about GPT-3 now

https://twitter.com/DrLukeOR/status/1289305330027921408

@realSharonZhou: sees opportunities in medicine with the "democratization" of design of, e.g., web interfaces.

this could be key for healthcare providers who have clinical expertise and know what patients need but don't have web design skills.

Jul 24, 2020 • 19 tweets • 8 min read

1/hella

this 🧵 by @daniela_witten is a masterclass in both the #SVD and in technical communication on Twitter.

i want to hop on this to expand on the "magic" of this decomposition and show folks where the rabbit goes, because i just gave a talk on it this week!

🧙‍♂️🐇💨😱

https://twitter.com/WomenInStat/status/1285612667839885312

tl;dr: the basic idea of the SVD works for _any_ function.

it's a three step decomposition:

- throw away the useless bits ⤵
- rename what remains 🔀
- insert yourself into the right context ⤴
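One way to read the three steps above, sketched for a function on a finite set: group inputs the function cannot tell apart (throw away the useless bits), map each group to its output one-to-one (rename what remains), and then view that output as an element of the full codomain (insert into context). The `decompose` helper is hypothetical and this is an interpretation of the thread, not code from it. For a matrix, the loosely analogous pieces of a compact SVD are Vᵀ (drop the null space), Σ (an invertible map between what is left), and U (embed into the output space).

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Tuple


def decompose(
    f: Callable[[Hashable], Hashable], domain: Iterable[Hashable]
) -> Tuple[
    Dict[Hashable, FrozenSet],
    Dict[FrozenSet, Hashable],
    Callable[[Hashable], Hashable],
]:
    """Split f into a surjection, a bijection, and an injection."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for x in domain:
        groups.setdefault(f(x), []).append(x)

    # Step 1 (throw away): each input -> the class of inputs f treats the same
    project = {x: frozenset(xs) for xs in groups.values() for x in xs}
    # Step 2 (rename): each class <-> exactly one element of the image
    rename = {frozenset(xs): y for y, xs in groups.items()}

    # Step 3 (insert): an image element viewed as a member of the codomain
    def include(y: Hashable) -> Hashable:
        return y

    return project, rename, include


if __name__ == "__main__":
    f = lambda x: x % 3
    project, rename, include = decompose(f, range(10))
    for x in range(10):
        assert include(rename[project[x]]) == f(x)
    print(sorted(sorted(cls) for cls in rename))
```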
