Charles 🎉 Frye
Nov 8, 2021 · 8 tweets · 5 min read
New video series out this week (and into next!) on the @weights_biases YouTube channel.

They're Socratic livecoding sessions where @_ScottCondron and I work through the exercise notebooks for the Math4ML class.

Details in 🧵⤵️
Socratic: following an ancient academic tradition, I try to trick @_ScottCondron into being wrong, so that students can learn from mistakes and see their learning process reflected in the content.
(I was inspired to try this style out by the @PyTorchLightnin Master Class series, in which @_willfalcon and @alfcnz talk through the nitty-gritty of DL with PyTorch+Lightning while writing code. Strongly recommended!)

Math4ML: in the class, we cover core ideas from linear algebra, calculus, and probability that are useful in ML.

I try to emphasize the key intuitions and connect them to programming ideas: shapes are like types, big-O notation and limits, etc.

wandb.me/m4ml-videos
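One way to see the "shapes are like types" intuition in code (a toy sketch, not from the course notebooks): matrix multiplication only "type-checks" when the inner dimensions agree, just as a function application only type-checks when the argument's type matches.

```python
# Toy illustration of "shapes are like types": matmul on nested lists
# raises a TypeError when the inner dimensions don't match, the way a
# type checker rejects an ill-typed function application.

def shape(mat):
    """Shape of a nested-list matrix: (rows, cols)."""
    return (len(mat), len(mat[0]))

def matmul(a, b):
    (n, k1), (k2, m) = shape(a), shape(b)
    if k1 != k2:  # the "type error": inner dimensions must agree
        raise TypeError(f"shape mismatch: {(n, k1)} @ {(k2, m)}")
    return [[sum(a[i][p] * b[p][j] for p in range(k1)) for j in range(m)]
            for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]   # shape (2, 3)
B = [[1, 0],
     [0, 1],
     [1, 1]]      # shape (3, 2)

print(shape(matmul(A, B)))  # (2, 2): (2, 3) @ (3, 2) -> (2, 2)
```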
Exercise notebooks: the M4ML class has always included GitHub-backed Colab/binder notebooks with text and exercises that firmed up the ideas in the lectures, but there wasn't any public video content explaining how to use them. Until now!

github.com/wandb/edu/tree…
Livecoding: the exercises are code, and we write the solutions together live (with light editing to remove typos etc)

Because the exercises are code, they can be graded programmatically.

Essentially, each comes with unit tests that you have to pass. Failures generate hints!
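To illustrate the pattern (a hypothetical sketch -- the function names and hint text are made up, not taken from the actual course repo): an exercise paired with a checker that turns failures into hints might look like this.

```python
# Hypothetical sketch of a self-grading exercise: a unit test that, on
# failure, emits a hint rather than a bare stack trace.

def dot(v, w):
    """The "student-implemented" exercise: dot product of two vectors."""
    return sum(vi * wi for vi, wi in zip(v, w))

def check_dot(student_fn):
    """Grade the exercise programmatically; failures generate hints."""
    cases = [(([1, 2], [3, 4]), 11), (([0, 0], [1, 1]), 0)]
    for (v, w), expected in cases:
        got = student_fn(v, w)
        if got != expected:
            return (f"Hint: dot({v}, {w}) should be {expected}, got {got}. "
                    "Did you sum over *pairs* of components?")
    return "All tests passed!"

print(check_dot(dot))  # All tests passed!
```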
This course material is designed for remote, asynchronous online education.

The combination of video lectures, recorded homework sessions, and self-grading exercises is meant to make it possible to get the full benefit of the course asynchronously via the internet.
And if you have questions that the videos and autograder can't answer, you can post about them on the YouTube channel or in the W&B forum: wandb.me/and-you

The first video will be out tomorrow! I hope to see you there.

wandb.me/m4ml-exercises…


More from @charles_irl

Dec 12
I think programming GPUs is too hard. Part of the problem is sprawling, scattered documentation & best practices.

Over the past few months, we’ve been working to solve that problem, putting together a “Rosetta Stone” GPU Glossary.

And now it’s live!

My take-aways in thread.
The heart of the CUDA stack, IMO, is not anything named CUDA: it’s the humble Parallel Thread eXecution instruction set architecture, the compilation target of the CUDA compiler and the only stable interface to GPU hardware.

modal.com/gpu-glossary/d…
This is obvious in hindsight. The ISA is where machines make contact with programs and it fundamentally divides the responsibilities of the hardware engineers and software engineers. This is true in a way even for a virtual ISA like PTX.
Aug 5
Last week @brad19brown, @jordanjuravsky, & co-authors released a paper on inference-time scaling laws that enable small LMs to beat the big boys.

So this weekend, @HowardHalim & I dropped everything to run their analysis on a new model + new data.

Success 😎

Why this matters:
Details of our work and repro code are on the Modal blog.

All you need are @modal_labs and @huggingface credentials! And it's free: it fits within the $30/month in Modal's free tier.

modal.com/blog/llama-hum…
First: we are bad at using language models.

They are statistical models of Unicode sequences. We know that sequential sampling is hard, but (driven by the economics of inference service providers) we ignore that when sampling from LMs and sample a single sequence greedily.
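To make the contrast concrete, here's a toy sketch of the alternative: repeated sampling plus a verifier, the inference-time scaling recipe from the paper -- with a random-integer "model" and an exact-match "verifier" standing in for the real things.

```python
import random

# Toy sketch of inference-time scaling via repeated sampling: instead of
# one greedy sample, draw many and keep any the verifier accepts.
# The "model" and "verifier" below are stand-ins, not real components.

def sample_answer(rng):
    """Stand-in for sampling one candidate answer from a language model."""
    return rng.randrange(100)

def verifier(answer):
    """Stand-in for a programmatic check, e.g. running unit tests."""
    return answer == 42

def solve(n_samples, seed=0):
    rng = random.Random(seed)
    for _ in range(n_samples):
        a = sample_answer(rng)
        if verifier(a):
            return a
    return None  # no sample passed the verifier

print(solve(1))     # a single draw usually fails (None for most seeds)
print(solve(5000))  # with many draws, a verified answer is all but guaranteed
```

The point of the scaling-law analysis is exactly this curve: coverage (the chance that *some* sample passes the verifier) grows predictably with the number of samples drawn.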
Nov 30, 2022
a lot more fun to use than the classic playground interface, which makes interactions like this one more delightful 😎
(please do not park your car on a volcano, even if you have an e-brake)
Zero-shot, the responses can be a bit "beige" and boring,
Nov 22, 2022
I had a delightful session talking through the paper "In-Context Learning and Induction Heads" with author @NeelNanda5.

It's part of a long research thread, one of my favorites over the last five years, on "reverse engineering" DNNs.
The core claim of the paper is that a large fraction of the in-context learning behavior that makes contemporary transformer LLMs so effective comes from a surprisingly simple type of circuit they call an _induction head_.
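The induction-head behavior is often summarized as the rule "[A][B] … [A] → predict [B]". As a toy illustration (the real thing is a learned attention circuit, not this literal lookup), that rule can be written as:

```python
# The induction-head pattern, stated as an algorithm: to predict the next
# token, find the most recent earlier occurrence of the *current* token
# and copy whatever followed it ("[A][B] ... [A] -> [B]").

def induction_predict(tokens):
    """Predict the next token by copying from the last repeated context."""
    current = tokens[-1]
    # scan backwards for the most recent prior occurrence of `current`
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy the token that followed it
    return None  # no repetition to exploit

print(induction_predict("A B C A".split()))            # B
print(induction_predict("the cat sat on the".split())) # cat
```

This is why in-context learning shows up so strongly on repeated or structured text: the circuit exploits repetition in the prompt itself, with no weight updates.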
In the video, Neel and I talk through the context of this claim and some of the phenomenological evidence for it.

In the process, I was delighted to discover that we share a deep love for and perspective informed by the natural sciences.
Nov 21, 2022
last week @modal_labs made A100 GPUs available

so on Friday i dropped everything to play with them

in hours i had a CLI tool that could make @StabilityAI art of the new puppy in my life, Qwerty

by Sunday i had multiple autoscaling pet-art-generating web apps -- and so can you!
context: A100s are beefy GPUs, and they have enough VRAM to comfortably train models, like Stable Diffusion, that generate images from text

if you can train the models, you can "teach" them proper nouns -- here "Qwerty", the name of my roommate @gottapatchemall's puppy (below)
A100s are expensive and finicky, and training on smaller GPUs (like my home 3070) can be painful

but Modal, a new cloud-native development platform, makes them easily available -- you just add some decorators and classes to your Python code
Mar 21, 2022
I recently presented on a series of four reports, spanning 40 years, on system failure: from a 1985 typewritten white paper on mainframe database crashes to a 2021 Zoom talk on outages in one of Google's ML-based ranking systems.

Here's a summary, with connections to reliable ML.
Each report was a post-hoc meta-analysis of post-mortem analyses: which "root causes" come up most often? Which take the most time to resolve?

Each captures 100 or more outages from a system using best practices of its era & modality at the largest scale.
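The tallying in such a meta-analysis is simple; here's a sketch in Python with made-up incident records (the categories and numbers below are illustrative, not taken from the reports):

```python
from collections import Counter

# Sketch of the meta-analysis method: reduce each post-mortem to a
# root-cause label and a time-to-resolve, then tally across incidents.
# These incident records are made up for illustration.

incidents = [
    {"cause": "operations", "hours_to_resolve": 12},
    {"cause": "software",   "hours_to_resolve": 4},
    {"cause": "operations", "hours_to_resolve": 8},
    {"cause": "hardware",   "hours_to_resolve": 1},
    {"cause": "software",   "hours_to_resolve": 2},
    {"cause": "operations", "hours_to_resolve": 5},
]

# Which "root causes" come up most often?
counts = Counter(i["cause"] for i in incidents)
print(counts.most_common(1))  # [('operations', 3)]

# Which take the longest to resolve, on average?
by_cause = {}
for i in incidents:
    by_cause.setdefault(i["cause"], []).append(i["hours_to_resolve"])
mean_hours = {c: sum(h) / len(h) for c, h in by_cause.items()}
print(max(mean_hours, key=mean_hours.get))  # operations
```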
"Why Do Computers Stop" was the first in the series, by Jim Gray (standing, center), who pioneered transactional databases and the ACID principle in the 1980s.

It's clear that these ideas were informed by his close engagement with actual failure data.