Mark Saroufim
gpu poor guy @pytorch
Dec 8, 2022 13 tweets 4 min read
Kinda wild that inductor, the default backend compiler for torch.compile(), is about 16K lines of Python code. There's never been a better time to become an ML compiler hacker. So what kind of optimizations does an ML compiler need to do?

Forget about dragons for a second. Yes, compilers are complex because perf outweighs complexity, but picture a happy panda instead.

The optimizations that matter for compilers were figured out by Frances Allen, according to this talk: venge.net/graydon/talks/…
Dec 8, 2022 7 tweets 3 min read
I haven't seen many people complaining that torch.compile() is crashing and there's a reason for that!

The minifier by @cHHillee and @anijain2305 is the silent star of 2.0. I've used it to turn crashing 1000+ line models into 10-line repros. I view it as a revolution in customer support.

Customer: My thing isn't working
Me: Can you share a repro
Customer: Not really, it's tied to my infra, the model is too big, I can't leak my IP
Me: :(
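
To make the "1000+ lines into 10" idea concrete, here is a minimal sketch of the general shrinking technique (greedy delta debugging). It is not the PyTorch minifier and uses none of its APIs. The names `minify`, `still_fails`, and the `op_N` step labels are hypothetical, and the "crash" is faked by a predicate that fails whenever two specific steps are both present.

```python
from typing import Callable, List, Sequence


def minify(steps: Sequence[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily shrink `steps` while `still_fails` keeps returning True."""
    current = list(steps)
    if not still_fails(current):
        raise ValueError("the full input must reproduce the failure")

    # Try dropping big chunks first, then progressively smaller ones.
    chunk = max(1, len(current) // 2)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if candidate and still_fails(candidate):
                current = candidate  # chunk was irrelevant to the failure, drop it
            else:
                i += chunk  # chunk is needed (or nothing would be left), keep it
        chunk //= 2
    return current


# Hypothetical "model": 1000 steps, and the failure needs only two of them.
steps = [f"op_{n}" for n in range(1000)]


def still_fails(subset: List[str]) -> bool:
    # Stand-in for "run the program and check that it still crashes".
    return "op_17" in subset and "op_640" in subset


result = minify(steps, still_fails)
print(result)  # ['op_17', 'op_640']
```

The customer never has to share the full model: only the shrunken repro leaves their infra, which is the point of the exchange above.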
Aug 3, 2022 5 tweets 1 min read
People aren't paying enough attention to how big a deal nbdev is. Notebook-as-a-service providers have historically failed because of small margins when competing against cloud providers and because there is no easy way to go from experiment to prod. So some notebook providers provide

1/n
a shell with which you can do some stuff, but nothing as useful as just getting an EC2 machine. So you end up defaulting back to authoring everything from your IDE after moving away from a notebook that made you productive, and then string together tests, docs and release scripts

2/n
Oct 9, 2021 7 tweets 2 min read
Reading "The Big Score" by Michael Malone, I'm struck by how much of Silicon Valley history essentially boils down to:

"Company pisses off star engineers working on key product. Engineers leave to build a better version of that product or a platform on top of it"

Exhibit A: Amdahl after quitting IBM
Oct 3, 2021 8 tweets 3 min read
With the release of Needham's new book on Visual Differential Geometry and Forms, I can't help but remember fondly the beautifully clear visual math books I've loved. A thread

amazon.com/gp/product/069…

Needham reached textbook fame with Visual Complex Analysis. Complex numbers initially elicit a sense of mystery but only because they make a lot more sense once you draw them!

amazon.com/Visual-Complex…
Sep 28, 2021 7 tweets 2 min read
Watching the @huggingface Infinity talks on how they got 1ms BERT GPU latency and 3ms CPU latency

They estimate that it takes 2-3 engineers about 2 months to get less than 20ms latency, sounds about right

They add hardware-vendor-specific magic configs in docker containers, which also makes it easy for them to collaborate with enterprise teams without having to share data
Sep 16, 2021 8 tweets 3 min read
marksaroufim.substack.com/p/working-clas…

There's a lot of content online about what you need to do to become an ML researcher but not so much if you're a working class deep learner
Aug 3, 2021 7 tweets 3 min read
New post summarizing a conversation between myself and @iScienceLuvr, the 18-year-old PhD student, on education, parenting and community
marksaroufim.substack.com/p/speedrunning…

The educational system introduces many caps on progress, which can be frustrating for people who know early on what they like and want to do, UNLESS like @iScienceLuvr they figure out the minimum viable graduation requirements
Aug 3, 2021 4 tweets 1 min read
Jun 27, 2021 4 tweets 2 min read
Just watched Karpathy's CVPR talk - my favorite ideas

Simplifying your input data modalities simplifies your org structure.

Simple ResNet for each camera aggregated over time in a single base transformer branching into multiple trunks for each team

1/n

Owning the full stack gives you control over your own destiny. You can avoid meeting with salespeople or PMs and improve things in ways that matter to you.

Int8 quantization for your chip and your own distributed filesystem with superfast video retrieval and allsync()

2/n