https://t.co/mcuQvV7YOC
proud father of 16 A100s & 16 H100s
flirting with LLMs, tensor core maximalist
x @GoogleDeepMind @Microsoft
Apr 16, 2023 • 7 tweets • 3 min read
[🤖 This is BIG!] The best truly open-source ChatGPT alternative just arrived: OpenAssistant! In a user study, OpenAssistant's replies were rated on par with ChatGPT's (48.3% vs 51.7% preference)! 🤯
1/ 👇🧵
Additionally, I assume the model will suffer far less from "corporate speech", which should make it way more fun to use! Let me know how you like it once you try it, down in the comments. :))
2/
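If you want to try it locally, here's a minimal sketch of how one might load an OpenAssistant checkpoint with Hugging Face transformers. The exact checkpoint name and the <|prompter|>/<|assistant|> prompt format are assumptions on my part - check the model card of whichever release you actually use.

```python
# Minimal sketch: chatting with an OpenAssistant checkpoint via Hugging Face transformers.
# The checkpoint name and the <|prompter|>/<|assistant|> prompt format are assumptions --
# verify them against the model card of the OpenAssistant release you use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenAssistant/oasst-sft-1-pythia-12b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

prompt = "<|prompter|>Explain tensor cores in one paragraph.<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```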
Apr 15, 2023 • 4 tweets • 2 min read
Insightful new blog post by @chipro covers a broad set of LLM-related topics fairly concisely.
* What to do with LLMs' output issues due to the inherent ambiguity
1/ 🧵👇
of natural language (e.g. your output format might be violated, your outputs could vary without you changing the input, etc.)
* Prompt versioning (similar to how you version code, data...) and prompt optimization (CoT - chain-of-thought prompting, the self-consistency technique, etc.) - a minimal self-consistency sketch below.
2/
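For the self-consistency part, here's a minimal sketch of the idea (not from the blog post itself): sample several chain-of-thought completions at non-zero temperature and majority-vote the final answers. `call_llm` and `extract_answer` are hypothetical stand-ins for your own LLM client and answer-parsing logic.

```python
# Minimal self-consistency sketch: sample multiple CoT completions, majority-vote the answers.
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.7) -> str:
    # Hypothetical stand-in -- plug in whatever model/client you actually use.
    raise NotImplementedError("plug in your own LLM call here")

def extract_answer(completion: str) -> str:
    # Assumes the prompt asks the model to end its reasoning with "Answer: <value>".
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    # Sample n completions at temperature > 0 and return the most common final answer.
    answers = [extract_answer(call_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```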
Apr 15, 2023 • 10 tweets • 3 min read
[IMPORTANT ❗] I feel a sense of duty to warn people about the social media posts they'll be seeing in AI over the upcoming months.
Web3/crypto/salesy vibes have come to AI big time. Huge monetary incentives are at play, so many of those folks have flocked to the space...
1/ 🧵👇
(not judging, it's just a fact - and as always, there are exceptions!).
I'm a techno-optimist, but there is hype, and then there is hyping so hard that you're basically lying and spreading misinformation.
A concrete example:
People saying AutoGPT, a project that started...
2/
Mar 14, 2023 • 16 tweets • 5 min read
BIG LIFE ANNOUNCEMENT: I'm leaving @DeepMind to start my own company. I'm 28 now. This is the start of a new life chapter.
I'm both happy and sad.
❤️ I'm happy because I've been planning on starting my own company ever since I graduated from college back in 2017.
1/ (MEGA 🧵)
In the long run, I always felt that would be the best way to maximize my positive impact on the world.
My plan was to gain some real-world experience in the top tech companies before I go on to start my own thing.
That moment has finally come.
2/
Dec 28, 2022 • 25 tweets • 10 min read
[🤖 Build time! 🧠] I'm so excited to announce my new project: Andrew Huberman podcast transcripts! 🎉🥳
[16:55 - 58:00] They introduced a prototype of their humanoid robot - Optimus. Only a concept last year - and now a reality. The progress was incredibly fast!
1/
Throughout the presentation, they stressed that there are so many parallels between building a humanoid robot and building a self-driving car. That's why the progress was so fast - they could reuse the supply chain, the training infra, etc.
2/
Sep 1, 2022 • 4 tweets • 3 min read
If you want to understand how @StableDiffusion works behind the scenes, I just made a deep-dive video walking you through the codebase and papers step by step.
1. First-stage autoencoder training (with KL regularization)
2. Latent Diffusion Model training (UNet + conditioning model) - a toy sketch of both stages below
2/
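To make the two stages concrete, here's a toy sketch of the recipe on random data - not the actual Stable Diffusion code. The architectures, noise schedule, and hyperparameters are purely illustrative, and the conditioning model is omitted.

```python
# Toy sketch of the two-stage latent diffusion recipe, on random data so it runs anywhere.
# Stage 1: train an autoencoder with a reconstruction loss + KL regularization on the latents.
# Stage 2: freeze it and train a denoiser (a tiny conv net standing in for the UNet) to
# predict the noise added to the latents. Conditioning is omitted; everything is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAutoencoder(nn.Module):
    def __init__(self, latent_ch=4):
        super().__init__()
        self.enc = nn.Conv2d(3, 2 * latent_ch, 4, stride=4)   # outputs mean and logvar
        self.dec = nn.ConvTranspose2d(latent_ch, 3, 4, stride=4)

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        return mu, logvar

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return self.dec(z), mu, logvar

ae = ToyAutoencoder()
denoiser = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 4, 3, padding=1))
opt_ae = torch.optim.Adam(ae.parameters(), lr=1e-3)
opt_dn = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

images = torch.rand(8, 3, 64, 64)  # stand-in for a real image dataset

# Stage 1: reconstruction + (small) KL term.
for _ in range(10):
    recon, mu, logvar = ae(images)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = F.mse_loss(recon, images) + 1e-4 * kl
    opt_ae.zero_grad()
    loss.backward()
    opt_ae.step()

# Stage 2: freeze the autoencoder, noise the latents, train the denoiser to predict the noise.
for p in ae.parameters():
    p.requires_grad_(False)
for _ in range(10):
    with torch.no_grad():
        z, _ = ae.encode(images)                    # use the latent mean as the latent code
    noise = torch.randn_like(z)
    alpha = torch.rand(z.shape[0], 1, 1, 1)         # crude stand-in for a proper noise schedule
    noisy_z = alpha.sqrt() * z + (1 - alpha).sqrt() * noise
    loss = F.mse_loss(denoiser(noisy_z), noise)     # epsilon-prediction objective
    opt_dn.zero_grad()
    loss.backward()
    opt_dn.step()
```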
Aug 30, 2022 • 4 tweets • 2 min read
[💥 Open-sourcing Stable Diffusion scripts 💥] Folks, if you missed this one: I open-sourced a script that should make it super easy to get started playing with Stable Diffusion!
1/
It supports generating a diverse set of images and interpolating in the latent space, thus creating (mostly) smooth transitions in image space (see the sketch below)!
The image you see above was generated using the prompt:
"a painting of an ai robot having an epiphany moment" 🤖🤖🤖
2/
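The latent-space interpolation boils down to something like the sketch below: spherically interpolate (slerp) between two initial noise latents and decode each intermediate latent with your sampler of choice. `generate_image_from_latent` is a hypothetical stand-in for your actual sampling call, and the latent shape is just an example.

```python
# Minimal sketch of latent-space interpolation between two starting noise tensors.
import torch

def slerp(t: float, z0: torch.Tensor, z1: torch.Tensor) -> torch.Tensor:
    """Spherical linear interpolation between two latent tensors."""
    z0_flat, z1_flat = z0.flatten(), z1.flatten()
    omega = torch.arccos(torch.clamp(
        torch.dot(z0_flat / z0_flat.norm(), z1_flat / z1_flat.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-6:                      # nearly parallel -> fall back to plain lerp
        return (1.0 - t) * z0 + t * z1
    return (torch.sin((1.0 - t) * omega) / so) * z0 + (torch.sin(t * omega) / so) * z1

z_start = torch.randn(1, 4, 64, 64)          # initial noise for image A (example shape)
z_end = torch.randn(1, 4, 64, 64)            # initial noise for image B
frames = [slerp(t, z_start, z_end) for t in torch.linspace(0, 1, steps=30).tolist()]
# for z in frames: generate_image_from_latent(z, prompt="a painting of an ai robot ...")
```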
Aug 29, 2022 • 5 tweets • 4 min read
[🤯 Stable Diffusion 💥] If you wanted to get started with Stable Diffusion this video is for you!
1/ 👇🧵
This is what perplexity vs wall-clock time looks like when training LLMs. 😅
You can almost taste that suffering
2/
Aug 3, 2022 • 5 tweets • 2 min read
In 2020 I started logging my ML journey - the one that eventually led to me landing a job at DeepMind - and I'm so happy I did!
For multiple reasons:
* I forced myself to distill everything I've learned, and that compression/reflection solidified my knowledge
1/ 👇🧶
* I feel I helped others on a similar path, as well as my future self (the logs are fairly meta - more of a general learning blueprint)!
* It's a nice historical document and a public artifact that I am proud of.
2/
Aug 1, 2022 • 6 tweets • 2 min read
[learning machine learning 🧠] Don't fall into the same trap as many others - trying to overengineer your curriculum when you're just getting started (and later as well).
You'll just end up with decision-making paralysis and eventually give up...
1/ 👇🧶
- which is the only bad outcome (unless it comes from a place of deep self-awareness).
You ask yourself questions like:
What's the best ML course out there? Should I do X, Y, or Z? That reputable guy on Reddit said Y, @ylecun said Z, and my professor said W.
2/
Jul 31, 2022 • 12 tweets • 3 min read
I get asked a lot about what it takes to land a job at DeepMind or any other world-class AI industry lab.
For those of you who aren't aware, I wrote a detailed blog post on that topic and shared my personal journey here: link.medium.com/dV0H7fay6rb
If I could summarize..
1/ 🧶
..my tips, the main ones would be:
1) Have a lot of tenacity - it takes a lot of hard work, patience, and consistency. The good thing is: this can be learned/practiced! For me personally, I built this part of my personality through sports (calisthenics, running, martial arts..
2/
Jul 30, 2022 • 8 tweets • 2 min read
I wish I had learned how to learn while I was still a kid. For some reason they don't teach us this in school, and everyone is left to figure it out on their own - which is sad, as most people never take the time to learn it.
I strongly recommend you read it. Taking a step back from "actual learning" to boost your learning efficiency is time well spent.
IMHO, things that will benefit everyone should be...
2/
Jul 29, 2022 • 7 tweets • 3 min read
If you truly want to become proficient with machine learning (I really don't like the word "expert"), try to get out of the "going through the newest courses and books" phase as soon as possible.
Too many people keep on reading the newest books that come out...
1/
(and the same for courses), thinking they are now up to date with the ML world, whereas in reality they are "light years" behind (things move fast around here 😇).
Try to get into the paper reading and replication phase as fast as you can without skipping the necessary steps.
2/
Jun 8, 2022 • 4 tweets • 3 min read
Very much enjoyed doing this podcast with @LeiserNeil from AI Stories, discussing what makes one a good ML engineer, imposter syndrome, my career path and challenges I faced, my path to @DeepMind, how to learn ML, and much more!
Watch here:
1/ 🧵
It's my first podcast ever, and it was long overdue! :))
Neil reached out back in January this year, but since I had so much going on (moving to London, starting a new job + a bunch of personal things), I had to postpone it - but now it's up!
2/
Jun 8, 2022 • 10 tweets • 3 min read
[🧠 Getting started with biology 🧠] I just finished the best MOOC I've done in my life: "Introduction to Biology - The Secret of Life", offered on edX by the famous @eric_lander (Human Genome Project guy).
I've collected a ton of notes over the last month or so.
1/ 🧵👇
My idea is to start sharing my learnings and notes over the next weeks - do let me know down in the comments whether you'd find that useful!
A bit about the course:
* You'll learn the fundamentals of biochemistry, genetics/genomics, and more - enough to understand...
2/
Jun 7, 2022 • 12 tweets • 3 min read
[🧠 Interesting read 🧠] "Can a Biologist Fix a Radio? or, What I Learned while Studying Apoptosis" paper. So why is it interesting?
The state of AI seems to be somewhere in the middle between the experimental biologist's approach, as described in this paper, and...
1/
...the classical engineering approach.
The author observes and describes the funny (and all too familiar?) boom-and-bust cycles that happen in biology when researchers try to understand complex phenomena, with the promise of discovering a miracle drug that will solve all our problems.
2/
Apr 8, 2022 • 7 tweets • 4 min read
Everyone is hyped up about @OpenAI's DALL-E 2 model atm, and most people have noticed the cryptic "signature" code in the corner of their images - but how many of you understand what it stands for?
I did some research and found the answer! 🔎🧠 I couldn't believe it.
Thread 👇🧵1/
I've decided to do some deciphering. Is it a simple color code? What are @sama et al. trying to tell us?
I opened up my Photoshop and selected the color picker tool.
I first extracted 5 RGB tuples (R, G, B) from the 5 colors of the signature.
2/
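If you'd rather skip Photoshop, the same extraction can be done programmatically - a minimal sketch with Pillow, where the file name and pixel coordinates are made up for illustration (you'd inspect your own DALL-E 2 output to find where the signature sits):

```python
# Sample one pixel from each of the 5 colored squares of the signature strip.
# The file name and pixel coordinates are illustrative placeholders.
from PIL import Image

img = Image.open("dalle2_output.png").convert("RGB")     # hypothetical file name
signature_pixels = [(950, 1010), (960, 1010), (970, 1010), (980, 1010), (990, 1010)]
rgb_tuples = [img.getpixel(xy) for xy in signature_pixels]
print(rgb_tuples)   # five (R, G, B) tuples, one per color of the signature
```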
Jan 9, 2022 • 7 tweets • 3 min read
[🧠 Paper Summary 📚] An interesting paper was recently published on arXiv: "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (although it originally appeared in May 2021).
The main idea is this: 1/ 🧵
If you have an overparametrized neural network (more params than the # of data points in your dataset) and you train it way past the point where it has memorized the training data (as indicated by a low training loss and a high val loss), all of a sudden the network will...
2/
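A toy sketch of the kind of setup the paper studies: modular addition, an overparametrized network, strong weight decay, and training far past the point of memorization while logging validation accuracy. All hyperparameters here are illustrative, not the paper's.

```python
# Toy grokking-style setup: overparametrized net on a small algorithmic dataset (modular addition),
# trained with heavy weight decay far beyond the point where it memorizes the training split.
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 97  # modulus; the task is predicting (a + b) mod P
pairs = torch.tensor([(a, b) for a in range(P) for b in range(P)])
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2                       # train on only half the addition table
train_idx, val_idx = perm[:split], perm[split:]

embed = nn.Embedding(P, 128)
mlp = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, P))
params = list(embed.parameters()) + list(mlp.parameters())
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)   # strong weight decay

def forward(idx):
    x = embed(pairs[idx]).flatten(1)          # concatenate the two operand embeddings
    return mlp(x)

for step in range(100_000):                   # keep going long after train loss hits ~0
    opt.zero_grad()
    loss = F.cross_entropy(forward(train_idx), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            val_acc = (forward(val_idx).argmax(-1) == labels[val_idx]).float().mean()
        print(f"step {step}: train loss {loss.item():.4f}, val acc {val_acc.item():.3f}")
```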