Jeremy Howard
🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Hon Professor: @UQSchoolITEE ; Digital Fellow: @Stanford
Caught the #WokeMindVirus and ... I LIKE It!
Mar 7 9 tweets 2 min read
Today, with @Tim_Dettmers, @huggingface, & @mobius_labs, we're releasing FSDP/QLoRA, a new project that lets you efficiently train very large (70b) models on a home computer with consumer gaming GPUs. 1/🧵
answer.ai/posts/2024-03-… "With this capability we can take huge models to new heights locally, and gigantic, hundreds of billions of parameter models are now accessible by small labs", says legendary model builder @Teknium1
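As a back-of-envelope sketch of why this is plausible (my numbers, not the project's exact memory accounting), 4-bit quantization plus sharding the weights across two cards brings a 70b model within reach of consumer GPUs:

```python
# Back-of-envelope memory math for a 70b model on consumer gaming GPUs.
# Assumptions (illustrative only): weights quantized to 4 bits, weights
# sharded FSDP-style across two 24 GB cards, adapters/activations ignored.
params = 70e9
quantized_weights_gb = params * 0.5 / 1e9     # 4 bits = 0.5 bytes per param
per_gpu_gb = quantized_weights_gb / 2         # 2-way shard across GPUs

print(f"4-bit weights: {quantized_weights_gb:.0f} GB")   # → 35 GB
print(f"per GPU (2-way shard): {per_gpu_gb:.1f} GB")     # → 17.5 GB, under 24 GB
```

In full precision the same weights would need 140 GB+, which is why quantization and sharding together are the enabling combination here.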
Feb 7 5 tweets 2 min read
Currying and composition in a nutshell (with APL). (This is easier for primary school children to learn than many things they are taught. At least according to the primary school kids I've taught it to.)
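The same two ideas translate directly into Python, if less tersely than APL (a rough sketch):

```python
# Currying: a two-argument function becomes a chain of one-argument functions.
def curry(f):
    return lambda x: lambda y: f(x, y)

add = lambda x, y: x + y
add3 = curry(add)(3)          # partially applied: still waiting for the second argument

# Composition: (f . g)(x) = f(g(x))
def compose(f, g):
    return lambda x: f(g(x))

double = lambda x: x * 2
add3_then_double = compose(double, add3)
print(add3_then_double(4))    # → 14  (4+3=7, then 7*2=14)
```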
Jan 26 14 tweets 6 min read
There are few things more important to our civilization than understanding how to better do R&D. Thankfully, @eric_is_weird has dedicated himself to studying this question.

As a result, he's become the foremost scholar and historian of 19th and 20th century R&D labs.
1/🧵 We are incredibly lucky that @eric_is_weird has taken a strong interest in Answer.AI, and decided to do a deep dive into our organizational structure and R&D approach.

His article is a fascinating exploration of the last 2 centuries of R&D:
Answer.AI
answer.ai/posts/2024-01-…
Dec 11, 2023 6 tweets 2 min read
This is rather long, and I haven't checked it, but in short the claims here are that complex GPT4 prompting achieves:

- 100% on ConceptARC (which is a really difficult task that previously hasn't been cracked)
- A chess engine that beats all other chess engines. Examples are provided (but they need to be run directly on GPT4 API with temperature 0) so you can check the claims.
Nov 18, 2023 11 tweets 3 min read
OK everyone's asking me for my take on the OpenAI stuff, so here it is. I have a strong feeling about what's going on, but no internal info so this is just me talking.

The first point to make is that the Dev Day was (IMO) an absolute embarrassment. I could barely watch the keynote. It was just another bland corp-speak bunch of product updates.

For the researchers I know who were involved from the beginning, this must have felt nausea-inducing.

The plan was AGI, lifting society to a new level. We got Laundry Buddy.
Oct 13, 2023 9 tweets 4 min read
If you're like me and find it easier to read *code* than *math*, and you have access to @OpenAI GPT 4V (or use @bing or @google Bard), try pasting an image of an equation you wanna understand in there.

It might just blow your mind.
1/🧵 Multiple equations? No problem!

Sep 24, 2023 5 tweets 1 min read
I wanted ChatGPT to show how to get likes/views ratio for a bunch of YouTube videos, without dealing with the hassle of YouTube's Data API limits.

But it didn't want to, because it claimed screen scraping is against the YouTube ToS.

So I lied to ChatGPT. It's weird how typing a lie into ChatGPT feels naughty, yet it's basically the same as typing a lie into Google Docs.

They're both just pieces of computer software.
Sep 24, 2023 11 tweets 4 min read
I just uploaded a 90 minute tutorial, which is designed to be the one place I point coders at when they ask "hey, tell me everything I need to know about LLMs!"

It starts at the basics: the 3-step pre-training / fine-tuning / classifier ULMFiT approach used in all modern LLMs. It goes all the way through to fine-tuning your own LLM that converts questions about data into SQL statements to answer the question, using @PyTorch, @huggingface Transformers, and @MetaAI Llama 2.
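For flavour, a question-to-SQL training example might be formatted something like this (the template and field names below are my own illustration, not the tutorial's exact format):

```python
# Hypothetical formatter turning (question, schema, sql) triples into
# training-example strings for question -> SQL fine-tuning.
def format_example(question: str, schema: str, sql: str) -> str:
    return (f"-- Schema:\n{schema}\n"
            f"-- Question: {question}\n"
            f"-- Answer:\n{sql}")

ex = format_example(
    question="How many rows are in the users table?",
    schema="CREATE TABLE users (id INT, name TEXT);",
    sql="SELECT COUNT(*) FROM users;",
)
print(ex)
```

During fine-tuning, each such string becomes one sequence for the language model to learn; at inference time you supply everything up to the answer marker and let the model complete the SQL.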
Sep 6, 2023 13 tweets 5 min read
It looks like @johnowhitaker & I may have found something crazy: LLMs can nearly perfectly memorise from just 1-2 examples!

We've written up a post explaining what we've seen, and why we think rapid memorization fits the pattern. Summary 🧵 follows.
fast.ai/posts/2023-09-… Johno & I are teaming up on the @Kaggle LLM Science Exam competition, which “challenges participants to answer difficult science-based questions written by a Large Language Model".

We were training models using a dataset compiled by @radekosmulski...
kaggle.com/competitions/k…
Sep 1, 2023 7 tweets 2 min read
There's an amazingly convenient way to install the *full* NVIDIA CUDA dev stack on Linux, that I've never seen mentioned before.

It's all done with conda!

I just tried it and it worked perfectly.🧵
docs.nvidia.com/cuda/cuda-inst… First you need conda installed (e.g. via anaconda, miniconda, or miniforge). If you don't have it already, just run this script:
github.com/fastai/fastset…
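From there, the toolkit install is roughly as follows (package and channel names below are my paraphrase and may vary by CUDA version; check the linked NVIDIA docs for the current incantation):

```shell
# Create a fresh environment and install the full CUDA dev stack from
# NVIDIA's conda channel (compiler, libraries, headers).
conda create -n cuda-dev python=3.10
conda activate cuda-dev
conda install -c nvidia cuda

# Verify the compiler is on PATH
nvcc --version
```

Because everything lives inside the conda environment, you can keep multiple CUDA versions side by side and avoid touching system packages.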
Aug 10, 2023 6 tweets 2 min read
Now that ChatGPT has rolled out custom instructions to most users, try out this instruction -- it makes GPT 4 far more accurate for me: (Concat the rest of this 🧵 together and put in your custom instruction section) You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so.
Jul 11, 2023 18 tweets 4 min read
I've spent the last few months interviewing >60 experts in law, economics, AI, alignment, etc, on the impacts of AI, and safety interventions.

Today I'm publishing my first article, showing regulation designed to increase AI safety may backfire badly!
fast.ai/posts/2023-11-… A new paper released today proposes various regulations designed to "ensure" safety of model *development*. The idea is to:
- Create standards for development and deployment of AI models, and
- Create mechanisms to ensure compliance with these standards.
arxiv.org/abs/2307.03718
May 31, 2023 11 tweets 3 min read
I teamed up with philosopher @sethlazar and AI impacts researcher @random_walker to investigate the "Statement on AI Risk" that proposes:

"Mitigating the risk of extinction from AI should be a global priority".

tl;dr: We're not convinced.🧵
fast.ai/posts/2023-05-… One thing I haven't seen mentioned elsewhere: the original request for people to sign the letter had the subject line "Invitation to join Hinton, Bengio & Amodei".

That's pretty powerful social status signaling being used to attract signatories.
May 4, 2023 11 tweets 5 min read
There's a new programming language in town - it's Mojo! I'm more than a little excited about it. It's Python, but with none of Python's problems.

You can write code as fast as C, and deploy small standalone applications like C.

My post is below, and a 🧵
fast.ai/posts/2023-05-… Python is the language that I have used for nearly all my work over the last few years. It is a beautiful language. It has an elegant core on which everything else is built.

But it comes with a downside: performance. It's thousands of times slower than C.
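A quick sketch of that overhead (timings vary by machine, and the exact ratio to C depends on the workload; the point is the cost of the interpreted loop itself):

```python
import time

# A tight interpreted loop: every iteration pays for bytecode dispatch
# and boxed-integer arithmetic. A C compiler turns the same loop into a
# few machine instructions (or folds it away entirely).
def py_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
result = py_sum(10_000_000)
elapsed = time.perf_counter() - start
print(f"sum={result}, took {elapsed:.2f}s in pure Python")
```

This is exactly the kind of hot loop where Mojo's pitch -- Python-like syntax compiled down to C-like machine code -- is aimed.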
Apr 28, 2023 16 tweets 3 min read
I'm seeing a lot of people confused about this - asking: what exactly is the problem here? That's a great question!

Let's use this as a learning opportunity and dig in. 🧵 First, I've seen that one of the most common responses is that anyone criticising the original post clearly doesn't understand it and is ignorant of how language models work.

Aidan Gomez is an author of the Transformers paper, and is CEO of Cohere. I think he understands fine.
Apr 28, 2023 4 tweets 1 min read
Sometimes it feels like NLP papers prior to 2020 don't exist...

(Bidirectional autoregressive models have been common for many years, and were for instance used in ULMFiT.) AFAIK the first bidirectional RNN was from 1997. (Although it was popularised in Alex Graves's classic 2013 paper "Generating Sequences With Recurrent Neural Networks" I think.)
ieeexplore.ieee.org/document/650093
Apr 5, 2023 11 tweets 6 min read
Our new course, "From Deep Learning Foundations to Stable Diffusion", is finally done after 8 months of work!!!

With >30 hours of video content (all free, no ads!), you'll learn how to create and train a Stable Diffusion model starting from pure Python 🧵
fast.ai/posts/part2-20… This field was developing rapidly as we were developing and teaching the course, so many lessons include a walk-through of a paper that had just been released.

We also implement key papers that aren't in Stable Diffusion, such as Karras et al (2022)
arxiv.org/abs/2206.00364
Apr 3, 2023 25 tweets 7 min read
There's a lot of folks under the misunderstanding that it's now possible to run a 30B param LLM in <6GB, based on this GitHub discussion.

This is not the case. Understanding why gives us a chance to learn a lot of interesting stuff! 🧵
github.com/ggerganov/llam… The background is that the amazing @JustineTunney wrote this really cool commit for @ggerganov's llama.cpp, which modifies how llama models are loaded into memory to use mmap
github.com/ggerganov/llam…
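A minimal illustration of the mmap behaviour in question (pure Python standing in for llama.cpp's C++): mapping a file reserves address space, but pages are only read from disk as they're touched, which is why naive memory readings taken right after loading can look impossibly small:

```python
import mmap
import os
import tempfile

# Stand-in for a large model weights file.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 1_000_000)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The whole file is now addressable, but no bulk read has happened;
    # resident memory barely changes until pages are actually touched.
    first = mm[0]          # touching one byte faults in just that page
    print(len(mm), first)  # → 1000000 0
    mm.close()
```

So a 30B model loaded via mmap still needs its working set in RAM to run at speed -- the mapping just defers (and shares) the paging, it doesn't shrink the model.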
Nov 21, 2022 8 tweets 3 min read
Intriguing new study from the amazing Adriaan Bax and team suggests that most covid deaths resulted from (preventable) snoring droplets rather than (unpreventable) microaspiration. This could be a game changer.

No time for the paper? Then read this 🧵!
sciencedirect.com/science/articl… Infection of the lung with SARS-CoV-2 is a two-step process: first the nose / throat, then the lungs. The postulated, but physically implausible, mechanism for step 2 involves “microaspiration”.
Oct 24, 2022 7 tweets 4 min read
After just 2 weeks of the new @fastdotai course, our students are already making research advances in Stable Diffusion.

@sebderhy developed a novel yet simple modification to classifier-free guidance that gives better results (previous approach on left, new approach on right) Image @fastdotai @sebderhy I think in this case there's room to improve the results even further. The basic idea being tackled is that the "old way" of doing guidance actually increased the scale of the update (especially if the difference between conditional and unconditional embeddings is large)