FSDL Lecture 7: Foundation Models is now available!

This lecture is 💯 new to the course.

We talk about building on Transformers, GPT-3, CLIP, Stable Diffusion, and other foundation models.

Brief thread below.

The brave new world of large models is astonishing.

With scale, these models show emergent capabilities that seem truly magical.

At hundreds of billions of parameters, many GPUs are needed just to load the model, so API-based access makes a lot of sense.
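
A back-of-the-envelope check makes this concrete. The sketch below is our own illustration, not from the lecture: it assumes 2 bytes per parameter (fp16/bf16 weights) and 80 GB of memory per GPU, and it ignores activations, the KV cache, and framework overhead, so real deployments need more.

```python
import math

def min_gpus_to_load(n_params: float, bytes_per_param: int = 2, gpu_mem_gb: float = 80.0) -> int:
    """Lower bound on GPUs needed just to hold the weights (assumed 80 GB cards, no activations)."""
    weight_gb = n_params * bytes_per_param / 1e9  # decimal gigabytes
    return math.ceil(weight_gb / gpu_mem_gb)

# GPT-3 scale: 175B parameters at 2 bytes each is ~350 GB of weights alone.
print(min_gpus_to_load(175e9))
```

At GPT-3's 175B parameters, this prints 5: at least five 80 GB GPUs before serving a single request.
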
We start old-school, talking about the importance of embeddings and the concept of fine-tuning models.

Then we talk about the Transformer architecture, covering its three simple components:

· Self-attention
· Positional encoding
· Layer normalization
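
Here is a minimal sketch of those three pieces in plain Python. It's our own single-head illustration, not the lecture's code, and it simplifies on purpose: the input vectors serve directly as queries, keys, and values (no learned projections), positions use the sinusoidal encoding from the original Transformer paper, and layer norm omits its learned gain and bias.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of d-dimensional vectors.
    Simplification: x is used directly as queries, keys, and values."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def positional_encoding(seq_len, d):
    """Sinusoidal encoding: sin on even dims, cos on odd dims, wavelength 10000^(2i/d)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d):
            angle = pos / (10000 ** (2 * (j // 2) / d))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def layer_norm(v, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (learned gain/bias omitted)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

# Toy "token embeddings": 3 tokens, 4 dimensions each (made-up numbers).
tokens = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
pe = positional_encoding(len(tokens), 4)
x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]
# Residual connection around attention, followed by layer norm (post-LN ordering).
h = [layer_norm([a + b for a, b in zip(xi, ai)]) for xi, ai in zip(x, self_attention(x))]
print(len(h), len(h[0]))
```

Self-attention is the only step that mixes information across positions; positional encodings are needed because attention by itself ignores token order.
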
We cover the most notable LLMs:

· BERT
· GPT/GPT-2/GPT-3
· T5
· InstructGPT
· RETRO
· Chinchilla (and its scaling-law implications)
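
A rough worked example of the Chinchilla takeaway. It leans on two common approximations that are our assumption here, not the lecture's exact numbers: training compute C ≈ 6·N·D FLOPs for N parameters and D training tokens, and a compute-optimal ratio of roughly 20 tokens per parameter. Chinchilla itself (70B parameters, 1.4T tokens) sits right at that ratio.

```python
import math

TOKENS_PER_PARAM = 20  # rough Chinchilla-style rule of thumb (an approximation)

def training_flops(n_params: float, n_tokens: float) -> float:
    """Common approximation: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

def compute_optimal(flops: float):
    """Split a FLOP budget into (params, tokens) assuming D = 20 * N and C = 6 * N * D."""
    n_params = math.sqrt(flops / (6 * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params

budget = training_flops(70e9, 1.4e12)  # Chinchilla's parameter and token counts
n, d = compute_optimal(budget)
print(f"{budget:.2e} FLOPs -> {n / 1e9:.0f}B params, {d / 1e12:.1f}T tokens")
```

By the same rule of thumb, GPT-3's 175B parameters would call for about 3.5T tokens, versus the roughly 300B it was actually trained on.
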
We discuss LLM vendors such as @OpenAI, @CohereAI, and @AI21Labs,

as well as open-source projects such as BLOOM from @BigscienceW, GPT models from EleutherAI, and OPT from @MetaAI,

and options for hosting inference, such as @huggingface.

Drawing on excellent GPT-3 wrangling by @goodside, @npew, and others, we share some prompt engineering tricks:

· Tokenization effects
· Scratch-pad
· "Let's think step by step"
· Formatting tricks
· Prompt injection attacks
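
To make a couple of these tricks concrete, here is a hypothetical helper of our own (not from the lecture or any library) that formats a question with clear delimiters and appends the zero-shot "Let's think step by step" cue. Delimiters help separate instructions from untrusted input, but on their own they are not a reliable defense against prompt injection.

```python
def build_prompt(question: str, examples=(), think_step_by_step: bool = True) -> str:
    """Assemble a plain-text prompt: optional few-shot (question, answer) pairs, then the question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    # Triple quotes mark where user-supplied text starts and ends.
    parts.append(f'Q: """{question.strip()}"""')
    # The zero-shot chain-of-thought cue goes where the model's answer begins.
    parts.append("A: Let's think step by step." if think_step_by_step else "A:")
    return "\n\n".join(parts)

print(build_prompt(
    "A juggler has 16 balls. Half are golf balls, and half of those are blue. "
    "How many blue golf balls are there?",
    examples=[("What is 2 + 2?", "4")],
))
```
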
Code generation is an incredible application of LLMs.

We share results from @DeepMind's AlphaCode, @OpenAI's Codex, and work on math problem solving, and thoroughly stan @github Copilot.

We also show that good old GPT-3 is perfectly capable of writing code 😎

The future lies in cross-modal applications of LLMs, and we cover results such as Flamingo from @DeepMind and Socratic Models from @GoogleAI.

Lastly, we talk about the joint embedding of text and images unleashed by CLIP from @OpenAI.

While CLIP alone can't go from image to text or from text to image, follow-up work can.
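
The joint-embedding idea can be sketched with toy vectors: a CLIP-style model embeds images and captions into the same space, so matching pairs score high on cosine similarity, and a softmax over those scores gives zero-shot classification. The 3-dimensional vectors and the temperature below are made up for illustration (real CLIP embeddings come from trained image and text encoders and are much larger), and `zero_shot_classify` is a hypothetical helper, not a library API.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities between one image and each caption."""
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up embeddings standing in for text- and image-encoder outputs.
captions = {"a photo of a dog": [0.9, 0.1, 0.0], "a photo of a cat": [0.1, 0.9, 0.1]}
image = [0.8, 0.2, 0.1]  # pretend this is the embedding of a dog photo
probs = zero_shot_classify(image, list(captions.values()))
labels = list(captions)
best = labels[probs.index(max(probs))]
print(best)
```

Note that this only scores existing image/caption pairs; nothing here generates an image or a caption.
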
And that's what we cover next: the unCLIP (#dalle2) model, as well as #stablediffusion.

We cover diffusion models, the role of the "prior," and U-Nets for image generation.
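
For a feel of what a diffusion model learns to undo, here is the standard DDPM-style forward (noising) step in closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. It's a sketch under our own assumptions: a linear β schedule from 1e-4 to 0.02 over 1000 steps (the DDPM paper's default) and a toy 4-value "image". The denoising network (a U-Net in these systems) is trained to predict ε from x_t and t.

```python
import math
import random

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_sample(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ q(x_t | x_0) in one shot; also return the noise a denoiser would learn to predict."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

alpha_bars = linear_alpha_bars()
x0 = [0.5, -0.2, 0.9, 0.0]  # toy "image" with 4 pixel values
rng = random.Random(0)
for t in (0, 250, 999):
    xt, _ = noise_sample(x0, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in xt])
```

Early steps barely perturb x_0; by the last step ᾱ_t is close to zero and x_t is essentially pure Gaussian noise, which is where generation starts.
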
Open-source datasets and models have kicked off a true explosion of activity in image and video generation. We're excited to see what related projects come out of our synchronous FSDL cohort!

The world of AI has never been more exciting than right now. It feels like we've crested a hill and can see a beautiful new landscape all around. There's a ton to build, and we're excited to help you do it!

Follow us here and follow along at fullstackdeeplearning.com/course/2022

More from @full_stack_dl

Sep 6
FSDL Lecture 6: Deployment is now live!

This lecture covers a critical step: getting your model into prod.

The key message is similar to our philosophy in other parts of the ML workflow:

Start simple, add complexity as you need it.

fullstackdeeplearning.com/course/2022/le…

When it's time to deploy, the first step is to create a prototype you and your friends / teammates can interact with.

@Gradio, @huggingface, and @streamlit are your friends at this stage.

You do want this to have a basic UI and be hosted behind a web server to reduce friction.

This is an example of the model-in-service deployment paradigm, where you simply embed your model in your web server.

It's simple to implement, but it will run into issues as you scale, because models and web servers scale differently.

Sep 1
🧪 FSDL Lab 6: Data Annotation 🧪

Try out @LabelStudioHQ and see how the tasty Tensor sausage gets made out of data chuck with our latest lab notebook and video!

However much you care about data, you should probably care more.

High-quality data is still a major differentiator for ML app quality.

And a good understanding of the data is a major differentiator for ML engineer quality!

Throughout the labs, we've been building up a neural network capable of basic OCR. We've focused on model architectures, on training frameworks, on experiment management, and on software engineering infra.

In this lab, we tackle the data.

Aug 31
📀 FSDL Lecture 4: Data Management 📀

The key message is simple enough: become one with the data, and don't overcomplicate things 🙃.

Find the video and notes on our website, and check out the thread below for some condensed learnings first.

fullstackdeeplearning.com/course/2022/le…

First, we talk about data storage.

• Disk speed and bandwidth vary a lot, so use NVMe SSDs
• Store binary data in standard formats like JPGs
• Store metadata and text as JSON or Parquet
• Databases are the best tool for deep work with structured data
• When it comes to data warehouses/lakes, there's a bunch of jargon that could be helpful to know
• The basic takeaway is that data lakes are great if you need to aggregate different data sources at scale
• @SnowflakeDB and @databricks are the leading solutions

Aug 23
FSDL Lecture 3: Troubleshooting & Testing is now live!

We cover:
• how to design software tests
• recommended tooling for testing and code quality assurance
• how to test ML systems, the easy and the hard way
• how to debug neural networks

(Link below)
The lecture video by @charles_irl is at

As always, our recommendations are specific and actionable. We recommend testing docstring code with doctests and quick-and-dirty notebook testing with nbformat.
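
Here is what doctest-style testing looks like: the examples in a docstring double as tests, checked by the standard-library `doctest` module. `normalize_label` is a made-up function for illustration.

```python
import doctest

def normalize_label(label: str) -> str:
    """Lowercase a class label and collapse internal whitespace.

    >>> normalize_label("  Golden   Retriever ")
    'golden retriever'
    >>> normalize_label("CAT")
    'cat'
    """
    return " ".join(label.lower().split())

if __name__ == "__main__":
    # Runs every >>> example in this module's docstrings; prints nothing if they all pass.
    doctest.testmod()
```

You can also run `python -m doctest -v your_module.py`, or have pytest collect the examples with its `--doctest-modules` flag.
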
We share a perspective on testing from @nelhage: test suites are like classifiers, classifying code updates as "acceptable" or "unacceptable".

In ML, we design classifiers to trade off precision and recall.

What does that mean for designing test suites?

blog.nelhage.com/post/test-suit…
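
To make the analogy concrete, treat the suite as a classifier whose positive class is "unacceptable" (some test fails). Precision is the share of failing runs that flagged a genuinely bad change; recall is the share of bad changes that made some test fail. The history below is invented for illustration, and `suite_precision_recall` is a hypothetical helper.

```python
def suite_precision_recall(outcomes):
    """outcomes: list of (suite_failed, change_was_bad) boolean pairs, one per code update."""
    tp = sum(1 for failed, bad in outcomes if failed and bad)        # caught a real problem
    fp = sum(1 for failed, bad in outcomes if failed and not bad)    # false alarm on a good change
    fn = sum(1 for failed, bad in outcomes if not failed and bad)    # bad change slipped through
    # Treat an empty denominator as perfect (no alarms raised, or no bad changes to catch).
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Invented history of 10 updates: 4 bad ones, 2 of them caught, plus 1 false alarm.
history = [(True, True), (True, True), (False, True), (False, True), (True, False)] + [(False, False)] * 5
p, r = suite_precision_recall(history)
print(f"precision={p:.2f} recall={r:.2f}")
```

Adding more or stricter tests tends to raise recall at the cost of more false alarms, and flaky tests mostly show up as false positives that drag precision down.
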
Aug 18
In the latest FSDL lab notebook and video, we walk through why experiment management is so important for building awesome ML-powered products and how you can do it with @weights_biases.

🔗:
In the first half of the course, we survey how to train models, ingest and store data, and put models in production.

Or, as @sergeykarayev puts it in lecture 2: "Development, Data, Deployment".

We're covering development now.

Because applied deep learning is still fairly new, model development is a messy iterative process that's more akin to experimentation than engineering.

But that's not an excuse for carelessness!
Aug 16
FSDL Lecture 2: Development Infrastructure & Tooling is now live!

We cover what you need to know about:
• software engineering
• deep learning frameworks
• distributed training
• GPUs (cloud and on-prem)
• experiment management

(Link below)
The lecture video by @sergeykarayev is at

We aim to be specific and make actionable recommendations.

For example, our take on the modules-vs-notebooks debate is that you should write code in modules and import it into notebooks 😃
We think PyTorch is the clear pragmatic choice in 2022, but TensorFlow is also great, and JAX could be an excellent choice if you're going off the beaten path.