Tweet

Full Stack Deep Learning

Sep 20 • 13 tweets • 8 min read

https://twitter.com/sergeykarayev/status/1572027734276345858

FSDL Lecture 7: Foundation Models is now available!

This lecture is 💯 new to the course.

We talk about building on Transformers, GPT-3, CLIP, StableDiffusion, and other foundation models.

Brief thread below.

https://twitter.com/sergeykarayev/status/1572027734276345858

The brave new world of large models is astonishing.

With scale, these models show emergent capabilities that seem truly magical.

At hundreds of billions of params, many GPUs are needed simply to load the model, and API-based access makes a lot of sense.

We start old-school, talking about the importance of embeddings and the concept of fine-tuning models.

Then we talk about the Transformer architecture, covering its three simple components:

· Self-attention
· Positional encoding
· Layer normalization

We cover the most notable LLMs:

· BERT
· GPT/GPT-2/GPT-3
· T5
· Instruct-GPT
· RETRO
· Chinchilla (and its Scaling Law implications)

@OpenAI

We discuss LLM vendors such as @OpenAI, @CohereAI, @AI21Labs

as well as open-source projects such as BLOOM from @BigscienceW, GPT models from EleutherAI, and OPT from @MetaAI

and ways to host inference such as @huggingface.

@goodside

Drawing on excellent GPT-3 wrangling by @goodside, @npew, and others, we share some prompt engineering tricks:

· Tokenization effects
· Scratch-pad
· "Let's think step by step"
· Formatting tricks
· Prompt injection attacks

@DeepMind

Code generation is an incredible application of LLMs.

We share results from @DeepMind AlphaCode, @OpenAI Codex and math problem solving work, and thoroughly stan @github copilot.

We also show that good old GPT-3 is perfectly capable of writing code 😎

https://twitter.com/sergeykarayev/status/1570848080941154304

@DeepMind

The future is with cross-modal applications of LLMs, and we cover results such as Flamingo from @DeepMind and Socratic Models from @GoogleAI.

@OpenAI

Lastly, we talk about the joint embedding of text and images unleashed by CLIP from @OpenAI.

While CLIP alone does not allow going from image to text and vice versa, follow-up work does.

And that's what we cover next: the unCLIP (#dalle2) model, as well as #stablediffusion.

We cover diffusion models, the role of the "prior," and U-Nets for image generation.

Open-source datasets and models have kicked off a true explosion of activity in image/video generation. We're excited to see what related projects come out from our synchronous FSDL cohort!

The world of AI has never been more exciting than right now. It feels like we've crested a hill and can see a beautiful new landscape all around. There's a ton to build, and we're excited to help you do it!

Follow us here and follow along at fullstackdeeplearning.com/course/2022

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @full_stack_dl

Full Stack Deep Learning

@full_stack_dl

Sep 6

FSDL Lecture 6: Deployment is now live!

This lecture covers a critical step: getting your model into prod.

The key message is similar to our philosophy in other parts of the ML workflow:

Start simple, add complexity as you need it.

fullstackdeeplearning.com/course/2022/le…

@Gradio

When it's time to deploy, the first step is to create a prototype you and your friends / teammates can interact with.

@Gradio, @huggingface, and @streamlit are your friends at this stage.

You do want this to have a basic UI and be hosted behind a webserver to reduce friction.

This is an example of the model-in-service deployment paradigm, where you just embed your model in your webserver.

It's simple to implement, but will run into issues as you scale because models and web servers scale differently.

Read 13 tweets

Full Stack Deep Learning

@full_stack_dl

Sep 1

@LabelStudioHQ

🧪 FSDL Lab 6: Data Annotation 🧪

Try out @LabelStudioHQ and see how the tasty Tensor sausage gets made out of data chuck with our latest lab notebook and video!

However much you care about data, you should probably care more.

High-quality data is still a major differentiator for ML app quality.

And good understanding of the data is a major differentiator for ML engineer quality!

Throughout the labs, we've been building up a neural network capable of basic OCR. We've focused on model architectures, on training frameworks, on experiment management, and on software engineering infra.

In this lab, we tackle the data.

Read 5 tweets

Full Stack Deep Learning

@full_stack_dl

Aug 31

📀 FSDL Lecture 4: Data Management 📀

The key message is simple enough: become one with the data, and don't overcomplicate things 🙃.

Find the video and notes on our website, and check out the thread below for some condensed learnings first.

fullstackdeeplearning.com/course/2022/le…

First, we talk about data storage.

• Speed and bandwidth of disks varies a lot, so use NVMe SSDs
• Store binary data in standard formats like JPGs
• Store metadata and text as JSON or Parquet
• Databases are the best tool for deep work with structured data

@SnowflakeDB

• When it comes to data warehouses/lakes, there's a bunch of jargon that could be helpful to know
• The basic takeaway is that data lakes are great if you need to aggregate different data sources at scale
• @SnowflakeDB and @databricks are the leading solutions

Read 8 tweets

Full Stack Deep Learning

@full_stack_dl

Aug 23

FSDL Lecture 3: Troubleshooting & Testing is now live!

We cover:
• how to design software tests
• recommended tooling for testing and code quality assurance
• how to test ML systems, the easy and the hard way
• how to debug neural networks

(Link below)

@charles_irl

The lecture video by @charles_irl is at

As always, our recommendations are specific and actionable. We recommend testing docstring code with doctests and quick-and-dirty notebook testing with nbformat.

@nelhage

We share a perspective on testing from @nelhage: test suites are like classifiers, classifying code updates as "acceptable" or "unacceptable".

In ML, we design classifiers to trade off precision and recall.

What does that mean for designing test suites?

blog.nelhage.com/post/test-suit…

Read 11 tweets

Full Stack Deep Learning

@full_stack_dl

Aug 18

@weights_biases

In the latest FSDL lab notebook and video, we walk through why experiment management is so important for building awesome ML-powered products and how you can do it with @weights_biases.

🔗:

@sergeykarayev

In the first half of the course, we survey how to train models, ingest and store data, and put models in production.

Or, as @sergeykarayev puts it in lecture 2: "Development, Data, Deployment".

We're covering development now.

Because applied deep learning is still fairly new, model development is a messy iterative process that's more akin to experimentation than engineering.

But that's not an excuse for carelessness!

Read 11 tweets

Full Stack Deep Learning

@full_stack_dl

Aug 16

FSDL Lecture 2: Development Infrastructure & Tooling is now live!

We cover what you need to know about:
• software engineering
• deep learning frameworks
• distributed training
• GPUs (cloud and on-prem)
• experiment management

(Link below)

@sergeykarayev

The lecture video by @sergeykarayev is at

We aim to be specific and make actionable recommendations.

For example, we think the debate between modules and notebooks is that you should write code in modules, and import them into notebooks 😃

We think PyTorch is the clear pragmatic choice in 2022, but that Tensorflow is also great, and Jax could be an excellent choice if you're going off the beaten path.

Read 17 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Separate emails with commas Message

Share this page!

Full Stack Deep Learning

People who liked this thread also liked...

Try unrolling a thread yourself!

More from @full_stack_dl

Full Stack Deep Learning

Full Stack Deep Learning

Full Stack Deep Learning

Full Stack Deep Learning

Full Stack Deep Learning

Full Stack Deep Learning

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!