The Full Stack
Sep 20, 2022 · 13 tweets · 8 min read
FSDL Lecture 7: Foundation Models is now available!

This lecture is 💯 new to the course.

We talk about building on Transformers, GPT-3, CLIP, Stable Diffusion, and other foundation models.

Brief thread below.

The brave new world of large models is astonishing.

With scale, these models show emergent capabilities that seem truly magical.

At hundreds of billions of params, many GPUs are needed simply to load the model, and API-based access makes a lot of sense.
We start old-school, talking about the importance of embeddings and the concept of fine-tuning models.
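For a concrete flavor, here's a minimal sketch of pulling an embedding out of a pretrained model (assumes Hugging Face transformers; the checkpoint and mean-pooling are our illustrative choices, not the lecture's):

```python
# Minimal embedding sketch: mean-pool a pretrained model's final hidden
# states into one vector per sentence. Checkpoint choice is illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Foundation models are eating ML.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
embedding = hidden.mean(dim=1)                   # one 768-d sentence vector
```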
Then we talk about the Transformer architecture, covering its three simple components (toy sketch below):

· Self-attention
· Positional encoding
· Layer normalization
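For intuition, a toy single-head self-attention in PyTorch (an illustrative sketch, not the lecture's code):

```python
# Toy scaled dot-product self-attention for a single head.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5   # scaled dot-product
    weights = F.softmax(scores, dim=-1)     # every position attends to all
    return weights @ v

x = torch.randn(10, 64)                     # 10 tokens, d_model = 64
w = lambda: torch.randn(64, 64) / 64 ** 0.5
out = self_attention(x, w(), w(), w())      # (10, 64)
```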
We cover the most notable LLMs:

· BERT
· GPT/GPT-2/GPT-3
· T5
· InstructGPT
· RETRO
· Chinchilla and its scaling-law implications (quick check below)
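A back-of-the-envelope check of the Chinchilla rule of thumb (roughly 20 training tokens per parameter; the paper has the exact fits):

```python
# Chinchilla rule of thumb: compute-optimal training uses ~20 tokens
# per parameter.
params = 70e9                 # Chinchilla itself: 70B parameters
tokens = 20 * params          # ≈ 1.4e12, matching its 1.4T training tokens
print(f"{tokens / 1e12:.1f}T tokens")
```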
We discuss LLM vendors such as @OpenAI, @CohereAI, @AI21Labs

as well as open-source projects such as BLOOM from @BigscienceW, GPT models from EleutherAI, and OPT from @MetaAI

and ways to host inference such as @huggingface.
Drawing on excellent GPT-3 wrangling by @goodside, @npew, and others, we share some prompt engineering tricks (worked example below):

· Tokenization effects
· Scratch-pad
· "Let's think step by step"
· Formatting tricks
· Prompt injection attacks
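As an example of the scratch-pad / "Let's think step by step" trick (the wording is illustrative, borrowed from the zero-shot chain-of-thought literature, not the lecture's exact prompt):

```python
# Illustrative zero-shot chain-of-thought prompt: appending
# "Let's think step by step" elicits intermediate reasoning.
prompt = (
    "Q: A juggler has 16 balls. Half are golf balls, and half of the golf "
    "balls are blue. How many blue golf balls are there?\n"
    "A: Let's think step by step."
)
# Sent to a completion endpoint, the model typically writes out the
# intermediate steps (16 / 2 = 8 golf balls, 8 / 2 = 4 blue) before
# answering "4" -- a scratch-pad the prompt merely invites.
```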
Code generation is an incredible application of LLMs.

We share results from @DeepMind AlphaCode, @OpenAI Codex and math problem-solving work, and thoroughly stan @github Copilot.

We also show that good old GPT-3 is perfectly capable of writing code 😎

The future lies in cross-modal applications of LLMs, and we cover results such as Flamingo from @DeepMind and Socratic Models from @GoogleAI.
Lastly, we talk about the joint embedding of text and images unleashed by CLIP from @OpenAI.

While CLIP alone does not allow going from image to text and vice versa, follow-up work does.
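A minimal sketch of CLIP's joint embedding in practice, assuming Hugging Face transformers (checkpoint, image path, and captions are illustrative):

```python
# Score an image against candidate captions in CLIP's shared
# text-image embedding space.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                  # any local image
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # caption scores
```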
And that's what we cover next: the unCLIP (#dalle2) model, as well as #stablediffusion.

We cover diffusion models, the role of the "prior," and U-Nets for image generation.
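A schematic of the reverse (denoising) loop, assuming a trained noise-predicting U-Net; the `unet` below is a placeholder stand-in, and the schedule is the standard linear one:

```python
# Schematic DDPM-style sampling: start from Gaussian noise and
# iteratively denoise with a U-Net that predicts the added noise.
import torch

def ddpm_sample(unet, betas, shape=(1, 3, 64, 64)):
    """unet(x, t) is assumed to predict the noise added at step t."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(len(betas))):
        eps = unet(x, t)                         # U-Net noise prediction
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:                                # re-inject noise, sigma_t = sqrt(beta_t)
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x

betas = torch.linspace(1e-4, 0.02, 1000)         # standard linear schedule
unet = lambda x, t: torch.zeros_like(x)          # placeholder for a trained U-Net
images = ddpm_sample(unet, betas)                # (1, 3, 64, 64)
```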
Open-source datasets and models have kicked off a true explosion of activity in image/video generation. We're excited to see what related projects come out from our synchronous FSDL cohort!
The world of AI has never been more exciting than right now. It feels like we've crested a hill and can see a beautiful new landscape all around. There's a ton to build, and we're excited to help you do it!

Follow us here and follow along at fullstackdeeplearning.com/course/2022

More from @full_stack_dl

Jul 25, 2023
Is it the revenge of recurrent nets? Is it a subquadratic Transformer?

It's both, it's neither, it's RWKV: @BlinkDL_AI's novel architecture that infers efficiently like an RNN but matches Transformer quality -- so far.

Deep dive by @charles_irl:

fullstackdeeplearning.com/blog/posts/rwk…
> What is RWKV?

Typical RNNs are like a for loop that can't be vectorized, which hurts parallelization during training.

RWKV cleverly resolves this with a layer that works like an RNN cell when it's run step by step, but can be computed all at once like Transformer attention.
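A toy illustration of that duality (this is a plain exponential moving average, not RWKV's actual WKV kernel):

```python
# Same computation two ways: a step-by-step recurrence (RNN mode, O(1)
# state at inference) and one vectorized pass over the whole sequence
# (parallel mode for training).
import torch

def ema_recurrent(x, decay=0.9):
    s, out = 0.0, []
    for x_t in x:                     # RNN mode: constant state per step
        s = decay * s + x_t
        out.append(s)
    return torch.stack(out)

def ema_parallel(x, decay=0.9):
    t = torch.arange(len(x), dtype=x.dtype)
    weights = decay ** (t.unsqueeze(1) - t.unsqueeze(0))  # decay^(i-j)
    causal = torch.tril(torch.ones(len(x), len(x)))       # only j <= i
    return (weights * causal) @ x     # whole sequence in one matmul

x = torch.randn(8)
assert torch.allclose(ema_recurrent(x), ema_parallel(x), atol=1e-5)
```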
And unlike many other alternatives to Transformers, it gets comparable language modeling performance up to the largest scales tested: 14B params, 300B tokens.
May 25, 2023
🆕 LLM Bootcamp videos are now available!

Check out our awesome invited speakers:

🏋🏻 @truerezashabani walks us through training LLMs at @Replit
🕵🏽 @hwchase17 talks about building agents with @LangChainAI
🔥 @npew talks about the path to @OpenAI ChatGPT
@truerezashabani led the team that trained the new bespoke code completion models at @Replit.

He breaks down
· The Modern LLM Stack™️
· What makes a good "LLM engineer"
· The importance of knowing and cleaning your data

fullstackdeeplearning.com/llm-bootcamp/s…
In the "agent" pattern, LLMs are given memory, access to tools, and goals.

@hwchase17, founder of the most popular LLM framework @LangChainAI, shares exciting recent research results and the gnarly challenges facing agents in production.

fullstackdeeplearning.com/llm-bootcamp/s…
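A toy version of that loop (function names and the action format are ours, not LangChain's API):

```python
# Toy agent loop: the LLM picks an action, we run the matching tool,
# and the observation is appended back into the context (its memory).
def run_agent(llm, tools, goal, max_steps=5):
    """llm: str -> str; tools: dict of name -> callable. Both are assumed."""
    context = f"Goal: {goal}"                    # the goal seeds the memory
    for _ in range(max_steps):
        action = llm(context)                    # e.g. "search: FSDL lecture 7"
        if action.startswith("finish:"):
            return action.removeprefix("finish:").strip()
        tool, _, arg = action.partition(":")     # pick a tool and its input
        observation = str(tools[tool](arg.strip()))
        context += f"\nAction: {action}\nObservation: {observation}"
    return "Stopped: step budget exhausted."
```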
May 23, 2023
🥞🦜 LLM Bootcamp 🦜🥞

Today, let's talk about UX.

tl;dr: LLMs unlock new user interaction design patterns based on language user interfaces (LUIs). But the same principles of user-centered design still apply!
Since the inception of computing, programmers & designers have dreamed of interfacing with computers via language as naturally as we interface with each other.

Proof-of-concepts for such language user interfaces date back to the 1960s and have recurred ever since.

LLMs make LUIs possible.
A paradigm shift in user interfaces makes for a great time to build ambitious applications!

But because language models (and ML in general) come from the math-ier side of engineering, lots of folks are less familiar with the principles that guide user interaction design.
May 16, 2023
🥞🦜 LLM Bootcamp 🦜🥞

Today, let's talk about prompt engineering.

tl;dr: Effective prompting requires some intuition about language models, but there's an emerging playbook of general techniques.
First off: What is a "prompt"? What is "prompt engineering"?

The prompt is the text that goes into your language model.

Prompt engineering is the design of that text: how is it formatted, what information is in it, and what "magic words" are included.
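An illustrative template (the wording is ours, not the bootcamp's): the format, the injected information, and the "magic words" are all just text going into the model.

```python
# Illustrative prompt template: everything the model sees is one string.
question = "What does FSDL stand for?"
context = "FSDL is short for Full Stack Deep Learning, a practical ML course."

prompt = (
    "You are a concise teaching assistant.\n"   # "magic words" / persona
    "Use only the context below to answer.\n\n"
    f"Context: {context}\n"                     # information in the prompt
    f"Question: {question}\n"
    "Answer:"                                   # formatting steers the output
)
```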
So, what are some high-level intuitions for prompting?

First of all, the idea that LMs are "just statistical models of text," while literally true, leads to bad intuition that underestimates what they can do.
Apr 17, 2023
🦜 LLM Lit Review 🦜

Over the last two weeks, we tweeted out twelve papers we love in the world of language modeling, from agent simulation and browser automation to BERTology and artificial cognitive science.

Here they are, collected in a single 🧵 for your convenience.
1/12 - Reynolds and McDonell, 2021. "Prompt Programming for LLMs: Beyond the Few-Shot Paradigm"

The OG Prompt Engineering paper -- formatting tricks, agent sim, and chain-of-thought, before they were cool
2/12 - Nye, Andreassen et al., 2021. "Show Your Work: Scratchpads for Intermediate Computation with LMs"

An early demonstration of unlocking LM capabilities by giving them control of their context
Feb 21, 2023
Whatever our thoughts on chat _bots_, we enjoyed our chat with @hwchase17 of @LangChainAI on the most recent FSDL Tool Talk!
@charles_irl started us off with an overview of why we need LLM frameworks; then, after a demo of using LangChain to do Q&A over the LangChain docs, we did some live Q&A -- humans only.
You can catch the whole video on YouTube here:

