The Full Stack
Sep 20, 2022 · 13 tweets · 8 min read
FSDL Lecture 7: Foundation Models is now available!

This lecture is 💯 new to the course.

We talk about building on Transformers, GPT-3, CLIP, Stable Diffusion, and other foundation models.

Brief thread below.

The brave new world of large models is astonishing.

With scale, these models show emergent capabilities that seem truly magical.

At hundreds of billions of params, many GPUs are needed simply to load the model, and API-based access makes a lot of sense.
We start old-school, talking about the importance of embeddings and the concept of fine-tuning models.
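For a concrete flavor, here's a minimal sketch of pulling an embedding out of a pretrained model (assumes Hugging Face transformers; the checkpoint and mean-pooling are our illustrative choices, not the lecture's):

```python
# Minimal embedding sketch: mean-pool a pretrained model's final hidden
# states into one vector per sentence. Checkpoint choice is illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Foundation models are eating ML.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
embedding = hidden.mean(dim=1)                   # one 768-d sentence vector
```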
Then we talk about the Transformer architecture, covering its three simple components (toy sketch below):

· Self-attention
· Positional encoding
· Layer normalization
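For intuition, a toy single-head self-attention in PyTorch (an illustrative sketch, not the lecture's code):

```python
# Toy scaled dot-product self-attention for a single head.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5   # scaled dot-product
    weights = F.softmax(scores, dim=-1)     # every position attends to all
    return weights @ v

x = torch.randn(10, 64)                     # 10 tokens, d_model = 64
w = lambda: torch.randn(64, 64) / 64 ** 0.5
out = self_attention(x, w(), w(), w())      # (10, 64)
```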
We cover the most notable LLMs:

· BERT
· GPT/GPT-2/GPT-3
· T5
· InstructGPT
· RETRO
· Chinchilla and its scaling-law implications (quick check below)
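A back-of-the-envelope check of the Chinchilla rule of thumb (roughly 20 training tokens per parameter; the paper has the exact fits):

```python
# Chinchilla rule of thumb: compute-optimal training uses ~20 tokens
# per parameter.
params = 70e9                 # Chinchilla itself: 70B parameters
tokens = 20 * params          # ≈ 1.4e12, matching its 1.4T training tokens
print(f"{tokens / 1e12:.1f}T tokens")
```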
We discuss LLM vendors such as @OpenAI, @CohereAI, @AI21Labs

as well as open-source projects such as BLOOM from @BigscienceW, GPT models from EleutherAI, and OPT from @MetaAI

and ways to host inference such as @huggingface.
Drawing on excellent GPT-3 wrangling by @goodside, @npew, and others, we share some prompt engineering tricks (worked example below):

· Tokenization effects
· Scratch-pad
· "Let's think step by step"
· Formatting tricks
· Prompt injection attacks
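As an example of the scratch-pad / "Let's think step by step" trick (the wording is illustrative, borrowed from the zero-shot chain-of-thought literature, not the lecture's exact prompt):

```python
# Illustrative zero-shot chain-of-thought prompt: appending
# "Let's think step by step" elicits intermediate reasoning.
prompt = (
    "Q: A juggler has 16 balls. Half are golf balls, and half of the golf "
    "balls are blue. How many blue golf balls are there?\n"
    "A: Let's think step by step."
)
# Sent to a completion endpoint, the model typically writes out the
# intermediate steps (16 / 2 = 8 golf balls, 8 / 2 = 4 blue) before
# answering "4" -- a scratch-pad the prompt merely invites.
```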
Code generation is an incredible application of LLMs.

We share results from @DeepMind AlphaCode, @OpenAI Codex and math problem-solving work, and thoroughly stan @github Copilot.

We also show that good old GPT-3 is perfectly capable of writing code 😎

The future lies in cross-modal applications of LLMs, and we cover results such as Flamingo from @DeepMind and Socratic Models from @GoogleAI.
Lastly, we talk about the joint embedding of text and images unleashed by CLIP from @OpenAI.

While CLIP alone does not allow going from image to text and vice versa, follow-up work does.
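A minimal sketch of CLIP's joint embedding in practice, assuming Hugging Face transformers (checkpoint, image path, and captions are illustrative):

```python
# Score an image against candidate captions in CLIP's shared
# text-image embedding space.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                  # any local image
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # caption scores
```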
And that's what we cover next: the unCLIP (#dalle2) model, as well as #stablediffusion.

We cover diffusion models, the role of the "prior," and U-Nets for image generation.
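A schematic of the reverse (denoising) loop, assuming a trained noise-predicting U-Net; the `unet` below is a placeholder stand-in, and the schedule is the standard linear one:

```python
# Schematic DDPM-style sampling: start from Gaussian noise and
# iteratively denoise with a U-Net that predicts the added noise.
import torch

def ddpm_sample(unet, betas, shape=(1, 3, 64, 64)):
    """unet(x, t) is assumed to predict the noise added at step t."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(len(betas))):
        eps = unet(x, t)                         # U-Net noise prediction
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:                                # re-inject noise, sigma_t = sqrt(beta_t)
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x

betas = torch.linspace(1e-4, 0.02, 1000)         # standard linear schedule
unet = lambda x, t: torch.zeros_like(x)          # placeholder for a trained U-Net
images = ddpm_sample(unet, betas)                # (1, 3, 64, 64)
```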
Open-source datasets and models have kicked off a true explosion of activity in image/video generation. We're excited to see what related projects come out from our synchronous FSDL cohort!
The world of AI has never been more exciting than right now. It feels like we've crested a hill and can see a beautiful new landscape all around. There's a ton to build, and we're excited to help you do it!

Follow us here and follow along at fullstackdeeplearning.com/course/2022

More from @full_stack_dl

Jul 25, 2023
Is it the revenge of recurrent nets? Is it a subquadratic Transformer?

It's both, it's neither, it's RWKV: @BlinkDL_AI's novel architecture that infers efficiently like an RNN but matches Transformer quality -- so far.

Deep dive by @charles_irl:

fullstackdeeplearning.com/blog/posts/rwk…
> What is RWKV?

Typical RNNs are like a for loop that can't be vectorized, which hurts parallelization during training.

RWKV cleverly resolves this with a layer that works like an RNN cell when it's run step by step, but can be computed all at once like Transformer attention.
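A toy illustration of that duality (this is a plain exponential moving average, not RWKV's actual WKV kernel):

```python
# Same computation two ways: a step-by-step recurrence (RNN mode, O(1)
# state at inference) and one vectorized pass over the whole sequence
# (parallel mode for training).
import torch

def ema_recurrent(x, decay=0.9):
    s, out = 0.0, []
    for x_t in x:                     # RNN mode: constant state per step
        s = decay * s + x_t
        out.append(s)
    return torch.stack(out)

def ema_parallel(x, decay=0.9):
    t = torch.arange(len(x), dtype=x.dtype)
    weights = decay ** (t.unsqueeze(1) - t.unsqueeze(0))  # decay^(i-j)
    causal = torch.tril(torch.ones(len(x), len(x)))       # only j <= i
    return (weights * causal) @ x     # whole sequence in one matmul

x = torch.randn(8)
assert torch.allclose(ema_recurrent(x), ema_parallel(x), atol=1e-5)
```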
And unlike many other alternatives to Transformers, it gets comparable language modeling performance up to the largest scales tested: 14B params, 300B tokens.
May 25, 2023
🆕 LLM Bootcamp videos are now available!

Check out our awesome invited speakers:

🏋🏻 @truerezashabani walks us through training LLMs at @Replit
🕵🏽 @hwchase17 talks about building agents with @LangChainAI
🔥 @npew talks about the path to @OpenAI ChatGPT
@truerezashabani led the team that trained the new bespoke code completion models at @Replit.

He breaks down
· The Modern LLM Stack™️
· What makes a good "LLM engineer"
· The importance of knowing and cleaning your data

fullstackdeeplearning.com/llm-bootcamp/s…
In the "agent" pattern, LLMs are given memory, access to tools, and goals.

@hwchase17, founder of the most popular LLM framework @LangChainAI, shares exciting recent research results and the gnarly challenges facing agents in production.

fullstackdeeplearning.com/llm-bootcamp/s…
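A toy version of that loop (function names and the action format are ours, not LangChain's API):

```python
# Toy agent loop: the LLM picks an action, we run the matching tool,
# and the observation is appended back into the context (its memory).
def run_agent(llm, tools, goal, max_steps=5):
    """llm: str -> str; tools: dict of name -> callable. Both are assumed."""
    context = f"Goal: {goal}"                    # the goal seeds the memory
    for _ in range(max_steps):
        action = llm(context)                    # e.g. "search: FSDL lecture 7"
        if action.startswith("finish:"):
            return action.removeprefix("finish:").strip()
        tool, _, arg = action.partition(":")     # pick a tool and its input
        observation = str(tools[tool](arg.strip()))
        context += f"\nAction: {action}\nObservation: {observation}"
    return "Stopped: step budget exhausted."
```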
May 23, 2023
🥞🦜 LLM Bootcamp 🦜🥞

Today, let's talk about UX.

tl;dr: LLMs unlock new user interaction design patterns based on language user interfaces (LUIs). But the same principles of user-centered design still apply!
Since the inception of computing, programmers & designers have dreamed of interfacing with computers via language as naturally as we interface with each other.

Proof-of-concepts for such language user interfaces date back to the 1960s and have recurred ever since.

LLMs make LUIs possible.
A paradigm shift in user interfaces makes for a great time to build ambitious applications!

But because language models (and ML in general) come from the math-ier side of engineering, lots of folks are less familiar with the principles that guide user interaction design.
May 16, 2023
🥞🦜 LLM Bootcamp 🦜🥞

Today, let's talk about prompt engineering.

tl;dr: Effective prompting requires some intuition about language models, but there's an emerging playbook of general techniques.
First off: What is a "prompt"? What is "prompt engineering"?

The prompt is the text that goes into your language model.

Prompt engineering is the design of that text: how is it formatted, what information is in it, and what "magic words" are included.
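An illustrative template (the wording is ours, not the bootcamp's): the format, the injected information, and the "magic words" are all just text going into the model.

```python
# Illustrative prompt template: everything the model sees is one string.
question = "What does FSDL stand for?"
context = "FSDL is short for Full Stack Deep Learning, a practical ML course."

prompt = (
    "You are a concise teaching assistant.\n"   # "magic words" / persona
    "Use only the context below to answer.\n\n"
    f"Context: {context}\n"                     # information in the prompt
    f"Question: {question}\n"
    "Answer:"                                   # formatting steers the output
)
```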
So, what are some high-level intuitions for prompting?

First of all, the idea that LMs are "just statistical models of text," while literally true, leads to bad intuition that underestimates what they can do.
Apr 17, 2023
🦜 LLM Lit Review 🦜

Over the last two weeks, we tweeted out twelve papers we love in the world of language modeling, from agent simulation and browser automation to BERTology and artificial cognitive science.

Here they are, collected in a single 🧵 for your convenience.
1/12 - Reynolds and McDonell, 2021. "Prompt Programming for LLMs: Beyond the Few-Shot Paradigm"

The OG Prompt Engineering paper -- formatting tricks, agent sim, and chain-of-thought, before they were cool
2/12 - Nye, Andreassen et al., 2021. "Show Your Work: Scratchpads for Intermediate Computation with LMs"

An early demonstration of unlocking LM capabilities by giving them control of their context
Feb 21, 2023
Whatever our thoughts on chat _bots_, we enjoyed our chat with @hwchase17 of @LangChainAI on the most recent FSDL Tool Talk!
@charles_irl started us off with an overview of why we need LLM frameworks; then, after a demo of using LangChain to do Q&A over the LangChain docs, we did some live Q&A -- humans only.
You can catch the whole video on YouTube here:

