First, I've seen that one of the most common responses is that anyone criticising the original post clearly doesn't understand it and is ignorant of how language models work.
Aidan Gomez is an author of the Transformers paper, and is CEO of Cohere. I think he understands fine.
So why haven't we seen clear explanations of why the "checking for sudden drops in the loss function and suspending training" comment is so ludicrous?
Well, the problem is that it's such a bizarre idea that it's not even wrong. It's nonsensical. Which makes it hard to refute.
To understand why, we need to understand how these models work. There are two key steps: training, and inference.
Training a model involves calculating the derivatives of the loss function with respect to the weights, and using those to update the weights to decrease loss.
Inference involves taking a model that has gone through the above process (called "back propagation") many times and then calculating activations from that trained model using new data.
Neither training nor inference can have any immediate impact on the world. They are simply calculating the parameters of a mathematical function, or using those parameters to calculate the result of a function.
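To make that concrete, here's a minimal PyTorch sketch of the two steps, using a toy model and random data. Nothing in it touches anything outside the process:

```python
import torch
from torch import nn

# Toy model: both "training" and "inference" are just maths on tensors.
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Training step: compute the loss, backpropagate to get the derivatives of
# the loss w.r.t. the weights, then update the weights to decrease the loss.
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = loss_fn(model(x), y)
loss.backward()    # derivatives of the loss w.r.t. the weights
opt.step()         # update the weights
opt.zero_grad()

# Inference: use the trained weights to calculate activations on new data.
with torch.no_grad():
    preds = model(torch.randn(5, 10))
```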
Therefore, we don't need to check for sudden drops in the loss function and suspend training, because the training process has no immediate impact on the outside world.
The only time that a model can impact anything is when it's *deployed* - that is, it's made available to people or directly to external systems, being provided data, making calculations, and then those results being used in some way.
So in practice, models are *always* tested after training and before deployment: we check how they operate on new data, and how their outputs behave when used in some process.
Now of course, if we'd seen during training that our new model has much lower loss than we've seen before, whilst we wouldn't "suspend training", we would of course check the model's practical performance extra carefully. After all, maybe it was a bug? Or maybe it's more capable?
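For what it's worth, here's a toy sketch of what that looks like in practice. The threshold is arbitrary, and the point is to flag a checkpoint for extra testing later, not to suspend anything:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

prev_loss, flagged_steps = None, []
for step in range(1000):
    x, y = torch.randn(64, 10), torch.randn(64, 1)
    loss = loss_fn(model(x), y)
    loss.backward(); opt.step(); opt.zero_grad()
    # An unusually large drop in the loss? Don't "suspend training" --
    # just note the step so the resulting checkpoint gets extra testing.
    if prev_loss is not None and loss.item() < 0.5 * prev_loss:  # arbitrary threshold
        flagged_steps.append(step)
    prev_loss = loss.item()
```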
But saying "we should test our trained models before deploying them" is telling no-one anything new whatsoever. We all know that, and we all do that.
Figuring out better ways to test models before deployment is an active and rich research area.
OTOH, "check for sudden drops in the loss function and suspend training" sounds much more exciting.
Problem is, it's not connected with the real world at all.
Some folks have pointed out that "drops in the loss function" is a pretty odd way to phrase things. It's actually just "drops in the loss".
An AI researcher saying "drops in the loss function" is a bit like a banker saying "ATM machine" - maybe a slip, maybe incompetence.
PS: please don't respond to this thread with "OK the exact words don't make sense, but if we wave our hands we can imagine he really meant some different set of words that if we squint kinda do make sense".
I don't know why some folks respond like this *every* *single* *time*.
PPS: None of this is to make any claim as to the urgency or importance of working on AI alignment. However, if you believe AI alignment is important work, I hope you'll agree that it's worth discussing with intellectual rigor and with a firm grounding of basic principles.
Sometimes it feels like NLP papers prior to 2020 don't exist...
(Bidirectional autoregressive models have been common for many years, and were for instance used in ULMFiT.)
AFAIK the first bidirectional RNN was from 1997. (Although it was popularised in Alex Graves' classic 2013 paper "Generating Sequences With Recurrent Neural Networks" I think.) ieeexplore.ieee.org/document/650093
@NguynTu24128917 might be worth updating your paper with some extra citations and background around this?
Our new course, "From Deep Learning Foundations to Stable Diffusion", is finally done after 8 months of work!!!
With >30 hours of video content (all free, no ads!), you'll learn how to create and train a Stable Diffusion model starting from pure Python 🧵 fast.ai/posts/part2-20…
The field was moving rapidly as we were developing and teaching the course, so many lessons include a walk-through of a paper that had just been released.
We also implement key papers that aren't in Stable Diffusion, such as Karras et al (2022) arxiv.org/abs/2206.00364
I wouldn't have been able to keep up with all this research without the fantastic help of folks from @StabilityAI, @huggingface, and the generative AI community. @iScienceLuvr and @johnowhitaker even joined me to teach some lessons together, which was a blast!
There are a lot of folks under the misapprehension that it's now possible to run a 30B param LLM in <6GB, based on this GitHub discussion.
This is not the case. Understanding why gives us a chance to learn a lot of interesting stuff! 🧵 github.com/ggerganov/llam…
The background is that the amazing @JustineTunney wrote this really cool commit for @ggerganov's llama.cpp, which modifies how llama models are loaded into memory to use mmap github.com/ggerganov/llam…
Prior to this, llama.cpp (and indeed most deep learning frameworks) loaded the weights of a neural network by reading the file containing the weights and copying the contents into RAM. This is wasteful, since a lot of bytes are moved around before you can even use the model.
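Roughly, in Python terms (llama.cpp itself is C/C++, and "weights.bin" here is just a made-up file name):

```python
import mmap
import numpy as np

# The "copy into RAM" approach: every byte of the weights file is read and
# duplicated into a fresh buffer before the model can be used.
with open("weights.bin", "rb") as f:
    buf = f.read()                                       # full copy into process memory
    weights_copy = np.frombuffer(buf, dtype=np.float32)

# The mmap approach: map the file into the address space and let the OS page
# data in lazily as it's touched; pages can also be shared between processes.
with open("weights.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    weights_view = np.frombuffer(mm, dtype=np.float32)   # no upfront copy
```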
Intriguing new study from the amazing Adriaan Bax and team suggests that most covid deaths resulted from (preventable) snoring droplets rather than (unpreventable) microaspiration. This could be a game changer.
Infection of the lung with SARS-CoV-2 is a two-step process: first the nose / throat, then the lungs. Postulated, but physically implausible, mechanism for step 2 involves “microaspiration”
Microaspiration during sleep is the accepted “hand-waving” mechanism for transfer of microbes from the oral cavity into the lung
After just 2 weeks of the new @fastdotai course, our students are already making research advances in Stable Diffusion.
@sebderhy developed a novel yet simple modification to classifier-free guidance that gives better results (previous approach on left, new approach on right)
@fastdotai @sebderhy I think in this case there's room to improve the results even further. The basic idea being tackled is that the "old way" of doing guidance actually increases the scale of the update (especially if the difference between conditional and unconditional embeddings is large).
So the trick is to add the guidance without changing the scale.
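Here's a rough sketch of the general idea -- not necessarily @sebderhy's exact formulation; rescaling back to the conditional prediction's norm is just one way to keep the update scale fixed:

```python
import torch

def guided_pred(cond, uncond, g=7.5):
    # Standard classifier-free guidance: when g is large (or cond and uncond
    # differ a lot), the guided prediction can end up with a much larger norm.
    return uncond + g * (cond - uncond)

def guided_pred_rescaled(cond, uncond, g=7.5):
    # Apply the guidance direction, then rescale the result back to the norm
    # of the conditional prediction, so the overall scale is unchanged.
    guided = uncond + g * (cond - uncond)
    return guided * (cond.norm() / guided.norm())
```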
We just released the first 5.5 hours of our new course "From Deep Learning Foundations to Stable Diffusion", for free! fast.ai/posts/part2-20…
Lesson 9 starts with a tutorial on how to use pipelines in the Diffusers library to generate images. We show some nifty tweaks like guidance scale and textual inversion.
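If you haven't used Diffusers pipelines before, a call looks something like this (the model id and prompt are just examples):

```python
import torch
from diffusers import StableDiffusionPipeline

# Example model id and prompt -- swap in whatever you like.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# guidance_scale controls how strongly the image follows the prompt.
image = pipe("an astronaut riding a horse", guidance_scale=7.5).images[0]
image.save("astronaut.png")
```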
The second half of the lesson shows the key concepts involved in Stable Diffusion.
Lesson 9A (by @johnowhitaker) shows what is happening behind the scenes, looking at the components and processes and how each can be modified for control over generation.