Stephen Wolfram of Wolfram Alpha wrote the absolute best post on ChatGPT and Large Language Models.
It took me about two hours to read, but significantly increased my understanding of what's going on under the hood of ChatGPT.
A few of my favorite takeaways (helps my process)
The goal of a large language model is to reasonably continue the text it already has
It does this by estimating the probability of each possible next word
Temperature is a parameter that determines how often lower-ranked words are used, adding randomness.
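The temperature idea can be sketched in a few lines of Python (the toy word table and function names are my own, not from Wolfram's post):

```python
import math
import random

def sample_next_word(word_probs, temperature=0.8):
    """Pick the next word from a probability table, with temperature.

    Higher temperature flattens the distribution so lower-ranked words
    are chosen more often; temperature near 0 always picks the top word.
    """
    words = list(word_probs)
    # Re-weight each probability by 1/temperature in log space
    logits = [math.log(word_probs[w]) / temperature for w in words]
    biggest = max(logits)
    weights = [math.exp(l - biggest) for l in logits]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(words, weights=probs, k=1)[0]

# Toy next-word table for some prompt
table = {"learn": 0.5, "predict": 0.3, "create": 0.15, "banana": 0.05}
print(sample_next_word(table, temperature=0.8))
```

At temperature 0.8 you'll mostly see "learn" but occasionally the long-tail words, which is exactly the "essay that doesn't repeat itself" effect Wolfram describes.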
LLMs are trained on vast amounts of human-written text
Where do these probabilities come from?
ChatGPT is a model that estimates the probabilities with which sequences of words should occur.
Stephen adds an interesting walk through demonstrating the probability of how often letters occur, and then pairs of letters and beyond
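That letter-frequency warm-up is easy to reproduce (a rough sketch; the sample sentence is made up, and word boundaries are ignored for simplicity):

```python
from collections import Counter

def letter_and_pair_frequencies(text):
    """Count single-letter and letter-pair (bigram) frequencies,
    the same warm-up Wolfram uses before moving on to whole words."""
    letters = [c for c in text.lower() if c.isalpha()]
    singles = Counter(letters)
    pairs = Counter(a + b for a, b in zip(letters, letters[1:]))
    return singles, pairs

singles, pairs = letter_and_pair_frequencies("the theory of the thing")
print(singles.most_common(3))
print(pairs.most_common(3))
```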
Neural Nets are loosely modeled on the human brain
The brain has ~100B neurons, each connected to ~1,000 other neurons
A neuron pulses depending on the pulses it gets from other neurons, each with their own connections.
These connection strengths become the different weights in the model. Voila!
Neural Net Explanation
- Neurons arranged in layers
- Each connection between neurons has a weight (significance)
- ML is first used to find the weights
- Neuron evaluates numerical function
- Input is fed and neurons at each layer evaluate and feeds results to next layer
- End result is reached
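The steps above can be sketched as a minimal forward pass (made-up weights, tanh as the nonlinearity; real nets have millions of neurons):

```python
import math

def forward(layers, x):
    """Minimal feedforward pass: each layer is (weights, biases).

    Each neuron computes a weighted sum of the previous layer's outputs,
    adds a bias, applies a nonlinearity (tanh here), and feeds the
    result to the next layer.
    """
    for weights, biases in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# Tiny 2-input -> 3-neuron hidden layer -> 1 output network (made-up weights)
net = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
print(forward(net, [0.7, 0.2]))  # the end result from the last layer
```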
Larger networks do better at landing on results.
In the below image the goal is to take in a point and recognize it in one of the three regions.
I laughed when Stephen said at the boundaries it has trouble "making up its mind". Much human.
Unsure results could be dangerous
Training Neural Nets
The goal is to feed a zillion examples, and find weights that reproduce the examples.
Every time an example is used, the weights are adjusted throughout the model.
Training is really expensive and computationally intensive.
Now how are the weights adjusted?
Stephen describes that the model uses a Loss Function.
The goal of adjusting the weights is to reduce the loss function, or how far your output is from the intended result based upon the examples.
More data, lower loss function
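A toy version of that loop, fitting a single weight by gradient descent on a mean-squared-error loss (the model and examples are mine, just to show the mechanic):

```python
def train(examples, steps=200, lr=0.1):
    """Gradient descent on one weight w for the model y = w * x.

    The loss is mean squared error: how far the outputs are from the
    intended results over the examples. Each step nudges w to reduce it.
    """
    w = 0.0
    for _ in range(steps):
        # dLoss/dw for mean squared error over all examples
        grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
        w -= lr * grad
    return w

examples = [(1, 2), (2, 4), (3, 6)]  # hidden rule: y = 2x
print(round(train(examples), 3))  # -> 2.0
```

Real training does the same thing over billions of weights at once, which is why it's so computationally expensive.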
One of the most counterintuitive takeaways is that with Neural Nets it's easier to solve more complicated problems than simpler ones.
That's good too since I'm dumb and need help with the complicated problems in life.
I'll let Stephen take it from here:
ChatGPT has an easier time training since it can conduct "Unsupervised Learning"
- ChatGPT gets text (masks end)
- Use probabilities to get the end of the sentence
- Use this as a training input
- Output is the complete piece of text
TLDR it's easier to get examples to train
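The masking trick is simple to show (a rough sketch; real training operates on tokens, not whole words):

```python
def make_training_examples(text):
    """Turn raw text into (prefix, next_word) training pairs by masking
    the end of the sequence. No human labeling is needed, which is why
    this style of training is (loosely) called unsupervised.
    """
    words = text.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

for prefix, target in make_training_examples("the cat sat on the mat"):
    print(repr(prefix), "->", repr(target))
```

Every sentence on the internet becomes several free training examples, which is the "easier to get examples" point.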
Summing this all up, Stephen shares an image showing the training process for a neural net and how the loss function should decrease over time.
If the loss eventually levels off, yay, you have a solid model
If not, you can't rely on it and it's time to change the architecture
ChatGPT is often extrapolated as a path to Terminators
Stephen counters that the magic of LLMs for writing really isn't that hard.
We're not closer to terminators, writing essays just isn't as hard as we think.
@stephen_wolfram plz share more on NN's replacing humans (pic 2)
Embeddings
Embeddings lay out words as arrays of numbers, so that words land near the words they are commonly associated with
Probabilities are found using vast amounts of text
Embeddings give a more natural feel to ChatGPT since words that are commonly associated with each other can be used
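A tiny sketch of that idea, using made-up 3-number embeddings and cosine similarity (real models use hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings; commonly associated words point the same way
embeddings = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.85, 0.75, 0.2],
    "stock": [0.1, 0.2, 0.95],
}
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))    # high
print(cosine_similarity(embeddings["cat"], embeddings["stock"]))  # low
```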
Onto ChatGPT!
ChatGPT+ is reportedly a giant Neural Net with 100 Trillion parameters (GPT-3 had 175B) focused on language.
That's 1,000x the brain's neuron count. Woof.
The most important feature is the Transformer
An interesting side note for the Crypto audience.
Crypto's own @ilblackdragon, the co-founder of @NEARProtocol, is one of the authors on the original Transformers Paper
- Takes text
- Finds embeddings (numbers to represent text)
- Processes (values go through layers of the NN)
- New embedding produced (new number array)
- Takes array and generates 50,000 values for next possible tokens
- Highest prob produces text (I think)
Transformers are a breakthrough for LLMs.
In effect, they allow the model to understand the context of words and the relationships between words that are far apart
Transformers also process all the tokens at once instead of one at a time, so they are much more efficient and scalable
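The core mechanism, scaled dot-product attention, fits in a short sketch (plain Python with toy 2-dimensional vectors; real Transformers add learned projections and many attention heads):

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention, the heart of the Transformer.

    Every position scores its query against every key at once, so the
    model can relate words that are far apart in a single step.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Score this query against all keys simultaneously
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        biggest = max(scores)
        weights = [math.exp(s - biggest) for s in scores]  # softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted average of the values
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Three token positions, toy 2-dimensional vectors
q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(q, k, v))
```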
Thanks ChatGPT!
Meaning Space
Stephen shares that in ChatGPT, text is represented by an array of numbers in a meaning space.
He goes on to describe that the trajectory of what words come next is far from a mathematical or physics like law we can rest our hats on.
So is ChatGPT similar to a human brain?
His conclusions:
- The neural net architecture may be similar
- Training of LLMs way less efficient vs human brain
- ChatGPT has no loops to go back and recompute data like humans can which severely limits its computational capability
I am not an AI researcher, but the post made me realize LLMs are nowhere near the AGI or Terminator-level intelligence some fear
Of course it's on the path, but LLMs are probabilistic models focused on continuing sentences.
They are really good at it, but not AGI (yet)
I think it's incredibly cool that a gigabrain like @stephen_wolfram would open source his thinking on ChatGPT
This has been the single best resource I've found so far on learning about @OpenAI's ChatGPT, LLMs and Neural Nets
Also @stephen_wolfram if you're ever interested in a long form podcast to walk through your thoughts, we'd love to host you on @Delphi_Digital's podcast!
Watched @karpathy's must-watch 3.5-hour LLM deep dive and here are my favorite takeaways, surprises, and musings.
He is a god tier communicator and technical genius
My thread is below, but I would recommend watching the video as it's the best way to learn
1a/ One standout example: DeepSeek-R1. Karpathy highlights it as a reasoning LLM pushed to its limits with reinforcement learning
Instead of just mimicking textbook solutions, DeepSeek was trained via trial and error to solve problems, especially tough math questions
1b/ DeepSeek-R1 uses large-scale RL fine-tuning on top of a base model
The result? As it learns, its answers become longer and more methodical. It will backtrack and retrace steps when needed, producing deeper reasoning chains.
This emergent behavior wasn’t pre-programmed at all!
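A toy flavor of that trial-and-error loop (a bandit-style sketch of my own, not DeepSeek's actual algorithm, which is far more sophisticated):

```python
import math
import random

def rl_finetune(actions, reward_fn, steps=500, lr=0.1):
    """Sample an answer style, score it with a reward, and reinforce
    whatever earned reward: the trial-and-error essence of RL."""
    prefs = {a: 0.0 for a in actions}  # preference scores -> softmax policy
    for _ in range(steps):
        weights = [math.exp(prefs[a]) for a in actions]
        action = random.choices(actions, weights=weights, k=1)[0]
        reward = reward_fn(action)
        prefs[action] += lr * (reward - 0.5)  # 0.5 = assumed baseline reward
    return max(prefs, key=prefs.get)

# Pretend "showing work" solves hard math problems and "guessing" doesn't
def grade(answer_style):
    return 1.0 if answer_style == "show_work" else 0.0

print(rl_finetune(["guess", "show_work"], grade))  # -> 'show_work'
```

Nobody programs "show your work" in; the policy drifts toward it because it earns reward, which is the emergent-behavior point.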
I wanted to share my mental model of the Crypto x AI landscape
Long term I think Crypto x AI wins because the most powerful technology of our generation can be owned, governed and iterated by an incentivized global community vs a centralized black box
The danger: a centralized AI's effects ripple across millions of applications reaching billions of users.
I thought this was an interesting use of AI, so sharing the quick story below
TLDR - Bard found and summarized research studies to give me accurate info to make a decision - in a few hours of searching.
I've struggled with allergies and sinus infections my whole life. I had allergy shots as a kid, and then sinus surgery (balloon sinuplasty, where they insert and inflate a balloon to expand your sinuses)
Recently I've been getting sinus infections again, so back to the ENT
The ENT pushed for another balloon sinuplasty (~$20k to them), but this didn't make sense to me since my sinuses were already structurally expanded
My gut take was to treat the underlying cause (allergies) more aggressively instead, but I needed more info