Stephen Wolfram of Wolfram Alpha wrote the absolute best post on ChatGPT and Large Language Models.
It took me about two hours to read, but significantly increased my understanding of what's going on under the hood of ChatGPT.
A few of my favorite takeaways (helps my process)
The goal of a large language model is to reasonably continue the text it already has
It does this by estimating the probability of each possible next word
Temperature is a parameter that determines how often lower-ranked words are used, adding randomness.
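The temperature idea can be sketched in a few lines of Python (the toy word table and function names are my own, not from Wolfram's post):

```python
import math
import random

def sample_next_word(word_probs, temperature=0.8):
    """Pick the next word from a probability table, with temperature.

    Higher temperature flattens the distribution so lower-ranked words
    are chosen more often; temperature near 0 always picks the top word.
    """
    words = list(word_probs)
    # Re-weight each probability by 1/temperature in log space
    logits = [math.log(word_probs[w]) / temperature for w in words]
    biggest = max(logits)
    weights = [math.exp(l - biggest) for l in logits]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(words, weights=probs, k=1)[0]

# Toy next-word table for some prompt
table = {"learn": 0.5, "predict": 0.3, "create": 0.15, "banana": 0.05}
print(sample_next_word(table, temperature=0.8))
```

At temperature 0.8 you'll mostly see "learn" but occasionally the long-tail words, which is exactly the "essay that doesn't repeat itself" effect Wolfram describes.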
LLMs are trained on vast amounts of human-written text
Where do these probabilities come from?
ChatGPT is a model that estimates the probabilities with which sequences of words should occur.
Stephen adds an interesting walk through demonstrating the probability of how often letters occur, and then pairs of letters and beyond
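That letter-frequency warm-up is easy to reproduce (a rough sketch; the sample sentence is made up, and word boundaries are ignored for simplicity):

```python
from collections import Counter

def letter_and_pair_frequencies(text):
    """Count single-letter and letter-pair (bigram) frequencies,
    the same warm-up Wolfram uses before moving on to whole words."""
    letters = [c for c in text.lower() if c.isalpha()]
    singles = Counter(letters)
    pairs = Counter(a + b for a, b in zip(letters, letters[1:]))
    return singles, pairs

singles, pairs = letter_and_pair_frequencies("the theory of the thing")
print(singles.most_common(3))
print(pairs.most_common(3))
```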
Neural Nets are loosely modeled on the human brain
The brain has ~100B neurons, each connected to ~1,000 other neurons
A neuron pulses depending on the pulses it gets from other neurons, each with their own connections.
These connection strengths become the different weights in the model. Voila!
Neural Net Explanation
- Neurons arranged in layers
- Each connection between neurons has a weight (significance)
- ML is first used to find the weights
- Neuron evaluates numerical function
- Input is fed and neurons at each layer evaluate and feeds results to next layer
- End result is reached
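The steps above can be sketched as a minimal forward pass (made-up weights, tanh as the nonlinearity; real nets have millions of neurons):

```python
import math

def forward(layers, x):
    """Minimal feedforward pass: each layer is (weights, biases).

    Each neuron computes a weighted sum of the previous layer's outputs,
    adds a bias, applies a nonlinearity (tanh here), and feeds the
    result to the next layer.
    """
    for weights, biases in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# Tiny 2-input -> 3-neuron hidden layer -> 1 output network (made-up weights)
net = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
print(forward(net, [0.7, 0.2]))  # the end result from the last layer
```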
Larger networks do better at landing on results.
In the below image the goal is to take in a point and recognize it in one of the three regions.
I laughed when Stephen said at the boundaries it has trouble "making up its mind". Much human.
Unsure results could be dangerous
Training Neural Nets
The goal is to feed a zillion examples, and find weights that reproduce the examples.
Every time an example is used, the weights are adjusted throughout the model.
Training is really expensive and computationally intensive.
Now how are the weights adjusted?
Stephen describes that the model uses a Loss Function.
The goal of adjusting the weights is to reduce the loss function, or how far your output is from the intended result based upon the examples.
More data, lower loss function
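A toy version of that loop, fitting a single weight by gradient descent on a mean-squared-error loss (the model and examples are mine, just to show the mechanic):

```python
def train(examples, steps=200, lr=0.1):
    """Gradient descent on one weight w for the model y = w * x.

    The loss is mean squared error: how far the outputs are from the
    intended results over the examples. Each step nudges w to reduce it.
    """
    w = 0.0
    for _ in range(steps):
        # dLoss/dw for mean squared error over all examples
        grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
        w -= lr * grad
    return w

examples = [(1, 2), (2, 4), (3, 6)]  # hidden rule: y = 2x
print(round(train(examples), 3))  # -> 2.0
```

Real training does the same thing over billions of weights at once, which is why it's so computationally expensive.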
One of the most counterintuitive takeaways is that with Neural Nets it's easier to solve more complicated problems than simpler ones.
That's good too since I'm dumb and need help with the complicated problems in life.
I'll let Stephen take it from here:
ChatGPT has an easier time training since it can conduct "Unsupervised Learning"
- ChatGPT gets text (masks end)
- Use probabilities to get the end of the sentence
- Use this as a training input
- Output is the complete piece of text
TLDR it's easier to get examples to train
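The masking trick is simple to show (a rough sketch; real training operates on tokens, not whole words):

```python
def make_training_examples(text):
    """Turn raw text into (prefix, next_word) training pairs by masking
    the end of the sequence. No human labeling is needed, which is why
    this style of training is (loosely) called unsupervised.
    """
    words = text.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

for prefix, target in make_training_examples("the cat sat on the mat"):
    print(repr(prefix), "->", repr(target))
```

Every sentence on the internet becomes several free training examples, which is the "easier to get examples" point.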
Summing this all up, Stephen shares an image showing the training process for a neural net and how the loss function should decrease over time.
If the loss eventually levels off, yay, you have a solid model
If not, you can't rely on it and it's time to change the architecture
ChatGPT is often extrapolated as a path to Terminators
Stephen counters that the magic of LLMs for writing really isn't that hard.
We're not closer to terminators, writing essays just isn't as hard as we think.
@stephen_wolfram plz share more on NN's replacing humans (pic 2)
Embeddings
Embeddings lay out words as arrays of numbers, so that words land near the words they are commonly associated with
Probabilities are found using vast amounts of text
Embeddings give a more natural feel to ChatGPT since words that are commonly associated with each other can be used
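A tiny sketch of that idea, using made-up 3-number embeddings and cosine similarity (real models use hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings; commonly associated words point the same way
embeddings = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.85, 0.75, 0.2],
    "stock": [0.1, 0.2, 0.95],
}
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))    # high
print(cosine_similarity(embeddings["cat"], embeddings["stock"]))  # low
```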
Onto ChatGPT!
ChatGPT+ is reportedly a giant Neural Net with 100 Trillion parameters (GPT-3 had 175B) focused on language.
That's 1,000x the brain's neuron count. Woof.
The most important feature is the Transformer
An interesting side note for the Crypto audience.
Crypto's own @ilblackdragon, the co-founder of @NEARProtocol, is one of the authors on the original Transformers Paper
- Takes text
- Finds embeddings (numbers to represent text)
- Processes (values go through layers of the NN)
- New embedding produced (new number array)
- Takes array and generates 50,000 values for next possible tokens
- Highest prob produces text (I think)
Transformers are a breakthrough for LLMs.
In effect, they allow the model to understand the context of words and the relationships between words that are far apart
Transformers also process all the tokens at once instead of one at a time, so they are much more efficient and scalable
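The core mechanism, scaled dot-product attention, fits in a short sketch (plain Python with toy 2-dimensional vectors; real Transformers add learned projections and many attention heads):

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention, the heart of the Transformer.

    Every position scores its query against every key at once, so the
    model can relate words that are far apart in a single step.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Score this query against all keys simultaneously
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        biggest = max(scores)
        weights = [math.exp(s - biggest) for s in scores]  # softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted average of the values
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Three token positions, toy 2-dimensional vectors
q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(q, k, v))
```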
Thanks ChatGPT!
Meaning Space
Stephen shares that in ChatGPT, text is represented by an array of numbers in a meaning space.
He goes on to describe that the trajectory of what words come next is far from a mathematical or physics like law we can rest our hats on.
So is ChatGPT similar to a human brain?
His conclusions:
- The neural net architecture may be similar
- Training of LLMs way less efficient vs human brain
- ChatGPT has no loops to go back and recompute data like humans can which severely limits its computational capability
I am not an AI researcher, but the post made me realize LLMs are nowhere near the AGI or Terminator-level intelligence some fear
Of course it's on the path, but LLMs are probabilistic models focused on continuing sentences.
They are really good at it, but not AGI (yet)
I think it's incredibly cool that a gigabrain like @stephen_wolfram would open source his thinking on ChatGPT
This has been the single best resource I've found so far on learning about @OpenAI's ChatGPT, LLMs and Neural Nets
Also @stephen_wolfram if you're ever interested in a long form podcast to walk through your thoughts, we'd love to host you on @Delphi_Digital's podcast!
Watched @karpathy's must-watch 3.5-hour LLM deep dive and here are my favorite takeaways, surprises, and musings.
He is a god tier communicator and technical genius
My thread is below, but I would recommend watching the video as it's the best way to learn
1a/ One standout example: DeepSeek-R1. Karpathy highlights it as a reasoning LLM pushed to its limits with reinforcement learning
Instead of just mimicking textbook solutions, DeepSeek was trained via trial and error to solve problems, especially tough math questions
1b/ DeepSeek-R1 uses large-scale RL fine-tuning on top of a base model
The result? As it learns, its answers become longer and more methodical. It will backtrack and retrace steps when needed, producing deeper reasoning chains.
This emergent behavior wasn’t pre-programmed at all!
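A toy flavor of that trial-and-error loop (a bandit-style sketch of my own, not DeepSeek's actual algorithm, which is far more sophisticated):

```python
import math
import random

def rl_finetune(actions, reward_fn, steps=500, lr=0.1):
    """Sample an answer style, score it with a reward, and reinforce
    whatever earned reward: the trial-and-error essence of RL."""
    prefs = {a: 0.0 for a in actions}  # preference scores -> softmax policy
    for _ in range(steps):
        weights = [math.exp(prefs[a]) for a in actions]
        action = random.choices(actions, weights=weights, k=1)[0]
        reward = reward_fn(action)
        prefs[action] += lr * (reward - 0.5)  # 0.5 = assumed baseline reward
    return max(prefs, key=prefs.get)

# Pretend "showing work" solves hard math problems and "guessing" doesn't
def grade(answer_style):
    return 1.0 if answer_style == "show_work" else 0.0

print(rl_finetune(["guess", "show_work"], grade))  # -> 'show_work'
```

Nobody programs "show your work" in; the policy drifts toward it because it earns reward, which is the emergent-behavior point.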
I wanted to share my mental model of the Crypto x AI landscape
Long term I think Crypto x AI wins because the most powerful technology of our generation can be owned, governed and iterated by an incentivized global community vs a centralized black box
The danger: a centralized AI's effects ripple across millions of applications reaching billions of users.
I thought this was an interesting use of AI, so sharing the quick story below
TLDR - Bard found and summarized research studies to give me accurate info to make a decision - in a few hours of searching.
I've struggled with allergies and sinus infections my whole life. I had allergy shots as a kid, and then sinus surgery (balloon sinuplasty, where they insert and inflate a balloon to expand your sinuses)
Recently I've been getting sinus infections again, so back to the ENT
The ENT pushed for another balloon sinuplasty (~$20k to them), but this didn't make sense to me since my sinuses were already structurally expanded
My gut take was to treat the underlying cause (allergies) more aggressively instead, but I needed more info