Stephen Wolfram of Wolfram Alpha wrote the absolute best post on ChatGPT and Large Language Models.
It took me about two hours to read, but significantly increased my understanding of what's going on under the hood of ChatGPT.
A few of my favorite takeaways (helps my process)
The goal of a large language model is to reasonably continue the text it already has
ChatGPT's LLM estimates these probabilities
Temperature is a parameter that determines how often lower ranked words are used, adding randomness.
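A toy sketch (not Stephen's code, all numbers made up) of what temperature does: divide the model's scores by the temperature before turning them into probabilities, so higher temperature flattens the distribution and lower-ranked words get picked more often.

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8):
    """Scale logits by 1/temperature, softmax into probabilities, then sample.
    Low temperature -> peaky distribution (almost always the top word);
    high temperature -> flatter distribution (more randomness)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

# Hypothetical scores for four candidate next words
logits = [2.0, 1.0, 0.5, -1.0]
```

At a temperature near zero this collapses to always picking the highest-scored word; Stephen notes ~0.8 tends to read best for essays.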
LLMs are trained on vast amounts of human text
Where do these probabilities come from?
ChatGPT is a model that estimates, from all that text, how likely different sequences of words are to occur.
Stephen adds an interesting walkthrough demonstrating how often individual letters occur, then pairs of letters, and beyond
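The counting idea can be sketched in a few lines (toy corpus, not Stephen's code): tally single letters for their probabilities, then tally letter pairs to see which letter tends to follow which.

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"

# Single-letter probabilities
letters = [c for c in text if c.isalpha()]
unigram = Counter(letters)
p = {ch: n / len(letters) for ch, n in unigram.items()}

# Letter-pair (bigram) counts: which letter tends to follow which
pairs = Counter(zip(letters, letters[1:]))
```

Scale the same counting trick up from letters to words to word sequences and you're on the road to a language model.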
Neural Nets are similar to a human brain
The brain has ~100B neurons, each connected to ~1,000 other neurons
A neuron pulses depending on what pulses it gets from the other neurons, each with their own connections.
This is analogous to the different weights in the model. Voila!
Neural Net Explanation
- Neurons arranged in layers
- Each connection between neurons has a weight (its significance)
- ML is first used to find the weights
- Neuron evaluates numerical function
- Input is fed in; neurons at each layer evaluate and feed results to the next layer
- End result is reached
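The steps above can be sketched as a tiny net (weights made up for illustration): each neuron takes a weighted sum of its inputs plus a bias, applies a nonlinearity, and passes the result to the next layer.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs, plus bias, through a nonlinearity."""
    return math.tanh(sum(x * w for x, w in zip(inputs, weights)) + bias)

def layer(inputs, weight_matrix, biases):
    """A layer is just a row of neurons, each with its own weights."""
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

# 2 inputs -> 3 hidden neurons -> 1 output, with made-up weights
hidden = layer([0.5, -0.2], [[0.1, 0.4], [-0.3, 0.8], [0.7, 0.2]], [0.0, 0.1, -0.1])
output = layer(hidden, [[0.5, -0.5, 0.3]], [0.0])
```

Training is the process of finding weight values like these that make the final output match the examples.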
Larger networks do better at landing on results.
In the below image the goal is to take in a point and recognize it in one of the three regions.
I laughed when Stephen said at the boundaries it has trouble "making up its mind". Much human.
Unsure results could be dangerous
Training Neural Nets
The goal is to feed a zillion examples, and find weights that reproduce the examples.
Every time an example is used, the weights are adjusted throughout the model.
Training is really expensive and computationally intensive.
Now how are the weights adjusted?
Stephen describes that the model uses a Loss Function.
The goal of adjusting the weights is to reduce the loss function, i.e. how far your output is from the intended result based on the examples.
More data, lower loss function
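A minimal sketch of that loop (toy single-weight model, not Stephen's code): measure the loss on an example, compute which direction reduces it, and nudge the weight that way, over and over.

```python
# Fit y = w * x to examples by nudging w to reduce squared-error loss
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation here: y = 2x

w = 0.0        # start with a wrong weight
lr = 0.05      # learning rate: how big each nudge is
for _ in range(200):
    for x, y in examples:
        pred = w * x
        grad = 2 * (pred - y) * x   # d(loss)/dw for loss = (pred - y)^2
        w -= lr * grad              # adjust the weight to reduce the loss

loss = sum((w * x - y) ** 2 for x, y in examples)
```

Real models do this with billions of weights instead of one, which is why training is so computationally expensive.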
One of the most counterintuitive takeaways is that with Neural Nets it's easier to solve more complicated problems than simpler ones.
That's good too since I'm dumb and need help with the complicated problems in life.
I'll let Stephen take it from here:
ChatGPT has an easier time training since it can conduct "Unsupervised Learning"
- ChatGPT gets text (masks end)
- Use probabilities to get the end of the sentence
- Use this as a training input
- Output is the complete piece of text
TLDR it's easier to get examples to train
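A toy sketch of why it's "unsupervised": any piece of text generates its own training examples by masking the end, so no human labeling is needed (illustrative code, not ChatGPT's actual pipeline).

```python
def training_pairs(text):
    """Every prefix of the text becomes an input; the next word is the label.
    The text supervises itself -- no human labeling required."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

pairs = training_pairs("the cat sat on the mat")
# e.g. one pair is (["the", "cat", "sat"], "on")
```

One sentence yields many input/label pairs, which is why training data is effectively unlimited.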
Summing this all up, Stephen shares an image showing the training process for a neural net and how the loss function should decrease over time.
If the loss eventually levels off, yay, you have a solid model
If not, you can't rely on it and it's time to change the architecture
ChatGPT is often extrapolated as a path to Terminators
Stephen counters that the magic of LLMs for writing really isn't that hard.
We're not closer to terminators, writing essays just isn't as hard as we think.
@stephen_wolfram plz share more on NN's replacing humans (pic 2)
Embeddings
Embeddings lay out words as arrays of numbers, placing each word near the words it is commonly associated with
Probabilities are found using vast amounts of text
Embeddings give a more natural feel to ChatGPT since words that are commonly associated with each other can be used
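A toy illustration (3-dimensional made-up vectors; real models use thousands of dimensions): words that appear in similar contexts get similar number arrays, so their similarity can be measured directly.

```python
import math

# Made-up toy embeddings for illustration only
emb = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: close to 1 for related words, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)
```

Here "cat" sits near "dog" in the space and far from "table", which is exactly the structure that makes generated text feel natural.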
Onto ChatGPT!
ChatGPT+ is reportedly a giant Neural Net with 100 Trillion parameters (GPT-3 had 175B) focused on language.
That's ~1,000x the number of neurons in the brain. Woof.
The most important feature is the Transformer
An interesting side note for the Crypto audience.
Crypto's own @ilblackdragon, the co-founder of @NEARProtocol, is one of the authors on the original Transformers Paper
- Takes text
- Finds embeddings (numbers to represent text)
- Processes (values go through layers of the NN)
- New embedding produced (new number array)
- Takes array and generates 50,000 values for next possible tokens
- Highest prob produces text (I think)
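The loop in the steps above can be sketched like this (the Transformer internals are elided behind a hypothetical `model` callable; `fake_model` is a stand-in, not a real model):

```python
def generate(prompt_tokens, model, steps=5):
    """Repeatedly: run the model over the tokens so far, get ~50,000 scores
    for possible next tokens, append the highest-probability one."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        probs = model(tokens)          # one probability per vocabulary entry
        next_token = max(range(len(probs)), key=probs.__getitem__)
        tokens.append(next_token)
    return tokens

# Stand-in "model" that always favors token 7, just to exercise the loop
fake_model = lambda tokens: [1.0 if i == 7 else 0.0 for i in range(50000)]
```

In practice temperature sampling replaces the plain argmax here, which is where the randomness comes back in.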
Transformers are a breakthrough for LLMs.
In short, they allow the model to understand the context of words and the relationships between words that are far apart
Transformers can process all the text at once instead of one token at a time, so they are much more efficient and scalable
Thanks ChatGPT!
Meaning Space
Stephen shares that in ChatGPT, text is represented by an array of numbers in a meaning space.
He goes on to describe that the trajectory of what words come next is far from a mathematical or physics like law we can rest our hats on.
So is ChatGPT similar to a human brain?
His conclusions:
- The neural net architecture may be similar
- Training of LLMs way less efficient vs human brain
- ChatGPT has no loops to go back and recompute data like humans can which severely limits its computational capability
I am not an AI researcher, but the post made me realize LLMs are nowhere near the AGI or Terminator-level intelligence some fear
Of course it's on the path, but LLMs are probabilistic models focused on continuing sentences.
They are really good at it, but not AGI (yet)
I think it's incredibly cool that a gigabrain like @stephen_wolfram would open source his thinking on ChatGPT
This has been the single best resource I've found so far on learning about @OpenAI's ChatGPT, LLMs and Neural Nets
Also @stephen_wolfram if you're ever interested in a long form podcast to walk through your thoughts, we'd love to host you on @Delphi_Digital's podcast!