Discover and read the best of Twitter Threads about #microsoftBuild

Most recents (1)

Watching @karpathy presentation from today and taking twitter notes, come along for the ride:

If you're like only the practical tips, skip to #32

@karpathy starts with stages:
1 - Pre-training - months x thousands of GPUs
2, 3, 4 - Finetuning stages that take hours or days

1/ Image
Before pre-training happens, there are 2 preparation steps.

Data collection - Get tons of data from different sources (here Andrej LLaMa mixture)

Tokenization - a lossless translations between pieces of words and integers.

2/ ImageImage
"You shouldn't judge the power of the model just by the number of parameters it contains"

LLaMa has trained on 1-1.4 Trillion tokens vs 300B tokens in GPT-3.

3/ Image
Read 44 tweets

Related hashtags

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3.00/month or $30.00/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!