Yam Peleg
Jun 21 · 4 tweets · 1 min read
Ladies and Gentlemen. GPT-4 👇
*Ported from HF's code
Firing up a LLaMA-7B 8×MLP MoE now, wish me luck!
OK, we are getting somewhere! It definitely learns, so the router net is probably fine.
Now I just need to figure out whether we have to freeze the backbone on all machines while training the experts, or whether we can somehow sync them, and then we are ready to train wizardcoder-hermes-airoboros experts.
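For readers following along, here is a minimal sketch of what an "8×MLP MoE" over a pretrained LLaMA block might look like. Since the earlier tweet says the code was ported from HF, it assumes HF's `LlamaMLP` naming (`gate_proj`, etc.); the class name `MoEMLP`, the `freeze_backbone` helper, and the top-2 routing are illustrative assumptions, not the author's actual code.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    """Hypothetical sketch: swap a block's MLP for 8 expert copies
    plus a learned token-level router (not the author's actual code)."""

    def __init__(self, mlp: nn.Module, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Initialise every expert from the pretrained MLP's weights.
        self.experts = nn.ModuleList(copy.deepcopy(mlp) for _ in range(num_experts))
        hidden = mlp.gate_proj.in_features  # HF LlamaMLP input dim
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, hidden)
        logits = self.router(x)                           # (batch, seq, E)
        w, idx = logits.topk(self.top_k, dim=-1)          # top-k experts per token
        w = F.softmax(w, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += w[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# One answer to the freezing question above: train only router + experts,
# keep the shared backbone (attention, embeddings, norms) frozen.
def freeze_backbone(model: nn.Module) -> None:
    for name, p in model.named_parameters():
        p.requires_grad = ("router" in name) or ("experts" in name)
```

Initialising all experts as copies of the pretrained MLP means the router starts from a working model and only has to learn to specialise the copies — which would be consistent with "it definitely learns" appearing early in training.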

More from @Yampeleg

Jun 21
I think I get it now. Let's try something out:
Comment on this thread with everything you "just guess" about GPT-4 👇👇
Guess: for the dataset, they went through every undergrad major and included the exams, tests & textbooks of each subject. [To create a "wow" effect for every educated person, no matter the field.]
Guess: GPT-3's batch size is written in a deliberately misleading way in the paper (millions) when in real life it is much smaller. This is because smaller batches NEARLY ALL THE TIME lead to better performance. So: at some point, the training here was probably done in small batches.
Jun 20
B R E A K I N G! @realGeorgeHotz just said:
- GPT-4 has 220 parameters.
- GPT-4 is a mixture of experts with 8 experts.
- GPT-4 is doing 16 times inference (did you mean beams? or just 2 beams per model?)
-
@realGeorgeHotz It's HUGE!! Can you confirm??
src: latent.space/p/geohot
@abacaj look what you have done! I had work to do!
Yes I meant 220B.. Soorrryy 😅
Jun 20
The missing pieces of GPT-4 (originally in Hebrew) 👇
Tricks nobody talks about, and the future of open language models.
***
> TL;DR: learning from non-human feedback (!!)
A few days ago I gave an interesting talk at the #GenML conference with the same name ("The missing pieces of GPT-4"). While we are all waiting…
Part 2: The impact of LLaMA
----------
You already know about LLaMA: a powerful language model from Meta, released as open source last February. The model stands at the center of a worldwide effort to openly reproduce the capabilities of commercial language models, and in doing so give millions of people all over the world access to this technology. There are tens of thousands of people…
Part 3: Pretraining
---------
This is where 99% of the effort is invested, both the compute and the monetary cost. When you hear stories like "$4 million for training…", they usually refer to this training stage. Unfortunately, this stage is usually out of reach for most people, for obvious reasons.
What do we actually know about this stage?…
May 2, 2022
How to trick AUC into scoring your stupid models higher 👌

👇A - Thread - about gaming the AUC metric!👇

There is a built-in "flaw" with the AUC metric. Today we are going to take it for a spin!
One of the questions that comes up as we get stuck on improving individual models further is how best to combine multiple models.

Apart from the simple baselines (Avg, Median, or Avg-Median-Rocket-Science-Blending [1]), we can also use power averaging, which is favoured by the AUC metric [2]. A sketch follows below.

But can we push this even further?
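A minimal sketch of power averaging, assuming each model outputs probabilities in [0, 1]; the function name and the example numbers are illustrative, not from the thread:

```python
import numpy as np

def power_average(preds: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Power-mean blend of model predictions (one row per model).
    p = 1 recovers the plain average; larger p lets the more
    confident (higher-probability) models dominate the blend."""
    return np.mean(preds ** p, axis=0) ** (1.0 / p)

# Hypothetical blend of three models' probabilities on three samples.
preds = np.array([[0.9, 0.2, 0.6],
                  [0.8, 0.3, 0.5],
                  [0.7, 0.1, 0.9]])
print(power_average(preds, p=2.0))
```

Because AUC is rank-based, the final `** (1.0 / p)` rescaling is optional: it is a monotonic transform, so it changes the scores but not their ordering, and the AUC is identical either way.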
With AUC, upside error or downside error does not really matter. AUC only cares about how the predictions are ordered.

So, how can we take advantage of this?
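To see the "flaw" concretely: any strictly monotonic transform of the scores leaves the AUC untouched, because only the ordering matters. A quick check (the labels and scores here are made up):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# All three calls print the same AUC (8/9 ~= 0.889):
# raising to a power or taking logs never reorders the scores.
print(roc_auc_score(y, s))
print(roc_auc_score(y, s ** 10))
print(roc_auc_score(y, np.log(s)))
```

This invariance is exactly what the blending tricks above exploit: you are free to warp the prediction scale however you like, as long as the warp helps the combined scores rank positives above negatives.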
