Yam Peleg
Jun 21 · 4 tweets · 1 min read
Ladies and Gentlemen. GPT-4 👇
*Ported from HF's code
Firing up a LLaMA-7B 8×MLP MoE now, wish me luck!
OK, we are getting somewhere! It definitely learns, so the router net is probably fine.
Now I just need to figure out whether we have to freeze the backbone on all machines while training the experts, or whether we can somehow sync them, and then we are ready to train wizardcoder-hermes-airoboros experts.
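For readers following along, here is a minimal sketch of what an "8×MLP MoE" over a pretrained LLaMA block might look like. Since the earlier tweet says the code was ported from HF, it assumes HF's `LlamaMLP` naming (`gate_proj`, etc.); the class name `MoEMLP`, the `freeze_backbone` helper, and the top-2 routing are illustrative assumptions, not the author's actual code.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    """Hypothetical sketch: swap a block's MLP for 8 expert copies
    plus a learned token-level router (not the author's actual code)."""

    def __init__(self, mlp: nn.Module, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Initialise every expert from the pretrained MLP's weights.
        self.experts = nn.ModuleList(copy.deepcopy(mlp) for _ in range(num_experts))
        hidden = mlp.gate_proj.in_features  # HF LlamaMLP input dim
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, hidden)
        logits = self.router(x)                           # (batch, seq, E)
        w, idx = logits.topk(self.top_k, dim=-1)          # top-k experts per token
        w = F.softmax(w, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += w[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# One answer to the freezing question above: train only router + experts,
# keep the shared backbone (attention, embeddings, norms) frozen.
def freeze_backbone(model: nn.Module) -> None:
    for name, p in model.named_parameters():
        p.requires_grad = ("router" in name) or ("experts" in name)
```

Initialising all experts as copies of the pretrained MLP means the router starts from a working model and only has to learn to specialise the copies — which would be consistent with "it definitely learns" appearing early in training.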

More from @Yampeleg

Jun 21
I think I get it now. Let's try something out:
Comment on this thread with everything you "just guess" about GPT-4 👇👇
Guess: for the dataset, they went through every undergrad major and included the exams, tests & textbooks of each subject. [To create a "wow" effect for every educated person, no matter the field.]
Guess: GPT-3's batch size is written in a deliberately misleading way in the paper (millions) when in real life it is much smaller. This is because smaller batches NEARLY ALL THE TIME lead to better performance. So: at some point, the training here was probably done in small batches.
Jun 20
B R E A K I N G! @realGeorgeHotz just said:
- GPT-4 has 220 parameters.
- GPT-4 is a mixture of experts with 8 experts.
- GPT-4 is doing 16 times inference (did you mean beams? or just 2 beams per model?)
-
@realGeorgeHotz It's HUGE!! Can you confirm??
src: latent.space/p/geohot
@abacaj look what you have done! I had work to do!
Yes I meant 220B.. Soorrryy 😅
Jun 20
The missing pieces of GPT-4 (originally in Hebrew) 👇
Tricks nobody talks about, and the future of open language models.
***
> TL;DR: learning from non-human feedback (!!)
A few days ago I gave an interesting talk at the #GenML conference with the same name ("The missing pieces of GPT-4"). While we are all waiting…
Part 2: The impact of LLaMA
----------
You already know about LLaMA: a powerful language model from Meta, released as open source last February. The model stands at the center of a worldwide effort to openly reproduce the capabilities of commercial language models, and in doing so give millions of people all over the world access to this technology. There are tens of thousands of people…
Part 3: Pretraining
---------
This is where 99% of the effort is invested, both the compute and the monetary cost. When you hear stories like "$4 million for training…", they usually refer to this training stage. Unfortunately, this stage is usually out of reach for most people, for obvious reasons.
What do we actually know about this stage?…
May 2, 2022
How to trick AUC into scoring your stupid models higher 👌

👇A - Thread - about gaming the AUC metric!👇

There is a built-in "flaw" with the AUC metric. Today we are going to take it for a spin!
One of the questions that comes up as we get stuck on improving individual models further is how best to combine multiple models.

Apart from the simple baselines (Avg, Median, or Avg-Median-Rocket-Science-Blending [1]), we can also use power averaging, which is favoured by the AUC metric [2]. A sketch follows below.

But can we push this even further?
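A minimal sketch of power averaging, assuming each model outputs probabilities in [0, 1]; the function name and the example numbers are illustrative, not from the thread:

```python
import numpy as np

def power_average(preds: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Power-mean blend of model predictions (one row per model).
    p = 1 recovers the plain average; larger p lets the more
    confident (higher-probability) models dominate the blend."""
    return np.mean(preds ** p, axis=0) ** (1.0 / p)

# Hypothetical blend of three models' probabilities on three samples.
preds = np.array([[0.9, 0.2, 0.6],
                  [0.8, 0.3, 0.5],
                  [0.7, 0.1, 0.9]])
print(power_average(preds, p=2.0))
```

Because AUC is rank-based, the final `** (1.0 / p)` rescaling is optional: it is a monotonic transform, so it changes the scores but not their ordering, and the AUC is identical either way.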
With AUC, upside error or downside error does not really matter. AUC only cares about how the predictions are ordered.

So, how can we take advantage of this?
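To see the "flaw" concretely: any strictly monotonic transform of the scores leaves the AUC untouched, because only the ordering matters. A quick check (the labels and scores here are made up):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1, 0, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# All three calls print the same AUC (8/9 ~= 0.889):
# raising to a power or taking logs never reorders the scores.
print(roc_auc_score(y, s))
print(roc_auc_score(y, s ** 10))
print(roc_auc_score(y, np.log(s)))
```

This invariance is exactly what the blending tricks above exploit: you are free to warp the prediction scale however you like, as long as the warp helps the combined scores rank positives above negatives.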
