kache
Aug 24, 2023 · 15 tweets
I cannot believe zuck et al just beat gpt 3.5 at humaneval pass@1 and are approaching gpt4 with only 34b params

(47 pages, therefore reaction thread - code llama)
>trained on a 16k token context
pretty cool
>7B & 13B
>trained on infilling, instead of just prompt completion.
good for copilot replacement & custom local hacks
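(for anyone unfamiliar: infilling means the model completes code given both a prefix and a suffix, which is exactly what an editor plugin needs. a minimal sketch of what building a fill-in-the-middle prompt might look like — the sentinel strings here are illustrative placeholders, not necessarily the model's actual special tokens:)

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Build a fill-in-the-middle prompt in prefix-suffix-middle order.

    Infilling-trained models are prompted with the code before and after
    the cursor; the model generates the "middle". The <PRE>/<SUF>/<MID>
    markers below are illustrative stand-ins for the tokenizer's real
    special tokens.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# e.g. the cursor sits inside a function body:
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result",
)
```

an editor extension would send `prompt` to the model and splice the generated middle back between prefix and suffix at the cursor.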
because gpt 3.5's first-token latency was so bad, I had to retire my custom vscode extension
Having options is good!
500B tokens, then 20B tokens for long context fine tuning. that's a lot of tokens
(for the foundational model that they release)
really hope they talk about the distributions of the data
The 500B tokens is:
- a "near deduplicated" dataset of public code
- 8% of the data is natural language related to code (likely code documentation & public Q/A)
- they prevent forgetting language understanding by mixing in a sample from a natural language dataset
Their instruct dataset makes me feel itchy. It's generated, and sized at 14k
They use self instruct by creating unit tests, and then running the solutions against them to select.
:<
The problem is that the functions are interview style questions, and too localized
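the selection loop is roughly: generate unit tests, generate candidate solutions, keep only solutions that pass their tests. a hedged sketch of that filter — the function names and candidates below are made up for illustration:

```python
def passes_tests(solution_src: str, test_src: str) -> bool:
    """Execute a generated solution, then run generated asserts against it.

    A candidate is kept only if every assertion passes. (A real pipeline
    would sandbox this and add timeouts; this sketch just uses exec.)
    """
    env: dict = {}
    try:
        exec(solution_src, env)  # define the candidate function
        exec(test_src, env)      # run the generated unit tests
        return True
    except Exception:
        return False

candidates = [
    "def double(x):\n    return x * 2",
    "def double(x):\n    return x + 2",  # wrong for most inputs
]
tests = "assert double(3) == 6\nassert double(0) == 0"

kept = [c for c in candidates if passes_tests(c, tests)]
# only the first candidate survives the generated tests
```

the itchy part is that this only filters for agreement with model-written tests on model-written interview-style problems, which is the localization complaint above.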
lol holy shit
free use unless you're google
open source software is going to actually beat gpt4 in a few months, guaranteed
this is crazy
also - interesting to note the improvement of code llama python
fwiw a lot doesn't get captured in evals
I expect models with good UX to have worse evals
needs galactica proofreading :>
(teasing, I make a ton of mistakes too)
interesting
code llama is best at C++ humaneval, vs other languages
wonder why
context up to 100k tokens shows decrease in ppl. very cool
you, also, learn to code after you learn to read and write, correct? therefore chart.
interesting
use low temperature for the first guess, increase temperature for subsequent guesses?
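a sketch of what that sampling schedule might look like — the bounds and the linear ramp here are my assumptions for illustration, not values from the paper:

```python
def temperature_schedule(k: int, t_min: float = 0.1, t_max: float = 0.8) -> list[float]:
    """Temperatures for k attempts at a problem: near-greedy first guess,
    then progressively hotter resamples to buy diversity for pass@k.

    The 0.1/0.8 bounds and linear interpolation are illustrative choices.
    """
    if k == 1:
        return [t_min]
    step = (t_max - t_min) / (k - 1)
    return [round(t_min + i * step, 3) for i in range(k)]

temps = temperature_schedule(5)
# first attempt is nearly deterministic, later attempts explore more
```

the intuition: the first sample should be the model's single best guess (low temperature), and once that misses, higher temperatures increase the chance that *some* later sample lands somewhere different.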
looooooool shade thrown
"where are the pretrained weights, sama? i though't ya'll were supposed to be open? hmmmmmmmm?" - zuck, probably Image
lol i knew it
>have access to one of the biggest compute clusters in the world
>overfit it on L1 interview questions
glad they ran the experiment, but I'm not going to bother downloading anything other than the foundation model
summary
- in the next month people are going to build pretty insane things on top of the 34b code foundational model
- the finetunes they created are of scientific interest, but don't bother downloading them; train your own instead
what a huge contribution from their team
it's not just compute
it's a lot of human hours and skill
thank you!

