Stella Biderman
Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her
Feb 20 6 tweets 2 min read
"The amount of FLOPs it requires to train a LLM grows quadratically with sequence length" is a false statement for all practical purposes and cannot die quickly enough. I got distracted by Flash Attention when people asked for an elaboration, but the core reason this is true is that that's not where most of the operations are at scale. The attached image shows a breakdown of the operations. Image
Jan 1 26 tweets 8 min read
Many people seem to think they can't do interesting LLM research outside a large lab, or that they're shoehorned into crowded topics. In reality, there are tons of wide-open, high-value questions. To prove it, I'll be tweeting one per week (every Monday) in 2024.

Please steal my ideas! The vast majority of these questions can be studied on a couple of commercial GPUs or with a TRC grant. If you'd like to work on one of these but desire mentorship, I'm open to helping if you show you've put some effort into getting started / have preliminary results.
Sep 29, 2023 10 tweets 3 min read
This is your daily reminder that only three orgs have ever trained an LLM and released both the model and the full data: @AiEleuther @BigscienceW (non-OS license) @togethercompute.

Small orgs like these make science possible in the face of industry power. Transparency is a key part of both scientific research and the ethical development and deployment of AI technologies. Without transparency into training data we cannot know whose information and ideologies are being encoded in ML systems. Unfortunately, this work is increasingly hard to do.
Apr 5, 2023 14 tweets 9 min read
Have you ever wanted to do an experiment on LLMs and found that none of the existing model suites met your needs? At @AiEleuther we got tired of this happening, so we designed a model suite with enabling scientific research as its primary goal.

arxiv.org/abs/2304.01373

To do this, we identified common limitations for research, such as training on non-public data, not releasing partially trained checkpoints, and not being able to easily tell which data has been seen by which model checkpoint.
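For instance, here is a minimal sketch of pulling a partially trained Pythia checkpoint from the Hugging Face Hub; the step-numbered revision scheme is how the suite is distributed, as I recall it:

```python
# Load an intermediate Pythia checkpoint by revision name.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-410m",
    revision="step3000",  # a partially trained checkpoint
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")
```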
Mar 28, 2023 6 tweets 4 min read
Recently I’ve been harping on how “compute optimal” and “best for use” are completely different things. This plot from @CerebrasSystems shows that really well: their compute-optimal models trained on the Pile outperform Pythia for fixed compute but underperform for fixed params.

Pythia models are trained for 300B tokens, while Cerebras’s are compute optimal. As a result, the validation loss for our 2.7B model is virtually identical to that of their 6.7B model, and our 410M model is substantially better than their 1.3B model.
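Rough arithmetic behind the fixed-compute comparison, using the common C ≈ 6 · params · tokens approximation (these numbers are mine, not from the thread):

```python
# Training compute via C ~= 6 * params * tokens (rough approximation).
pythia_2_7b   = 6 * 2.7e9 * 300e9  # Pythia 2.7B, 300B tokens -> ~4.9e21 FLOPs
cerebras_6_7b = 6 * 6.7e9 * 134e9  # 6.7B at ~20 tokens/param -> ~5.4e21 FLOPs
# Comparable compute budgets, very different parameter counts -- hence the
# "fixed compute vs fixed params" distinction.
```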
Feb 25, 2023 5 tweets 4 min read
“Chinchilla optimal” means “I have a fixed TOTAL NUMBER OF FLOPS to spend. What model size and data size should I use to get the lowest loss?”

If you have a limit on either data or model size, then a Chinchilla-optimal model is likely not optimal for you. Chinchilla-optimal models are very often ACTIVELY BAD FOR APPLICATIONS. A Chinchilla-optimal 2.7B model has seen only ~50B tokens, or one sixth of what EleutherAI typically trains small models for. A model trained for so few tokens might be “compute optimal” but it’s very bad.
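The arithmetic, using the ~20-tokens-per-parameter Chinchilla heuristic:

```python
params = 2.7e9
chinchilla_tokens = 20 * params   # ~54B tokens, i.e. the "~50B" above
eleuther_tokens = 300e9           # EleutherAI's usual budget for small models
print(chinchilla_tokens / eleuther_tokens)  # ~0.18, about one sixth
```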
Dec 15, 2022 12 tweets 3 min read
THIS DOES NOT WORK. Don’t fall for this disinformation and destroy your websites and communities. This protest has no bearing on the performance of DALL-E2 and Stable Diffusion. It’s incredibly sad to see a basic lack of knowledge of technology enable shit like this to go viral.

Yes, models like DALL-E2, Stable Diffusion, and Midjourney were trained on images uploaded to crowdsourced websites like Flickr and ArtStation.

HOWEVER post-release changes to these websites do not influence the AIs in any way. They don’t retrieve images on the fly.
Oct 25, 2022 9 tweets 7 min read
ITT: an OAI employee admits that the text-davinci API models are not from their papers.

Until @OpenAI actually documents the connection between the models in their papers and the models released via APIs, #NLProc researchers need to stop using them to do research.

This is not a minor point either. Apparently the text-davinci-002 API “is an instruct model. It doesn't uses a similar but slightly different [sic] training technique but it's not derived from davinci. Hence it's not a fair comparison.”
Apr 20, 2022 12 tweets 16 min read
Over a year ago, several brilliant people at #EleutherAI started plugging VQGAN and CLIP together and getting it to generate images. By now there are many variations and adaptations of the technique out there, but for various reasons the OG paper is only just coming out.

Huge props to @RiversHaveWings, @dashstander, @EricHallahan, @lcastricato, and the many other people who have iterated on and popularized this technique. I came rather late to the party, and mostly made sure that the experiments happened and their great work was showcased.
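For the curious, an illustrative sketch of the core VQGAN+CLIP loop, not the paper's exact code. It assumes `vqgan` is a pretrained VQGAN (e.g. from taming-transformers) whose `decode()` differentiably maps latents to images in [0, 1], and it omits the cutout/augmentation tricks real implementations rely on:

```python
# Optimize a VQGAN latent so the decoded image matches a text prompt per CLIP.
import torch
import clip

device = "cuda"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # avoid fp16/fp32 mixing on GPU

text = clip.tokenize(["a watercolor painting of a lighthouse"]).to(device)
with torch.no_grad():
    text_feat = clip_model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)  # latent
opt = torch.optim.Adam([z], lr=0.05)

for step in range(500):
    image = vqgan.decode(z)  # assumed: returns a (1, 3, 256, 256) image
    image = torch.nn.functional.interpolate(image, size=224, mode="bilinear")
    img_feat = clip_model.encode_image(image)  # CLIP normalization omitted
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()  # maximize CLIP similarity
    opt.zero_grad()
    loss.backward()
    opt.step()
```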
Apr 4, 2022 16 tweets 9 min read
Google decided that 137B and 280B weren't enough, so now they've gone and trained a 540B model.

ai.googleblog.com/2022/04/pathwa…

Chinchilla is *hugely* punching above its weight here. Damn.
Feb 19, 2022 4 tweets 4 min read
Phenomenal work on the linkage between LM performance and the frequency of data in the pretraining dataset. As far as I am aware, this is the first paper to demonstrate such a connection outside of the work on memorization by people like @colinraffel, @katherine1ee, and Carlini.

To their credit, @OpenAI put this plot in their GPT-3 paper. It appears to answer the question, but recent work (esp. @AlexTamkin’s newest paper) calls into question the validity of using a present / not present dichotomy to draw conclusions.
Jan 20, 2022 14 tweets 6 min read
Excited to share my newest paper, "Neural Language Models are Effective Plagiarists" with @EdwardRaffML. We took a dataset of CS 101 assignments and asked "can a language model do a good job solving these with minimal human intervention or knowledge?"

arxiv.org/abs/2201.07406

There's been some very interesting work recently on solving college-level assignments with transformers, but that work typically uses private models and more complicated pipelines. We wanted to focus on what was available to a random student with the internet, not an AI expert.
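A minimal sketch of that kind of setup, using a freely available model through Hugging Face transformers; the model choice and prompt here are illustrative, not the paper's exact pipeline:

```python
# Prompt a free public model with an assignment-style stub and sample a completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = '''def fizzbuzz(n):
    """Print 1..n, replacing multiples of 3 with Fizz and of 5 with Buzz."""
'''
out = model.generate(**tok(prompt, return_tensors="pt"),
                     max_new_tokens=128, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```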
Oct 11, 2021 32 tweets 19 min read
@MSFTResearch and @NVIDIAAI announce a 530B parameter large language model, 3x larger than GPT-3, achieving superior results on a variety of tasks. Trained on the Pile and evaluated on the Eval Harness, two of #EleutherAI’s biggest projects.

A 🧵

developer.nvidia.com/blog/using-dee…

The Pile is a curated dataset of high-quality data for training language models. The project was led by @nabla_theta and myself, with contributions from many others. Released on Jan 1st, 2021, it was the first public massive language model training dataset.
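A minimal sketch of running the Eval Harness through its Python API (`pip install lm-eval`; names follow the current interface as I recall it, which postdates this thread):

```python
# Evaluate a public model on a benchmark task with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neo-1.3B",
    tasks=["lambada_openai"],
)
print(results["results"])
```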

Aug 23, 2021 60 tweets 33 min read
Okay, time to live-tweet my thoughts on @stanfordnlp @StanfordAILab's "Workshop on Foundation Models." A long thread.

First and foremost: please never use the phrase "foundational models" ever again. It's a garbage name that people like @mmitchell_ai @emilymbender @mer__edith have criticized at length. I'll go find some of their comments and link to them later, but the short version is:
Jul 3, 2021 26 tweets 9 min read
Phenomenally interesting paper about how AI researchers talk about what they value in their research. Very glad the authors took the time to do this laborious but important work. I'm going to keep this in my desk so the next time I go on a rant about how ML is prescriptive [1/?] rather than descriptive I can whack people who disagree with this paper 😛

I would actually go further than the authors of this paper do (I don't know if they disagree with what I'm about to say, but they didn't say it): I would say that corporate AI research [2/?]
Jul 2, 2021 9 tweets 6 min read
Great write-up about the crazy cool art #EleutherAI members have been learning to coax out of GANs with CLIP! Credit assignment with stuff like this is hard, but @jbusted1 @RiversHaveWings @BoneAmputee and @kialuy are some of the people who have made this happen.

They’ve been doing some visionary work with human-guided AI-generated art for the past two months, and it’s phenomenal that they’re starting to get the recognition they deserve. Several more people who either lack Twitters or whose handles I don’t know deserve applause too.