Over a year ago, several brilliant people at #EleutherAI started plugging VQGAN and CLIP together and getting it to generate images. By now there are many variations and adaptations of the technique out there, but for various reasons the OG paper is only just coming out
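(For anyone wondering what "plugging VQGAN and CLIP together" actually means: you optimize a VQGAN latent so the decoded image matches a CLIP text embedding. Below is a minimal sketch of that loop — not the paper's code; the checkpoint paths are placeholders and the real method also uses augmented cutouts:)

```python
# Minimal sketch of the VQGAN-CLIP loop (illustrative, not the paper's code).
# Assumes OpenAI's `clip` package and a pretrained VQGAN from taming-transformers.
import torch
import torch.nn.functional as F
import clip
from omegaconf import OmegaConf
from taming.models.vqgan import VQModel

device = "cuda" if torch.cuda.is_available() else "cpu"
perceptor, _ = clip.load("ViT-B/32", device=device, jit=False)
perceptor = perceptor.float().eval()  # fp32 keeps the sketch simple

# Paths below are placeholders for a pretrained VQGAN (e.g. the ImageNet f=16 model).
cfg = OmegaConf.load("vqgan_imagenet_f16_16384.yaml")
vqgan = VQModel(**cfg.model.params)
state = torch.load("vqgan_imagenet_f16_16384.ckpt", map_location="cpu")["state_dict"]
vqgan.load_state_dict(state, strict=False)
vqgan = vqgan.eval().to(device)

text = clip.tokenize(["a watercolor painting of a lighthouse"]).to(device)
with torch.no_grad():
    text_emb = F.normalize(perceptor.encode_text(text).float(), dim=-1)

# Optimize the latent directly; both VQGAN and CLIP stay frozen.
# (Real implementations re-quantize z against the codebook each step.)
z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

# CLIP's input normalization constants
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    image = vqgan.decode(z)                   # (1, 3, 256, 256), roughly [-1, 1]
    image = (image.clamp(-1, 1) + 1) / 2      # -> [0, 1]
    image = F.interpolate(image, size=224, mode="bilinear", align_corners=False)
    img_emb = F.normalize(perceptor.encode_image((image - mean) / std).float(), dim=-1)
    loss = -(img_emb * text_emb).sum()        # maximize cosine similarity with the prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
```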
Huge props to @RiversHaveWings, @dashstander, @EricHallahan, @lcastricato, and the many other people who have iterated on and popularized this technique. I came rather late to the party, and mostly made sure that the experiments happened and their great work was showcased
@RiversHaveWings @dashstander @EricHallahan @lcastricato VQGAN-CLIP has really taken on a life of its own, getting picked up and modified in Jupyter notebooks shared on Twitter, Instagram, and other social media platforms
@RiversHaveWings @dashstander @EricHallahan @lcastricato We have tried our best to pay homage to the work done by all the great artists out there who are working with VQGAN-CLIP, as without the community that has sprung up around this project we would have probably never written this paper.
@RiversHaveWings @dashstander @EricHallahan @lcastricato There are much larger and fancier models out today, but in anything approaching an apples-to-apples comparison, VQGAN-CLIP remains one of, if not the, best options for semantically conditioning image generation.
@RiversHaveWings @dashstander @EricHallahan @lcastricato And, of course, it’s cheap to run. Thanks to @unixpickle we were able to analyze the tradeoffs between pretraining a model like GLIDE and using VQGAN-CLIP. The results were startling: I did not expect the break-even point to occur only after tens of thousands of dollars had been spent
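(To make the break-even logic concrete, here's a back-of-envelope sketch with purely made-up numbers — the paper has the actual analysis. The point: a GLIDE-style model pays a huge one-time pretraining cost but samples cheaply, while VQGAN-CLIP pays nothing up front but runs an optimization per image:)

```python
# Back-of-envelope break-even sketch. All costs below are hypothetical
# placeholders, not the numbers from the paper.
pretrain_cost = 40_000.0    # one-time cost to pretrain a GLIDE-style model (USD)
glide_per_image = 0.002     # per-image sampling cost once pretrained (USD)
vqganclip_per_image = 0.05  # per-image optimization cost for VQGAN-CLIP (USD)

# Pretraining only pays off after this many generated images:
break_even_images = pretrain_cost / (vqganclip_per_image - glide_per_image)
break_even_spend = break_even_images * vqganclip_per_image

print(f"break-even at ~{break_even_images:,.0f} images "
      f"(~${break_even_spend:,.0f} of VQGAN-CLIP compute)")
```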
@RiversHaveWings @dashstander @EricHallahan @lcastricato @unixpickle Making models accessible to and usable by researchers is an essential part of any open science movement. Just earlier today @RiversHaveWings and I were discussing what models to train for a forthcoming paper and her key concern was making them small enough to be accessible
@RiversHaveWings @dashstander @EricHallahan @lcastricato @unixpickle Big models get a lot more glory (and I’ve certainly trained my share of big models), but there’s a reason why BERT and GPT-2 are still so widely used.
@RiversHaveWings @dashstander @EricHallahan @lcastricato @unixpickle A model as expensive as DALL-E or GLIDE simply isn’t something many users can afford to use or researchers can afford to iterate on. And that last point is really important: at #EleutherAI we aren’t looking for “users”
@RiversHaveWings @dashstander @EricHallahan @lcastricato @unixpickle we want people to adapt and build off of and improve our work. And that simply isn’t an option for GLIDE. We certainly are interested in the impact of scale, but we are also very aware of the centralization of the ability to do research and the importance of democratization.
@RiversHaveWings @dashstander @EricHallahan @lcastricato @unixpickle As a reminder, thanks to @CoreWeave anyone can use this model (and several others such as CLIP-guided diffusion and latent diffusion) for free in our Discord! Come join the fun in #the-faraday-cage!

Special shout out to @BoneAmputee for writing the bot :)
Tagging some people I know will be excited to see that this paper finally exists: @mark_riedl @advadnoun @Ted_Underwood @sea_snell @EMostaque @multimodalart @KiaManniquin

More from @BlancheMinerva

Apr 4
Google decided that 137B and 280B weren't enough, so now they've gone and trained a 540B model.

ai.googleblog.com/2022/04/pathwa…
Chinchilla is *hugely* punching above its weight here. Damn.
@SashaMTL @TaliaRinger Hmmmm I coulda sworn I recently read something about how LLMs are Good for the Environment Actually (TM) because they're multitask models and one training run supports a lot of deployment, and yet here we are.
Feb 19
Phenomenal work on the linkage between LM performance and frequency of data in the pretraining dataset. As far as I am aware, this is the first paper to demonstrate such a connection outside of the work of people like @colinraffel and @katherine1ee and Carlini on memorization
To their credit, @OpenAI put a plot addressing this in their GPT-3 paper. It appears to answer the question, but recent work (esp. @AlexTamkin’s newest paper) calls into question the validity of using a present / not-present dichotomy to draw conclusions.
@OpenAI @AlexTamkin Evaluating language models is very hard. Even building basic frameworks for few-shot evaluation that work with many LMs and many tasks is a lot of work.

That’s why @nabla_theta and Jonathan Tow have been working to build our own framework from scratch: github.com/EleutherAI/lm-…
Jan 20
Excited to share my newest paper, "Neural Language Models are Effective Plagiarists" with @EdwardRaffML. We took a dataset of CS 101 assignments and asked "can a language model do a good job solving these with minimal human intervention or knowledge?"

arxiv.org/abs/2201.07406
@EdwardRaffML There's been some very interesting work recently on solving college level assignments with transformers, but that work typically uses private models and more complicated pipelines. We wanted to focus on what was available to a random student with the internet, not an AI expert.
@EdwardRaffML To do that, we stuck with #EleutherAI's GPT-J, freely and publicly available at 6b.eleuther.ai. We used no prompting, no finetuning, and no tricks.
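(For concreteness, here's roughly what that zero-shot setup looks like with the public weights — a sketch, not our exact pipeline, and the assignment prompt below is made up:)

```python
# Sketch of zero-shot generation with the public GPT-J weights via Hugging Face
# transformers. Illustrative only; the assignment text is a hypothetical example.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")  # ~24 GB in fp32

# Feed the assignment verbatim: no prompt engineering, no finetuning.
assignment = (
    "# Write a Python function that returns the nth Fibonacci number.\n"
    "def fibonacci(n):"
)
inputs = tokenizer(assignment, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```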
Oct 11, 2021
@MSFTResearch and @NVIDIAAI announce a 530B parameter large language model, 3x larger than GPT-3, achieving superior results on a variety of tasks. Trained on the Pile and evaluated on the Eval Harness, two of #EleutherAI’s biggest projects.

A 🧵

developer.nvidia.com/blog/using-dee…
@MSFTResearch @NVIDIAAI The Pile is a curated dataset of high quality data for training language models. The project was led by @nabla_theta and myself, with contributions from many others. Released on Jan 1st 2021, it was the first public massive language model training dataset

@MSFTResearch @NVIDIAAI @nabla_theta The 530B model is trained predominantly on the Pile, with a couple of newer CC scrapes mixed in. The "newer" facet is quite important, as the data in the Pile was collected prior to July 31st, 2020. Any events that happened since that date (most notably the COVID pandemic)…
Aug 23, 2021
Okay, time to live tweet my thoughts on @stanfordnlp @StanfordAILab's "Workshop on Foundation Models." A long thread.
First and foremost: please never use the phrase "foundational models" ever again. It's a garbage name that people like @mmitchell_ai @emilymbender @mer__edith have criticized at length. I'll go find some of their comments and link to them later, but the short version is:
@mmitchell_ai @emilymbender @mer__edith 1. There is very little intellectually "foundational" about these models
2. It's not at all clear that GPT-3 and CLIP-DALL-E are the same kind of thing
3. The motivation for this relabeling appears to be entirely about political control over language
Jul 3, 2021
Phenomenally interesting paper about how AI researchers talk about what they value in their research. Very glad the authors took the time to do this laborious but important work. I'm going to keep this in my desk so the next time I go on a rant about how ML is prescriptive [1/?]
rather than descriptive I can whack people who disagree with this paper 😛

I would actually go further than the authors of this paper do (I don't know if they disagree with what I'm about to say, but they didn't say it): I would say that corporate AI research [2/?]
is a propaganda tool that is actively and deliberately wielded to influence policy, regulation, and ethics conversations about technology. The very way mainstream AI research - even "AI Ethics" research - is framed obliviates consequences for the companies. [3/?]
