Discover and read the best of Twitter Threads about #eleutherai

Most recents (13)

Research in AI is surprisingly accessible to people from different backgrounds, more so than research in most other fields.

Anyone (w/ relevant experience) can contribute to impactful research.

Here are 5 research orgs you can join to contribute to real, open research in deep learning ↓
1. #EleutherAI

EleutherAI may be the most famous AI open research collective. Lots of great work has been released by EleutherAI, such as the Pile dataset, GPT-J, GPT-NeoX-20B, and VQGAN-CLIP.

Link → discord.gg/zBGx3azzUn
2. @LAION

This server focuses on developing new & replicating existing multimodal models and creating datasets to support these efforts. They have released the LAION-400M and LAION-5B datasets and trained their own CLIP models

Link → discord.com/invite/eq3cAMZ…
Read 8 tweets
BLOOM by @BigScienceW is the most important AI model in the last decade.

Not DALL·E 2. Not PaLM. Not AlphaZero. Not even GPT-3. I'll explain why in this short thread.

🧵1/
In 2020 OpenAI's GPT-3 came out and redefined the guidelines for the AI industry (NLP in particular).

Current SOTA language models follow the same trend: large, transformer-based, and trained on lots of data using big computers.

2/
But what truly makes them belong to the same package is they all stem from the immense resources of private tech companies.

Their goals? Staying at the forefront of AI research, earning money --and, in some cases, achieving the so-called AGI.

3/
Read 9 tweets
Deploying GPT-like language models on a chatbot is tricky.

You might wonder
• How to access the model?
• Where to host the bot?

In this 🧵I walk you through how easily I deployed a GPT-J-6B model by #EleutherAI on a #Telegram bot with @huggingface and @Gradio.

For FREE 🚀
By the end of this🧵, you’ll have your very own Telegram bot that can query the GPT-J model with any text you send it 👇
🤖Token From the BotFather

To create a bot, you must have a Telegram account.

Next, get a TOKEN from the BotFather. This TOKEN allows you to access the bot.

Keep this TOKEN private🤫. Anyone with this TOKEN can access your bot.
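
A minimal sketch of the bot's reply step, assuming Python and only the standard library. The `sendMessage` endpoint and the `message.chat.id` / `message.text` fields come from the Telegram Bot API. Everything else is an assumption for illustration: `echo_model` is a hypothetical stand-in for the GPT-J call (the thread reaches the model through @huggingface and @Gradio, which is not reproduced here), and the helper names are made up.

```python
import json
from typing import Callable, Optional
from urllib import parse, request

TELEGRAM_API = "https://api.telegram.org"  # Telegram Bot API host


def send_message_url(token: str) -> str:
    """Build the sendMessage endpoint; the TOKEN is part of the URL, so keep it private."""
    return f"{TELEGRAM_API}/bot{token}/sendMessage"


def build_reply(update: dict, generate: Callable[[str], str]) -> Optional[dict]:
    """Turn one incoming update into a sendMessage payload.

    Returns None for updates that carry no text (stickers, joins, and so on).
    """
    message = update.get("message") or {}
    text = message.get("text")
    if not text:
        return None
    return {"chat_id": message["chat"]["id"], "text": generate(text)}


def send_reply(token: str, payload: dict) -> dict:
    """POST the payload to Telegram. This performs a real network call."""
    data = parse.urlencode(payload).encode()
    with request.urlopen(send_message_url(token), data=data) as resp:
        return json.load(resp)


def echo_model(prompt: str) -> str:
    """Hypothetical placeholder for the GPT-J query."""
    return f"(model output for: {prompt})"


if __name__ == "__main__":
    # Offline demo: build the reply without contacting Telegram.
    update = {"message": {"chat": {"id": 42}, "text": "Hello"}}
    print(build_reply(update, echo_model))
```

Keeping `build_reply` free of network calls means the text-in, text-out logic can be checked locally before the real TOKEN is ever used.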

Read 19 tweets
Over a year ago, several brilliant people at #EleutherAI started plugging VQGAN and CLIP together and getting it to generate images. By now there are many variations and adaptations of the technique out there, but for various reasons the OG paper is only just coming out
Huge props to @RiversHaveWings, @dashstander, @EricHallahan, @lcastricato, and the many other people who have iterated on and popularized this technique. I came rather late to the party, and mostly made sure that the experiments happened and their great work was showcased
@RiversHaveWings @dashstander @EricHallahan @lcastricato VQGAN-CLIP has really taken on a life of its own, getting picked up and modified in Jupyter notebooks shared on Twitter, Instagram, and other social media platforms
Read 12 tweets
Google decided that 137B and 280B weren't enough, so now they've gone and trained a 540B model.

ai.googleblog.com/2022/04/pathwa…
Chinchilla is *hugely* punching above its weight here. Damn.
@SashaMTL @TaliaRinger Hmmmm I coulda sworn I recently read something about how LLMs are Good for the Environment Actually (TM) because they're multitask models and one training run supports a lot of deployment, and yet here we are.
Read 16 tweets
You've probably seen results showing impressive few-shot performance of very large language models (LLMs). Do those results mean that LLMs can reason? Well, maybe, but maybe not. Few-shot performance is highly correlated with pretraining term frequency. arxiv.org/abs/2202.07206
We focus on numerical reasoning (addition, multiplication, and unit conversion). We use the same formats and tasks used previously to show impressive few-shot performance, but we systematically evaluate every number and correlate performance with pretraining term frequency.
For example, a model that "knows" how to multiply should have similar performance multiplying 23*X and 24*X, for various X. We evaluate GPT-J on Y*X, for Y in [0, 100] and X in [1, 50], and plot average accuracy against Y's frequency in Pile (thanks #EleutherAI!).
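
The evaluation described above can be sketched roughly as follows, assuming Python. This is not the authors' code: `answer_fn` is a hypothetical wrapper around whatever model is being probed, and `toy_frequency` is made-up data standing in for term counts from the pretraining corpus.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def accuracy_by_operand(
    answer_fn: Callable[[int, int], int],
    ys: Iterable[int],
    xs: Iterable[int],
) -> Dict[int, float]:
    """For each Y, average the accuracy of Y*X over every X."""
    xs = list(xs)
    result = {}
    for y in ys:
        correct = sum(1 for x in xs if answer_fn(y, x) == y * x)
        result[y] = correct / len(xs)
    return result


def rank_by_frequency(
    accuracy: Dict[int, float],
    frequency: Dict[int, int],
) -> List[Tuple[int, int, float]]:
    """Return (Y, frequency, accuracy) rows, most frequent Y first."""
    rows = [(y, frequency[y], acc) for y, acc in accuracy.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up counts: 23 is "common" in the corpus, 24 is "rare".
toy_frequency = {23: 1000, 24: 10}


def toy_model(y: int, x: int) -> int:
    """Hypothetical model that is only right for frequent operands."""
    return y * x if toy_frequency[y] >= 100 else y * x + 1


if __name__ == "__main__":
    acc = accuracy_by_operand(toy_model, ys=[23, 24], xs=range(1, 51))
    for row in rank_by_frequency(acc, toy_frequency):
        print(row)
```

A model that had truly learned multiplication would show a flat accuracy column here; a gap that tracks the frequency column is the pattern the thread is pointing at.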
Read 9 tweets
Excited to share my newest paper, "Neural Language Models are Effective Plagiarists" with @EdwardRaffML. We took a dataset of CS 101 assignments and asked "can a language model do a good job solving these with minimal human intervention or knowledge?"

arxiv.org/abs/2201.07406
@EdwardRaffML There's been some very interesting work recently on solving college level assignments with transformers, but that work typically uses private models and more complicated pipelines. We wanted to focus on what was available to a random student with the internet, not an AI expert.
@EdwardRaffML To do that, we stuck with #EleutherAI's GPT-J, freely and publicly available at 6b.eleuther.ai. We used no prompting, no finetuning, and no tricks.
Read 14 tweets
@MSFTResearch and @NVIDIAAI announce a 530B parameter large language model, 3x larger than GPT-3, achieving superior results on a variety of tasks. Trained on the Pile and evaluated on the Eval Harness, two of #EleutherAI’s biggest projects.

A 🧵

developer.nvidia.com/blog/using-dee…
@MSFTResearch @NVIDIAAI The Pile is a curated dataset of high quality data for training language models. The project was led by @nabla_theta and myself, with contribs from many others. Released on Jan 1st 2021, it was the first public massive language model training dataset

@MSFTResearch @NVIDIAAI @nabla_theta The 530B model is trained predominantly on the Pile, with a couple newer CC scrapes mixed in. The "newer" facet is quite important, as the data in the Pile was collected prior to July 31st, 2020. Any events that happened since that date (most notably the COVID pandemic)
Read 32 tweets
Primer combines L1-BN (arxiv.org/abs/1802.09769), Conformer (arxiv.org/abs/2005.08100) and "Squared ReLU" to reach up to 4x faster convergence at no additional memory cost.
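
For reference, "Squared ReLU" is simply the ReLU output squared. A minimal pure-Python sketch, assuming scalar inputs; a real model would apply the same operation elementwise to tensors, and the function name here is illustrative:

```python
from typing import List


def squared_relu(x: float) -> float:
    """Squared ReLU: clamp negatives to zero, then square the result."""
    return max(0.0, x) ** 2


def squared_relu_all(values: List[float]) -> List[float]:
    """Apply squared ReLU elementwise to a list of activations."""
    return [squared_relu(v) for v in values]


if __name__ == "__main__":
    print(squared_relu_all([-2.0, 0.0, 0.5, 3.0]))
```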

This speedup is almost as significant as Switch Transformer's (arxiv.org/abs/2101.03961). It got up to 7x speedups using 64x as many (sparse) parameters.
Primer, however, doesn't use more parameters. It's also orthogonal to Switch, so a combined speedup of roughly 28x (4x times 7x) seems plausible.
There's just one slight issue: The baseline.
Primer compares itself with a default transformer and has no ablations of individual changes.
Instead, they trained a standard 2B GPT3-XL for 2 trillion tokens, spending well over $1,000,000 on this one figure.
Read 7 tweets
In addition to the codebase, @laurel_orr1 and I wrote up a blog post (with the rest of the Propulsion team!) describing Mistral and our journey in more detail.

Check it out here, and we'd love to hear your thoughts: crfm.stanford.edu/blog.html [1/5]
I really hope that our voices came through; we tried to keep it light, while also hitting on the hurdles we encountered along the way!

Not everything made it into the blog, so we also recorded a light & lively 25-min podcast: soundcloud.com/propulsion-mix… [2/5]
Big thanks to everyone who helped us build Mistral -- from @Thom_Wolf & @StasBekman who helped us navigate @huggingface Transformers, to @carey_phelps for providing support with @weights_biases.

Also huge shoutout to @BlancheMinerva from #EleutherAI for providing feedback! [3/5]
Read 5 tweets
Okay, time to live tweet my thoughts on @stanfordnlp @StanfordAILab's "Workshop on Foundation Models." A long thread.
First and foremost: please never use the phrase "foundational models" ever again. It's a garbage name that people like @mmitchell_ai @emilymbender @mer__edith have criticized at length. I'll go find some of their comments and link to them later, but the short version is:
@mmitchell_ai @emilymbender @mer__edith 1. There is very little intellectually "foundational" about these models
2. It's not at all clear that GPT-3 and CLIP-DALL-E are the same kind of thing
3. The motivation for this relabeling appears to be entirely about political control over language
Read 60 tweets
Phenomenally interesting paper about how AI researchers talk about what they value in their research. Very glad the authors took the time to do this laborious but important work. I'm going to keep this in my desk so the next time I go on a rant about how ML is prescriptive [1/?]
rather than descriptive I can whack people who disagree with this paper 😛

I would actually go further than the authors of this paper do (I don't know if they disagree with what I'm about to say, but they didn't say it): I would say that corporate AI research [2/?]
is a propaganda tool that is actively and deliberately wielded to influence policy, regulation, and ethics conversations about technology. The very way mainstream AI research - even "AI Ethics" research - is framed obviates consequences for the companies. [3/?]
Read 26 tweets
Great write up about the crazy cool art #EleutherAI members have been learning to coax out of GANs with CLIP! Credit assignment with stuff like this is hard, but @jbusted1 @RiversHaveWings @BoneAmputee and @kialuy are some of the people who have made this happen.
@jbusted1 @RiversHaveWings @BoneAmputee @kialuy They’ve been doing some visionary work with human-guided AI-generated art for the past two months, and it’s phenomenal that they’re starting to get the recognition they deserve. Several more people who either lack twitters or whose handles I don’t know deserve applause too
Read 9 tweets
