Jim Fan
Feb 26 · 6 tweets · 4 min read
*Decentralized* training will become a formidable force in open-source, large-scale AI development. We need infrastructure that lets LLMs scale as the community scales.

GPT-JT is a great example: it distributes training over a slow network and diverse devices.

1/
GPT-JT-6B is a fork of @AiEleuther’s GPT-J, fine-tuned on 3.53 billion tokens. Its most distinguishing feature is that the training pipeline is distributed over a 1 Gbps network - very slow compared to conventional centralized data-center networks.

2/
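To see why 1 Gbps is such a hard constraint, here is a back-of-envelope sketch. The 6B parameter count comes from the thread; the fp16 gradients, the naive full gradient exchange per step, and the 100 Gbps data-center comparison point are my illustrative assumptions, not numbers from the paper.

```python
# Back-of-envelope: why a 1 Gbps link makes naive data-parallel training painful.
# Assumes fp16 gradients and a full gradient exchange per step (no compression
# or clever scheduling) - exactly what the paper's scheduling avoids.

params = 6e9                    # GPT-JT-6B parameter count
bytes_per_param = 2             # fp16 gradient (assumption)
payload_bits = params * bytes_per_param * 8

link_bps = 1e9                  # 1 Gbps, as in the GPT-JT setup
datacenter_bps = 100e9          # 100 Gbps, a typical cluster interconnect (assumption)

print(f"1 Gbps:   {payload_bits / link_bps:.0f} s per naive sync")
print(f"100 Gbps: {payload_bits / datacenter_bps:.2f} s per naive sync")
```

Roughly 96 seconds per naive sync over 1 Gbps versus about one second in a data center - which is why communication-aware scheduling is the whole point.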
This enables geo-distributed computing across cities or even countries. Now everyone can BYOC (“Bring Your Own Compute”) and join the training fleet to contribute to the open-source development. The scheduling algorithm makes no assumption about the device types.

3/
Blog from @togethercompute: together.xyz/blog/releasing…

Paper “Decentralized Training of Foundation Models in Heterogeneous Environments”: arxiv.org/abs/2206.01288

Authors: @Hades317 @Yong_jun_He, Jared Quincy Davis, @Tianyi_Zh, @tri_dao, @BeidiChen, @percyliang, Chris Re, Ce Zhang
Here’s another success story of decentralized & democratized AI training:

Leela Zero, the community effort to reproduce the mighty AlphaGo and AlphaZero for Go and Chess.
Open training is awesome, open-sourcing ideas is even better ;)

An express thread to my past writings:

More from @DrJimFan

Feb 15
The Adam optimizer is at the heart of modern AI. Researchers have been trying to dethrone Adam for years.

How about we ask a machine to do a better job? @GoogleAI uses evolution to discover a simpler & more efficient algorithm with remarkable features.

It’s just 8 lines of code: 🧵
The discovered “Lion” optimizer is able to boost the accuracy of Vision Transformers (ViT) by up to 2% on ImageNet, reduce training compute by up to 2.3x for diffusion models, and achieve comparable performance on LLMs. It is more memory-efficient compared to human designs.

2/
Remarkably, the evolutionary search decides that the SIGN of gradient is all you need. For example, if the gradient is [-0.31, 0.43, -0.21], Lion turns it into [-1, 1, -1] for the update vector. This is counter-intuitive and nontrivial for human researchers to come up with.

3/
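A minimal numpy sketch of a sign-based update in the spirit of the thread's description. The interpolation coefficients, learning rate, and weight-decay handling here are illustrative assumptions - see the paper for the actual 8 lines.

```python
import numpy as np

def lion_step(w, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One sign-based update in the spirit of Lion.

    Coefficients are illustrative defaults, not necessarily the paper's.
    """
    # Only the SIGN of the interpolated momentum/gradient is used.
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    w = w - lr * (update + wd * w)        # decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad    # momentum tracks raw gradients
    return w, m

# The thread's example: gradient [-0.31, 0.43, -0.21] -> update [-1, 1, -1]
g = np.array([-0.31, 0.43, -0.21])
w, m = lion_step(np.zeros(3), g, m=np.zeros(3))
```

Note the memory win: unlike Adam, there is no second-moment buffer to store - just one momentum vector.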
Feb 14
Do you know that DeepMind has actually open-sourced the heart of AlphaGo & AlphaZero?

It’s hidden in an unassuming repo called “mctx”: github.com/deepmind/mctx

It provides JAX-native Monte Carlo Tree Search (MCTS) that runs on batches of inputs, in parallel, and blazing fast.
🧵 MuZero: https://www.deepmin...
MCTS is a search algorithm that finds the best move in turn-based games by selecting → expanding → simulating → updating nodes in a search tree.

It is arguably the most complex component of AlphaGo - and making it efficient is even more nontrivial.

2/
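The select → expand → simulate → update loop is easy to sketch in plain Python. This is the bare UCT algorithm on a toy take-1-or-2-stones game I made up for illustration - not mctx's batched JAX API:

```python
import math
import random

random.seed(0)

# Toy game: players alternately take 1 or 2 stones; whoever takes the
# LAST stone wins. Just enough game to exercise the four MCTS phases.
MOVES = (1, 2)

class Node:
    def __init__(self, stones, parent=None):
        self.stones, self.parent = stones, parent
        self.children = {}   # move -> child Node
        self.visits = 0
        self.wins = 0        # wins for the player who moved INTO this node

def legal(stones):
    return [m for m in MOVES if m <= stones]

def select(node):
    # Selection: walk down fully expanded nodes via the UCT rule.
    while node.stones > 0 and len(node.children) == len(legal(node.stones)):
        node = max(node.children.values(),
                   key=lambda c: c.wins / c.visits
                   + math.sqrt(2 * math.log(node.visits) / c.visits))
    return node

def expand(node):
    # Expansion: add one untried child (or return a terminal node as-is).
    untried = [m for m in legal(node.stones) if m not in node.children]
    if not untried:
        return node
    m = random.choice(untried)
    node.children[m] = Node(node.stones - m, parent=node)
    return node.children[m]

def rollout(stones):
    # Simulation: random playout; True if the player to move ends up winning.
    mover_turn, mover_wins = True, False
    while stones > 0:
        stones -= random.choice(legal(stones))
        if stones == 0:
            mover_wins = mover_turn
        mover_turn = not mover_turn
    return mover_wins

def backprop(node, mover_won):
    # Update: push the result up the tree, flipping perspective each level.
    while node is not None:
        node.visits += 1
        node.wins += mover_won
        mover_won = not mover_won
        node = node.parent

def mcts(stones, n_sims=2000):
    root = Node(stones)
    for _ in range(n_sims):
        leaf = expand(select(root))
        mover_won = True if leaf.stones == 0 else not rollout(leaf.stones)
        backprop(leaf, mover_won)
    return max(root.children, key=lambda m: root.children[m].visits)

best = mcts(4)   # taking 1 leaves 3 stones, a lost position for the opponent
```

mctx vectorizes this same loop across batches of game states in JAX, which is where the speed comes from.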
DeepMind’s mctx library powers not just AlphaGo, but also AlphaZero (plays Go, Chess, and Shogi from scratch) and MuZero (AlphaZero + also solving Atari games).

AlphaZero: deepmind.com/blog/alphazero…
MuZero: deepmind.com/blog/muzero-ma…

3/
Feb 13
We’ve seen a gazillion startups using OpenAI APIs to do “co-pilot for X”. What’s next?

Enter the *physical* co-pilot! Here’s a compelling demo: you improvise by playing a “low resolution” piano, and the co-pilot compiles it in real time to Hi-Fi music! It unleashes our inner pianist.🧵
What’s behind the magic?

“Piano Genie” is a discrete autoencoder architecture that uses LSTMs to map piano notes to low-res controller buttons, then decodes them back to the piano space. It’s trained on 1,400 virtuosic performances from the International Piano-e-Competition.

2/
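The discrete bottleneck is the interesting part: a continuous encoder output gets snapped to one of 8 button ids, and the decoder must reconstruct full-piano notes from that. A minimal numpy sketch of such a quantization bottleneck - the 8-bin count matches the controller buttons, everything else here is my illustration, not Piano Genie's actual code:

```python
import numpy as np

def quantize_to_buttons(z, num_buttons=8):
    """Snap continuous encoder outputs in [-1, 1] to discrete button ids 0..7."""
    z = np.clip(z, -1.0, 1.0)
    return np.round((z + 1.0) / 2.0 * (num_buttons - 1)).astype(int)

def button_to_center(ids, num_buttons=8):
    """Map a button id back to the center of its bin for the decoder."""
    return ids / (num_buttons - 1) * 2.0 - 1.0

z = np.array([-1.0, -0.2, 0.1, 0.9])   # hypothetical encoder outputs
ids = quantize_to_buttons(z)           # discrete ids in 0..7
```

During training, the rounding step is non-differentiable, so this kind of model typically passes gradients straight through the quantizer.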
Remarkably, this project was done in 2018, *before* the age of LLMs and co-pilot. That’s why it used LSTM.

Brilliant work by @chrisdonahuey @iansimon @sedielem from @GoogleMagenta team. They recently did another super cool project called “SingSong” — your reverse Karaoke👇

3/
Feb 12
Clinical Decision Transformer: a recommender system that takes in a desired range of clinical states as “goal”, and outputs a sequence of medications for the patient.

I applaud the authors for the interesting work & open-sourcing (soon), but I think the risks are immense.

1/
Just like (or even worse than) autonomous driving, a single medical accident can shake the faith of the general public for years to come. It affects not just one system or institute, but the entire AI+Medical industry.

2/
Moving forward, I think it’s important that every clinical decision is accompanied by extensive explanations and comprehensive references to the medical literature, so that doctors know when and whether to trust the output.

3/
Feb 6
I see Twitter as a place to open-source my ideas. I write about AI recipes, deep dives, insights of the past, and foresights of a better future.

Thanks for following. Here’s your first-class seat aboard the AI Express - all my top posts in one big 🧵. Enjoy:
Deep dive: building *embodied* general intelligence — our NeurIPS Best Paper on “MineDojo”, open-ended agent learning in Minecraft.
Recipe: how to make virtual assistants like Siri & Alexa dramatically better.
Feb 2
The music & sound-effect industry has not fully grasped the size of the storm about to hit.

There are not just one, or two, but FOUR audio models released in the past week *alone*.

If 2022 is the year of pixels for generative AI, then 2023 is the year of sound waves.

Deep dive with me: 🧵
MusicLM by @GoogleAI is a hierarchical text-to-audio model that generates 24 kHz music that stays consistent over several minutes. It relies on 3 key pre-trained modules: SoundStream, w2v-BERT, and MuLan.

1.1/
Among the three, MuLan is particularly interesting - it’s a CLIP-like model that learns to encode paired audio and text closer to each other in the embedding space. MuLan helps address the limited-paired-data issue: MusicLM can now learn from a large audio-only corpus.

1.2/
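MuLan's CLIP-like objective can be sketched as a symmetric contrastive loss over paired audio/text embeddings. Here is the generic CLIP-style loss in numpy - the batch size, embedding dimension, and temperature are made-up illustration values, not MuLan's actual recipe:

```python
import numpy as np

def clip_style_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: matched audio/text pairs (row i with
    row i) should score higher than all mismatched pairs."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature        # (B, B) cosine similarity matrix
    labels = np.arange(len(a))            # correct match is the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)   # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # Symmetric: audio -> text and text -> audio directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
aligned = clip_style_loss(emb, emb)       # perfectly paired batch
shuffled = clip_style_loss(emb, emb[::-1])  # mispaired batch
```

Pulling matched pairs together in this shared space is what lets MusicLM use text-free audio at scale.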
