*Decentralized* training will become a formidable force in open-source, large-scale AI development. We need infrastructure that lets LLMs scale as the community scales.
GPT-JT is a great example: it distributes training over a slow network and across diverse devices.
1/
GPT-JT-6B is a fork of @AiEleuther’s GPT-J, fine-tuned on 3.53 billion tokens. Its most distinctive feature is that the training pipeline is distributed over a 1 Gbps network - very slow compared to conventional centralized data-center interconnects.
2/
This enables geo-distributed computing across cities or even countries. Now everyone can BYOC (“Bring Your Own Compute”) and join the training fleet to contribute to open-source development. The scheduling algorithm makes no assumptions about device types.
3/
The Adam optimizer is at the heart of modern AI. Researchers have been trying to dethrone Adam for years.
How about we ask a machine to do a better job? @GoogleAI uses evolution to discover a simpler & more efficient algorithm with remarkable properties.
It’s just 8 lines of code: 🧵
The discovered “Lion” optimizer boosts the accuracy of Vision Transformers (ViT) by up to 2% on ImageNet, cuts training compute by up to 2.3x for diffusion models, and achieves comparable performance on LLMs. It is also more memory-efficient than human-designed optimizers.
2/
Remarkably, the evolutionary search decides that the SIGN of the gradient is all you need. For example, if the gradient is [-0.31, 0.43, -0.21], Lion turns it into [-1, 1, -1] for the update vector (sketch below). This is counter-intuitive and nontrivial for human researchers to come up with.
3/
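For reference, here’s what the update looks like in code - a minimal jax.numpy sketch of the published rule (the function name and hyperparameter defaults are mine, for illustration):

```python
import jax.numpy as jnp

def lion_update(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """One Lion step for a parameter tensor `w`, gradient `g`, momentum `m`."""
    # Interpolate gradient and momentum, then keep only the SIGN of the result.
    update = jnp.sign(beta1 * m + (1.0 - beta1) * g)
    # Decoupled weight decay, applied AdamW-style.
    w_new = w - lr * (update + wd * w)
    # Momentum is updated with a *different* interpolation coefficient.
    m_new = beta2 * m + (1.0 - beta2) * g
    return w_new, m_new

# e.g. a gradient of [-0.31, 0.43, -0.21] contributes only its sign: [-1, 1, -1]
```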
DeepMind’s mctx library provides JAX-native Monte Carlo Tree Search (MCTS) that runs on batches of inputs, in parallel, and blazingly fast.
🧵
MCTS is a search algorithm that solves for the best move in turn-based games by selecting → expanding → simulating → updating the nodes in a strategy tree.
It is arguably the most complex component of AlphaGo - and making it run efficiently is harder still.
2/
mctx implements the search behind not just AlphaGo, but also AlphaZero (which plays Go, Chess, and Shogi from scratch) and MuZero (AlphaZero extended to also solve Atari games).
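Here’s roughly what batched search looks like - a hedged sketch where the toy dynamics model, shapes, and numbers are mine; the mctx calls follow the library’s public API as I understand it:

```python
import jax
import jax.numpy as jnp
import mctx

batch_size, num_actions = 4, 9

def recurrent_fn(params, rng_key, action, embedding):
    # Toy dynamics model: a real agent would use a learned network here.
    new_embedding = embedding + 1
    output = mctx.RecurrentFnOutput(
        reward=jnp.zeros(batch_size),
        discount=jnp.ones(batch_size),
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),
    )
    return output, new_embedding

root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch_size, num_actions)),
    value=jnp.zeros(batch_size),
    embedding=jnp.zeros(batch_size),
)

# The whole batch of search trees is expanded in parallel on the accelerator.
policy_output = mctx.muzero_policy(
    params=None,
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=32,
)
print(policy_output.action)          # chosen action per batch element
print(policy_output.action_weights)  # improved policy from the search
```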
We’ve seen a gazillion startups using OpenAI APIs to build “co-pilot for X”. What’s next?
Enter the *physical* co-pilot! Here’s a compelling demo: you improvise on a “low-resolution” piano, and the co-pilot compiles it in real time into Hi-Fi music! It unleashes our inner pianist. 🧵
What’s behind the magic?
“Piano Genie” is a discrete autoencoder architecture that uses LSTMs to map piano notes to low-res controller buttons and then decode back into the full piano space (sketch below). It’s trained on 1,400 virtuosic performances from the International Piano-e-Competition.
2/
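The core trick, as I understand it: the encoder’s output is snapped to one of 8 button values, with a straight-through estimator so gradients keep flowing. A rough jax.numpy sketch of that discretization step (function name and details are my paraphrase, not the project’s code):

```python
import jax.numpy as jnp
from jax import lax

def quantize_to_buttons(enc_out, num_buttons=8):
    """Snap a real-valued encoder output to one of `num_buttons` discrete
    levels, keeping gradients via a straight-through estimator."""
    z = jnp.tanh(enc_out)                                     # squash to [-1, 1]
    button = jnp.round((z + 1.0) / 2.0 * (num_buttons - 1))   # integer 0..7
    z_q = button / (num_buttons - 1) * 2.0 - 1.0              # back to [-1, 1]
    # Forward pass uses z_q; backward pass treats quantization as identity.
    return z + lax.stop_gradient(z_q - z), button.astype(jnp.int32)
```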
Remarkably, this project was done in 2018, *before* the age of LLMs and co-pilots. That’s why it used an LSTM.
Clinical Decision Transformer: a recommender system that takes a desired range of clinical states as its “goal” and outputs a sequence of medications for the patient.
I applaud the authors for the interesting work & open-sourcing (soon), but I think the risks are immense.
1/
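For intuition, here’s my mental model of the interface as a hedged sketch - the interleaved (goal, state, medication) token layout below is how a standard Decision Transformer is usually conditioned; I haven’t verified it matches the paper’s exact input format:

```python
import jax.numpy as jnp

def build_input_sequence(goal, states, medications):
    """Interleave (goal, state, medication) embeddings per timestep,
    Decision-Transformer style: condition on the desired clinical state
    and autoregressively predict the next medication token."""
    # goal:        (d,)    embedding of the desired clinical-state range
    # states:      (T, d)  embeddings of observed clinical states
    # medications: (T, d)  embeddings of medications given so far
    T, d = states.shape
    goal_tok = jnp.broadcast_to(goal, (T, d))
    # Stack to (T, 3, d), then flatten into a length-3T token sequence.
    return jnp.stack([goal_tok, states, medications], axis=1).reshape(T * 3, d)
```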
Just as with autonomous driving (or even worse), a single medical accident can shake the general public’s faith for years to come. It would affect not just one system or institution, but the entire AI+medical industry.
2/
Moving forward, I think it’s important that every clinical decision comes with extensive explanations and comprehensive references to the medical literature, so that doctors know when and whether to trust the output.
3/
The music & sound-effect industry has not fully grasped the size of the storm about to hit.
Not one, not two, but FOUR audio models dropped in the past week *alone*.
If 2022 was the year of pixels for generative AI, then 2023 will be the year of sound waves.
Deep dive with me: 🧵
MusicLM by @GoogleAI is a hierarchical text-to-audio model that generates 24 kHz music that stays consistent over several minutes. It relies on 3 key pre-trained modules: SoundStream, w2v-BERT, and MuLan.
1.1/
Among the three, MuLan is particularly interesting - it’s a CLIP-like model that learns to encode paired audio and text close to each other in a shared embedding space. MuLan helps address the scarcity of paired audio-text data - now MusicLM can learn from large audio-only corpora.
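The CLIP-style idea in one picture: embed both modalities, then pull matched (audio, text) pairs together and push mismatched ones apart. A minimal jax.numpy sketch of such a contrastive loss (illustrative only, not MuLan’s actual code):

```python
import jax
import jax.numpy as jnp

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (audio, text) embeddings."""
    a = audio_emb / jnp.linalg.norm(audio_emb, axis=-1, keepdims=True)
    t = text_emb / jnp.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = a @ t.T / temperature        # (B, B) pairwise similarity matrix
    labels = jnp.arange(a.shape[0])       # the i-th audio matches the i-th text
    # Cross-entropy in both directions: audio -> text and text -> audio.
    loss_a = -jnp.mean(jax.nn.log_softmax(logits, axis=1)[labels, labels])
    loss_t = -jnp.mean(jax.nn.log_softmax(logits, axis=0)[labels, labels])
    return 0.5 * (loss_a + loss_t)
```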