*Decentralized* training will become a formidable force in open-source, large-scale AI development. We need infrastructure that lets LLMs scale as the community scales.
GPT-JT is a great example: it distributes training over a slow network and across diverse devices.
1/
GPT-JT-6B is a fork of @AiEleuther’s GPT-J, fine-tuned on 3.53 billion tokens. Its most distinctive feature is that the training pipeline is distributed over a 1 Gbps network - very slow compared to conventional centralized data-center interconnects.
2/
This enables geo-distributed computing across cities or even countries. Now everyone can BYOC (“Bring Your Own Compute”) and join the training fleet to contribute to open-source development. The scheduling algorithm makes no assumptions about device types.
3/
The Adam optimizer is at the heart of modern AI. Researchers have been trying to dethrone Adam for years.
How about we ask a machine to do a better job? @GoogleAI uses evolution to discover a simpler & more efficient algorithm with remarkable properties.
It’s just 8 lines of code: 🧵
The discovered “Lion” optimizer boosts the accuracy of Vision Transformers (ViT) by up to 2% on ImageNet, cuts training compute by up to 2.3x for diffusion models, and achieves comparable performance on LLMs. It is also more memory-efficient than human-designed optimizers.
2/
Remarkably, the evolutionary search decides that the SIGN of the gradient is all you need. For example, if the gradient is [-0.31, 0.43, -0.21], Lion turns it into [-1, 1, -1] for the update vector (sketch below). This is counter-intuitive and nontrivial for human researchers to come up with.
3/
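For reference, here’s what the update looks like in code - a minimal jax.numpy sketch of the published rule (the function name and hyperparameter defaults are mine, for illustration):

```python
import jax.numpy as jnp

def lion_update(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """One Lion step for a parameter tensor `w`, gradient `g`, momentum `m`."""
    # Interpolate gradient and momentum, then keep only the SIGN of the result.
    update = jnp.sign(beta1 * m + (1.0 - beta1) * g)
    # Decoupled weight decay, applied AdamW-style.
    w_new = w - lr * (update + wd * w)
    # Momentum is updated with a *different* interpolation coefficient.
    m_new = beta2 * m + (1.0 - beta2) * g
    return w_new, m_new

# e.g. a gradient of [-0.31, 0.43, -0.21] contributes only its sign: [-1, 1, -1]
```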
DeepMind’s mctx library provides JAX-native Monte Carlo Tree Search (MCTS) that runs on batches of inputs, in parallel, and blazingly fast.
🧵
MCTS is a search algorithm that solves for the best move in turn-based games by selecting → expanding → simulating → updating the nodes in a strategy tree.
It is arguably the most complex component of AlphaGo - and making it run efficiently is harder still.
2/
mctx implements the search behind not just AlphaGo, but also AlphaZero (which plays Go, Chess, and Shogi from scratch) and MuZero (AlphaZero extended to also solve Atari games).
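Here’s roughly what batched search looks like - a hedged sketch where the toy dynamics model, shapes, and numbers are mine; the mctx calls follow the library’s public API as I understand it:

```python
import jax
import jax.numpy as jnp
import mctx

batch_size, num_actions = 4, 9

def recurrent_fn(params, rng_key, action, embedding):
    # Toy dynamics model: a real agent would use a learned network here.
    new_embedding = embedding + 1
    output = mctx.RecurrentFnOutput(
        reward=jnp.zeros(batch_size),
        discount=jnp.ones(batch_size),
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),
    )
    return output, new_embedding

root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch_size, num_actions)),
    value=jnp.zeros(batch_size),
    embedding=jnp.zeros(batch_size),
)

# The whole batch of search trees is expanded in parallel on the accelerator.
policy_output = mctx.muzero_policy(
    params=None,
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=32,
)
print(policy_output.action)          # chosen action per batch element
print(policy_output.action_weights)  # improved policy from the search
```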
We’ve seen a gazillion startups using OpenAI APIs to build “co-pilot for X”. What’s next?
Enter the *physical* co-pilot! Here’s a compelling demo: you improvise on a “low-resolution” piano, and the co-pilot compiles it in real time into Hi-Fi music! It unleashes our inner pianist. 🧵
What’s behind the magic?
“Piano Genie” is a discrete autoencoder architecture that uses LSTMs to map piano notes to low-res controller buttons and then decode back into the full piano space (sketch below). It’s trained on 1,400 virtuosic performances from the International Piano-e-Competition.
2/
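The core trick, as I understand it: the encoder’s output is snapped to one of 8 button values, with a straight-through estimator so gradients keep flowing. A rough jax.numpy sketch of that discretization step (function name and details are my paraphrase, not the project’s code):

```python
import jax.numpy as jnp
from jax import lax

def quantize_to_buttons(enc_out, num_buttons=8):
    """Snap a real-valued encoder output to one of `num_buttons` discrete
    levels, keeping gradients via a straight-through estimator."""
    z = jnp.tanh(enc_out)                                     # squash to [-1, 1]
    button = jnp.round((z + 1.0) / 2.0 * (num_buttons - 1))   # integer 0..7
    z_q = button / (num_buttons - 1) * 2.0 - 1.0              # back to [-1, 1]
    # Forward pass uses z_q; backward pass treats quantization as identity.
    return z + lax.stop_gradient(z_q - z), button.astype(jnp.int32)
```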
Remarkably, this project was done in 2018, *before* the age of LLMs and co-pilots. That’s why it used an LSTM.
Clinical Decision Transformer: a recommender system that takes a desired range of clinical states as its “goal” and outputs a sequence of medications for the patient.
I applaud the authors for the interesting work & open-sourcing (soon), but I think the risks are immense.
1/
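For intuition, here’s my mental model of the interface as a hedged sketch - the interleaved (goal, state, medication) token layout below is how a standard Decision Transformer is usually conditioned; I haven’t verified it matches the paper’s exact input format:

```python
import jax.numpy as jnp

def build_input_sequence(goal, states, medications):
    """Interleave (goal, state, medication) embeddings per timestep,
    Decision-Transformer style: condition on the desired clinical state
    and autoregressively predict the next medication token."""
    # goal:        (d,)    embedding of the desired clinical-state range
    # states:      (T, d)  embeddings of observed clinical states
    # medications: (T, d)  embeddings of medications given so far
    T, d = states.shape
    goal_tok = jnp.broadcast_to(goal, (T, d))
    # Stack to (T, 3, d), then flatten into a length-3T token sequence.
    return jnp.stack([goal_tok, states, medications], axis=1).reshape(T * 3, d)
```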
Just as with autonomous driving (or even worse), a single medical accident can shake the general public’s faith for years to come. It would affect not just one system or institution, but the entire AI+medical industry.
2/
Moving forward, I think it’s important that every clinical decision comes with extensive explanations and comprehensive references to the medical literature, so that doctors know when and whether to trust the output.
3/
The music & sound-effect industry has not fully grasped the size of the storm about to hit.
Not one, not two, but FOUR audio models dropped in the past week *alone*.
If 2022 was the year of pixels for generative AI, then 2023 will be the year of sound waves.
Deep dive with me: 🧵
MusicLM by @GoogleAI is a hierarchical text-to-audio model that generates 24 kHz music that stays consistent over several minutes. It relies on 3 key pre-trained modules: SoundStream, w2v-BERT, and MuLan.
1.1/
Among the three, MuLan is particularly interesting - it’s a CLIP-like model that learns to encode paired audio and text close to each other in a shared embedding space. MuLan helps address the scarcity of paired audio-text data - now MusicLM can learn from large audio-only corpora.
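The CLIP-style idea in one picture: embed both modalities, then pull matched (audio, text) pairs together and push mismatched ones apart. A minimal jax.numpy sketch of such a contrastive loss (illustrative only, not MuLan’s actual code):

```python
import jax
import jax.numpy as jnp

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (audio, text) embeddings."""
    a = audio_emb / jnp.linalg.norm(audio_emb, axis=-1, keepdims=True)
    t = text_emb / jnp.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = a @ t.T / temperature        # (B, B) pairwise similarity matrix
    labels = jnp.arange(a.shape[0])       # the i-th audio matches the i-th text
    # Cross-entropy in both directions: audio -> text and text -> audio.
    loss_a = -jnp.mean(jax.nn.log_softmax(logits, axis=1)[labels, labels])
    loss_t = -jnp.mean(jax.nn.log_softmax(logits, axis=0)[labels, labels])
    return 0.5 * (loss_a + loss_t)
```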