Jesse Engel
May 26 · 4 tweets · 2 min read
One takeaway for me from (#dalle2, #imagen, #flamingo) is there's no one "golden algorithm" to unlock these new transfer learning capabilities. Contrastive, AR, Freezing, Priors, they all can work. You almost can't stop these models from exhibiting these new types of behavior...
...It reminds me a lot of the early DL days, when people used to think you needed sparsity regularization to learn nice gabor filters in NNs, but then it turned out that almost any model with convolution and enough natural data would learn them on its own...
...We shifted our attention to other parts of the problem, as it became "a given" that pretrained convnets would yield nice representations for visual transfer learning, regardless of the architecture and dataset (kind of crazy when you think about it)...
...The past month has felt a lot like those 2012-2016 days of just seeing the tip of the iceberg of a new transfer learning paradigm and a new set of things that we start to take for granted ("of course" a LLM works fine for multimodal transfer to a wildly different domain...)
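A minimal sketch (an editorial addition, not from the thread) of the "freezing" recipe the first tweet mentions: take any pretrained vision backbone, freeze it, and fit a tiny linear probe on top. The specific model and shapes here are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn
from torchvision import models

# Any pretrained backbone will do; resnet18 is just a small, convenient choice.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()      # expose 512-d features instead of class logits
for p in backbone.parameters():
    p.requires_grad = False      # freeze: the representations are "a given"
backbone.eval()

probe = nn.Linear(512, 10)       # only this tiny layer gets trained

x = torch.randn(8, 3, 224, 224)  # stand-in batch of images
with torch.no_grad():
    feats = backbone(x)
logits = probe(feats)            # train the probe on these with any classifier loss
```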

More from @jesseengel

Nov 10, 2021
Check out our latest blog post on using Transformers for Music Transcription: g.co/magenta/mt3

Authors: @jpgard, @ethanmanilow, @iansimon, @fjord41, @rigeljs, @jesseengel
Rather than training domain-specific models for each dataset, we show that a seq2seq approach can jointly train on many different datasets with arbitrary combinations of instruments. This is an important step towards general purpose music transcription.
Why are we doing this if we're supposed to be working on machine learning for creativity? Transcription extracts notes from audio, which are useful both for human control and for training powerful language models on symbolic music from real audio (e.g. Music Transformer)
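As a loose illustration of the seq2seq framing (not the actual MT3 code, and with a made-up token vocabulary; the real MT3 scheme differs), transcription becomes: audio in, a flat stream of MIDI-like event tokens out, where the instrument is just another token:

```python
# Illustrative only: a toy note-event tokenizer for a seq2seq transcriber.
def notes_to_tokens(notes):
    """notes: list of (onset_sec, program, pitch) tuples, sorted by onset."""
    tokens = []
    for onset, program, pitch in notes:
        tokens.append(f"time_{int(onset * 100)}")  # 10 ms time grid
        tokens.append(f"program_{program}")        # instrument as a token
        tokens.append(f"note_on_{pitch}")          # MIDI pitch
    return tokens

# One model, arbitrary combinations of instruments:
print(notes_to_tokens([(0.00, 0, 60), (0.25, 32, 36)]))
# ['time_0', 'program_0', 'note_on_60', 'time_25', 'program_32', 'note_on_36']
```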
Sep 18, 2021
It’s well known that neural networks model correlation and not causation.

Recently, I’ve found it helpful to think about NN blocks as literal correlations of correlations of correlations …

(incl. dense, norm, nonlin, conv, softmax, transformer, LMs, GANs, …)

🧵 1/20
This is probably obvious to a lot of people, but I found it interesting, so I thought I'd share. Corrections welcome 😀

2/
At the heart is matrix multiplication, which is just the dot product (i.e. linear correlation) of each input vector with each weight vector.

So the dimensions of an output vector are just the correlation of an input vector with each weight vector.

3/
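To make that claim concrete, here's a tiny sketch (an editorial addition, not from the thread): each output dimension of a dense layer is the dot product, i.e. an unnormalized correlation, of the input with one weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # input vector
W = rng.standard_normal((3, 4))   # 3 weight vectors, one per output dimension

y = W @ x                                        # the usual dense layer (no bias)
y_by_hand = np.array([np.dot(w, x) for w in W])  # same thing, one dot at a time

assert np.allclose(y, y_by_hand)  # each output dim = correlation with a weight vector
```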
Apr 27, 2021
1/4 Sorry for another AI rant; I'm just reminded on a daily basis of how harmful the term really is. Almost all of these technologies could be much better described by saying what they actually do, where the "A" is "automation" and/or "augmentation", and hardly "artificial".
Examples: automated decision making, automated policing, automated hiring, augmented writing, augmented creative tools, etc...

It gives a much clearer picture of what a technology does, how it changes power dynamics of society, and who's responsible for its creation and use.
The distinction between augmentation and automation is really a subjective one, depending on whether the process being automated is something that people feel still has value being done manually by a person. There's nothing new about that, machine learning just accelerates it.
May 1, 2020
A lot of folks have been asking me my thoughts about the recent Jukebox work by @OpenAI, so I thought a thread might help. I feel like I have separate reactions from three different parts of my identity:

1) ML researcher
2) ML researcher of music
3) Musician

Long thread :)
1/17
1) As an ML researcher, I think the results are really impressive! The model builds directly off of the VQ-VAE-2 work of @avdnoord, hierarchically modeling discrete codes with transformer priors, and the autoregressive audio approaches of @sedielem.
2/17
This work shows that with meticulous engineering and TONS of data (more on that later) these models can really scale! Sander and I have had a friendly back and forth about this approach for years, and I was truly amazed by the output quality. It’s really impressive research!
3/17
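For readers unfamiliar with the VQ-VAE machinery referenced above, a minimal sketch of the quantization step (an editorial addition with arbitrary shapes, not the Jukebox code): snap each encoder output to its nearest codebook vector, producing the discrete codes that the transformer priors then model autoregressively.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((512, 64))  # 512 learned codes, 64-d each
z = rng.standard_normal((10, 64))          # 10 continuous encoder outputs

# Nearest code by Euclidean distance.
d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (10, 512)
codes = d.argmin(axis=1)                   # discrete token ids for the prior
z_q = codebook[codes]                      # quantized vectors fed to the decoder
```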
Jan 15, 2020
Differentiable Digital Signal Processing (DDSP)! Fusing classic interpretable DSP with neural networks.

⌨️ Blog: magenta.tensorflow.org/ddsp
🎵 Examples: g.co/magenta/ddsp-e…
⏯ Colab: g.co/magenta/ddsp-d…
💻 Code: github.com/magenta/ddsp
📝 Paper: g.co/magenta/ddsp-p…

1/
2/ tl;dr: We've made a library of differentiable DSP components (oscillators, filters, etc.) and show that it enables combining strong inductive priors with expressive neural networks, resulting in high-quality audio synthesis with less data, less compute, and fewer parameters.
3/ An example DDSP module is an Additive Synthesizer (a sum of time-varying sinusoids). A network provides controls (frequencies, amplitudes), the synthesizer renders audio, and the whole op is differentiable. Here's a simple example with harmonic (integer multiple) frequencies.
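A minimal NumPy sketch of that harmonic additive synthesizer (the real, differentiable implementation lives in github.com/magenta/ddsp; this just shows the core idea, and every op here is differentiable if rewritten in an autodiff framework):

```python
import numpy as np

def harmonic_synth(f0_hz, amplitudes, sample_rate=16000):
    """f0_hz: fundamental frequency per sample, shape (T,).
    amplitudes: per-harmonic amplitude per sample, shape (T, K)."""
    T, K = amplitudes.shape
    harmonics = np.arange(1, K + 1)                     # integer multiples of f0
    phase = 2 * np.pi * np.cumsum(f0_hz) / sample_rate  # integrate frequency
    return (amplitudes * np.sin(phase[:, None] * harmonics)).sum(axis=1)

# One second of a 220 Hz tone with 4 equal-amplitude harmonics.
T = 16000
audio = harmonic_synth(np.full(T, 220.0), np.ones((T, 4)) / 4)
```

In DDSP proper, a neural network predicts `f0_hz` and `amplitudes` from features, and gradients flow back through the synthesizer into the network.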
Feb 28, 2019
Make music with GANs!
GANSynth is a new method for fast generation of high-fidelity audio.

🎵 Examples: goo.gl/magenta/gansyn…
⏯ Colab: goo.gl/magenta/gansyn…
📝 Paper: goo.gl/magenta/gansyn…
💻 Code: goo.gl/magenta/gansyn…
⌨️ Blog: magenta.tensorflow.org/gansynth

1/
2/ tl;dr: We show that for musical instruments, we can generate audio ~50,000x faster than a standard WaveNet, with higher quality (in both quantitative metrics and listener tests), and with independent control of pitch and timbre, enabling smooth interpolation between instruments.
3/ We explore a range of architectures and audio representations, and find the best results come from generating in the spectral domain, with large FFT sizes to allow for better frequency resolution, and generating the instantaneous frequency (IF) instead of phase directly.
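For readers new to the term, a small sketch of instantaneous frequency (an editorial addition, not the GANSynth code): IF is the wrapped frame-to-frame phase difference of an STFT, which is far smoother, and easier to generate, than raw phase.

```python
import numpy as np

def instantaneous_frequency(phase):
    """phase: STFT phase angles, shape (frames, bins)."""
    dphase = np.diff(phase, axis=0)
    # Wrap differences into (-pi, pi] so constant-frequency bins look constant.
    return np.mod(dphase + np.pi, 2 * np.pi) - np.pi

# A bin whose phase advances by a fixed 0.3 rad per frame has constant IF.
frames = np.outer(np.arange(8), np.full(5, 0.3))  # (frames, bins)
print(instantaneous_frequency(frames)[0])         # ~0.3 everywhere
```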
