One takeaway for me from (#dalle2, #imagen, #flamingo) is there's no one "golden algorithm" to unlock these new transfer learning capabilities. Contrastive, AR, Freezing, Priors, they all can work. You almost can't stop these models from exhibiting these new types of behavior...
...It reminds me a lot of early DL days, when people used to think you needed sparsity regularization to learn nice Gabor filters in NNs, but then it turned out that almost any model with convolution and enough natural data would learn them on its own...
...We shifted our attention to different parts of the problem, as it became just "a given" that pretrained convnets would yield nice representations for visual transfer learning, regardless of the architecture and dataset (kind of crazy when you think about it)...
...The past month has felt a lot like those 2012-2016 days of just seeing the tip of the iceberg of a new transfer learning paradigm and a new set of things that we start to take for granted ("of course" a LLM works fine for multimodal transfer to a wildly different domain...)
Rather than training domain-specific models for each dataset, we show that a seq2seq approach can jointly train on many different datasets with arbitrary combinations of instruments. This is an important step towards general purpose music transcription.
Why are we doing this if we're supposed to be working on machine learning for creativity? Transcription extracts notes from audio, which are useful both for human control and for training powerful language models on symbolic music from real audio (e.g. Music Transformer)
1/4 Sorry for another AI rant, I'm just reminded on a daily basis of how harmful the term really is. Almost all of these technologies could be much better described by saying what they actually do, where the "A" stands for "automation" and/or "augmentation", and is hardly "artificial".
It gives a much clearer picture of what a technology does, how it changes power dynamics of society, and who's responsible for its creation and use.
The distinction between augmentation and automation is really a subjective one, depending on whether people feel the process being automated still has value when done manually by a person. There's nothing new about that; machine learning just accelerates it.
A lot of folks have been asking me my thoughts about the recent Jukebox work by @OpenAI, so I thought a thread might help. I feel like I have separate reactions from three different parts of my identity:
1) ML researcher 2) ML researcher of music 3) Musician
Long thread :)
1/17
1) As an ML researcher, I think the results are really impressive! The model builds directly off of the VQ-VAE2 work of @avdnoord, hierarchically modeling discrete codes with transformer priors, and autoregressive audio approaches of @sedielem.
2/17
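To make the "discrete codes + transformer priors" idea concrete, here's a minimal sketch (my own illustration in JAX, not Jukebox's actual code) of the vector-quantization step: each continuous latent frame is snapped to its nearest codebook entry, and the resulting index sequences, at several temporal resolutions, are what the transformer priors model autoregressively.

```python
import jax.numpy as jnp

def vector_quantize(latents, codebook):
    """latents: [time, dim] continuous latents; codebook: [n_codes, dim].

    Returns the nearest-codebook index per timestep and the quantized latents.
    """
    # Squared Euclidean distance from every latent to every codebook entry.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = jnp.argmin(dists, axis=-1)   # one discrete token per timestep
    quantized = codebook[codes]          # the code sequences are what the
                                         # transformer priors then model
    return codes, quantized
```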
This work shows that with meticulous engineering and TONS of data (more on that later) these models can really scale! Sander and I have had a friendly back and forth about this approach for years, and I was truly amazed by the output quality. It’s really impressive research!
3/17
2/ tl;dr: We've made a library of differentiable DSP components (oscillators, filters, etc.) and show that it enables combining strong inductive priors with expressive neural networks, resulting in high-quality audio synthesis with less data, less compute, and fewer parameters.
3/ An example DDSP module is an Additive Synthesizer (sum of time-varying sinusoids). A network provides controls (frequencies, amplitudes), the synthesizer renders audio, and the whole op is differentiable. Here's a simple example with harmonic (integer multiple) frequencies.
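As a rough sketch of that idea (my own toy illustration, not the DDSP library's API), here's a harmonic additive synthesizer written with JAX so every op stays differentiable; `f0_hz` and `harmonic_amps` stand in for the controls a network would output.

```python
import jax.numpy as jnp

def harmonic_synth(f0_hz, harmonic_amps, sample_rate=16000):
    """f0_hz: [n_samples] fundamental in Hz; harmonic_amps: [n_samples, n_harmonics]."""
    n_harmonics = harmonic_amps.shape[-1]
    # Harmonic (integer-multiple) frequencies: f0, 2*f0, 3*f0, ...
    freqs = f0_hz[:, None] * jnp.arange(1, n_harmonics + 1)[None, :]
    # Accumulate phase per harmonic: 2*pi * cumulative sum of (f / sample_rate).
    phases = 2.0 * jnp.pi * jnp.cumsum(freqs / sample_rate, axis=0)
    # Sum of time-varying sinusoids; all ops are differentiable, so gradients
    # can flow back to whatever network produced the controls.
    return jnp.sum(harmonic_amps * jnp.sin(phases), axis=-1)
```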
2/ tl;dr: We show that for musical instruments, we can generate audio ~50,000x faster than a standard WaveNet, with higher quality (in both quantitative metrics and listener tests), and with independent control of pitch and timbre, enabling smooth interpolation between instruments.
3/ We explore a range of architectures and audio representations and find that the best results come from generating in the spectral domain, with large FFT sizes to allow for better frequency resolution (H), and generating the instantaneous frequency (IF) instead of the phase directly.
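For illustration, here's a minimal sketch (my own, not the paper's code) of what "instantaneous frequency" means here: unwrap the STFT phase along the time axis and take its frame-to-frame difference, which is a much smoother target for a generator than raw phase.

```python
import jax.numpy as jnp

def instantaneous_frequency(phase):
    """phase: [n_frames, n_bins] STFT phase angles in radians."""
    unwrapped = jnp.unwrap(phase, axis=0)   # remove 2*pi jumps over time
    dphase = jnp.diff(unwrapped, axis=0)    # per-frame phase derivative
    # Keep the derivative in [-pi, pi); the generator models this quantity
    # rather than the rapidly wrapping raw phase.
    return (dphase + jnp.pi) % (2.0 * jnp.pi) - jnp.pi
```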