A lot of folks have been asking me my thoughts about the recent Jukebox work by @OpenAI, so I thought a thread might help. I feel like I have separate reactions from three different parts of my identity:

1) ML researcher
2) ML researcher of music
3) Musician

Long thread :)
1/17
1) As an ML researcher, I think the results are really impressive! The model builds directly on the VQ-VAE2 work of @avdnoord (hierarchically modeling discrete codes with transformer priors) and the autoregressive audio approaches of @sedielem.
2/17
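For readers who want a concrete picture of those two ingredients, here is a minimal sketch, assuming PyTorch. It is not the Jukebox implementation; every class name, shape, and hyperparameter below is illustrative. It just shows the general idea of a VQ bottleneck that snaps encoder features to a discrete codebook, plus an autoregressive transformer prior over the resulting code indices.

```python
# Illustrative sketch only (assumed shapes/names), not OpenAI's Jukebox code.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Nearest-neighbour quantization of continuous latents into discrete codes."""

    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):  # z: (batch, time, dim) encoder features
        codes = self.codebook.weight                         # (num_codes, dim)
        dists = torch.cdist(z, codes.expand(z.size(0), -1, -1))
        indices = dists.argmin(dim=-1)                       # (batch, time)
        quantized = self.codebook(indices)                   # (batch, time, dim)
        # Straight-through estimator so gradients still reach the encoder.
        quantized = z + (quantized - z).detach()
        return quantized, indices


class CodePrior(nn.Module):
    """Autoregressive transformer over the discrete code indices."""

    def __init__(self, num_codes: int = 512, dim: int = 64, layers: int = 4, heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, indices):  # indices: (batch, time)
        x = self.embed(indices)
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.transformer(x, mask=causal)
        return self.head(h)      # logits over the next code at each step


# Usage: quantize stand-in encoder features, then train the prior to predict
# each code from the ones before it (targets shifted by one step).
z = torch.randn(2, 128, 64)
vq, prior = VectorQuantizer(), CodePrior()
_, idx = vq(z)
logits = prior(idx[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 512), idx[:, 1:].reshape(-1))
```

Jukebox stacks several such levels at different temporal resolutions and conditions the priors on artist, genre, and lyrics; this sketch only shows a single level.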
This work shows that with meticulous engineering and TONS of data (more on that later), these models can really scale! Sander and I have had a friendly back and forth about this approach for years, and I was truly amazed by the output quality. It’s really impressive research!
3/17
That said, from a purely “fitting the data” perspective, I think this giant black box approach has many parallels to the pitfalls of giant ImageNet models trained to answer “Is this object in the image?”
4/17
Giant nets have gotten really good on in-domain test sets, and their features can sometimes be useful for transfer learning, but there seems to be a growing consensus that large gaps remain in generalizing to full scene understanding.
5/17
Many image researchers are now trying different approaches (structured prediction, causality, object-oriented models, etc.) to overcome these challenges. My research definitely leans this way; it can be thought of as structured, object-oriented modeling of music.
6/17
VQ-VAEs may turn out to be a useful step toward interpretable structure, but the jury is still out. The biggest advantage is that these approaches allow you to train on any audio you can find (not scores or individual instruments), which brings me to my second perspective...
7/17
2) As an ML researcher of music, I try to get my group to spend a large amount of time thinking about the intention and impact of our work on musicians and the rest of society. We strive to answer the question “What value can ML have for musical and creative expression?"
8/17
"What do (and don't) we need from it?” without assuming something will be net positive because it's new. We try to include artists in our process as much as we can: creating tools for new expression, getting direct feedback, and building new models on HCI collaborations.
9/17
One thing that sits uneasily with me about this work is that artists are _essential_ to its success, but they seem to have been left out of the process completely. Like many advances in deep learning, the biggest novelty is arguably the “dataset” of 1.2 million songs acquired.
10/17
Music datasets (with rights) are *tiny* compared to image and language datasets. I’m pro free use of music, but it feels disingenuous to use an artist’s data, not include them in the process, and then train your model to specifically generate “in the style” of that same artist.
11/17
OpenAI had a much more thoughtful take on the ethical implications of GPT-2, and I think it’s a shame they didn’t take the same level of consideration here, especially given their ability to set precedent.
12/17
I’m all for creating new methods for non-trained musicians to generate music and express themselves, but I just feel that _how_ you go about doing that can matter as much as _what_ you do.
13/17
3) As a musician, I was surprised to find myself feeling a bit sad in response to the work. I’ve been trying to process it, but I think that it might be that on a personal level, one of the things I get out of music is the shared awareness of intention and feeling.
14/17
It’s amazing that something so internal and inexpressible can be communicated and shared just through hitting things and making weird sounds with our mouths. I resonate with music as an action, not as an artifact; it’s something you do.
15/17
This is true for ML and computer music too. In the best of it, I still feel a connection to the process and intention of the artist, even if just in their choice of dataset, conditioning, or the way they choose to generate and present samples.
16/17
I’m sure some artists will find amazing ways to do things with these models, but for now the “uncanny valley” effect from trying to so specifically recreate individual artists, without their consent, leaves me feeling cold in a way that sample-based music and cover bands don’t.
17/17