This is a really interesting story, worth a read and some contemplation by anyone working on recommender systems. As is often the case with media coverage of technology though, I suspect it may contain some claims that are either a stretch or not entirely accurate. 🧵
In particular, this part about how YouTube recommendations were retooled to expand your interests in a project called Reinforce:
I’ve worked on large scale recommender systems at a major streaming service, and this rang some bells for me. REINFORCE is the name of a reinforcement learning algorithm, which doesn’t (at least directly) have much to do with expanding or guiding anyone’s interests.
In traditional (supervised) machine learning problems, you collect a bunch of historical data and train a model to predict some attribute or occurrence contained in that data. Since the data is based on the past, it’s possible to know the “right” answer for each example.
In order to learn an accurate model, it’s important that the historical data used to train the model is a good approximation of the future data that the model will be asked to predict.
Recommenders (among other ML systems) introduce a wrinkle into this picture: the historical data they are trained on was generated by a previous version of the model/system, creating a feedback loop. What was recommended in the past influences what can be learned in the future.
In order to train effective models in the presence of the feedback loop, there’s a need to deliberately explore the possibilities in order to collect data to train future models.
So, instead of training a model to generate the best recommendations possible, the model is instead trained to generate a range of possibilities, with some more likely to be recommended than others. (This is known as a “policy” in reinforcement learning.)
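To make the “policy” idea concrete, here’s a minimal sketch in Python (the scores and video IDs are made up for illustration, not anything from YouTube’s actual system): instead of always recommending the single highest-scoring item, the scores are turned into a probability distribution and the recommendation is sampled from it, so lower-scoring items still get shown sometimes.

```python
import numpy as np

# Hypothetical model scores for a handful of candidate videos.
# In a real system these would come from a learned model.
candidate_ids = ["vid_a", "vid_b", "vid_c", "vid_d"]
scores = np.array([2.1, 1.4, 0.3, -0.5])

# A greedy recommender would always pick the argmax:
greedy_pick = candidate_ids[int(np.argmax(scores))]

# A policy instead defines a distribution over candidates (softmax here)
# and samples from it, so less-favored items still get explored sometimes.
probs = np.exp(scores) / np.exp(scores).sum()
sampled_pick = np.random.choice(candidate_ids, p=probs)

print(greedy_pick, sampled_pick, probs.round(3))
```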
The exploratory behavior of the system isn’t introduced in order to expand people’s interests (though it may or may not have that effect.) It’s introduced to keep the accuracy/performance of the recommender from degrading over time.
Without exploration, the recommender would tend to focus on a narrower and narrower subset of the catalog of recommendable items over time, over-emphasizing what worked in the past at the expense of newer content and newer interests.
The REINFORCE algorithm (first published in 1992) provides one approach for learning from experience and exploration using neural nets: www-anw.cs.umass.edu/~barto/courses…
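As a rough sketch of the core idea (a toy bandit version with made-up engagement rates, not the 1992 paper’s notation or any production system): REINFORCE samples actions from the policy, observes a reward, and nudges the policy parameters to make rewarded actions more probable.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items = 4
theta = np.zeros(num_items)                        # policy parameters (one logit per item)
true_click_prob = np.array([0.1, 0.6, 0.2, 0.3])   # hypothetical engagement rates
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    item = rng.choice(num_items, p=probs)                  # sample a recommendation (exploration)
    reward = float(rng.random() < true_click_prob[item])   # observed engagement (click or not)

    # REINFORCE update: gradient of log pi(item) w.r.t. theta, scaled by the reward
    grad_log_pi = -probs
    grad_log_pi[item] += 1.0
    theta += lr * reward * grad_log_pi

# The learned policy shifts toward items that tend to get engaged with.
print(softmax(theta).round(2))
```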
There was a WSDM ‘19 conference paper about how YouTube has applied it to their recommender system: alexbeutel.com/papers/wsdm201…
If those papers look daunting, that’s because they are! I’ve worked on this kind of thing, and I don’t find them easy reading. These systems are very difficult to understand and predict the behavior of, even for the people working on them.
When people write about the adverse effects of recommender systems, they often frame the issue as being an effect of deliberate design, which I think overstates the case and misunderstands the nature of recommender systems.
That’s certainly not to absolve recommender systems of their social impacts! It’s just to say that it’s entirely possible for models to learn things that no one ever intended or designed for.
To understand how that happens, consider how you’d generate recommendations for a single video with exploration. You include some videos you’re confident in, and some farther afield that you’re less confident in.
If the seed video is about running a marathon, you’d probably include some other videos about marathon running (high confidence), and then maybe some videos about jogging for exercise and some about ultra-marathoning (lower confidence.)
They’re all related, in that they share a broad topic, but they also span a spectrum in terms of how extreme they are. It’s important to understand that this can happen without any explicit knowledge of how extreme the content is.
It’s just a consequence of the recommender system trying to learn how related different items are by exploring and observing how people interact with them in different contexts.
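To picture how that exploration might be assembled for the marathon example above (purely illustrative names and scores, not YouTube’s code): the slate mixes items the model is confident are related with a couple it’s less sure about, and nothing in the scores encodes how extreme any of the videos are.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical relatedness scores to a seed video about running a marathon.
related = {
    "another marathon training video": 0.95,   # high confidence
    "jogging for beginners": 0.55,             # lower confidence
    "ultra-marathon documentary": 0.50,        # lower confidence
}

# Build a slate: always include the high-confidence items,
# then sample a couple of the less certain ones to explore.
high_conf = [v for v, s in related.items() if s >= 0.9]
low_conf = [v for v, s in related.items() if s < 0.9]
slate = high_conf + list(rng.choice(low_conf, size=2, replace=False))
print(slate)
```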
In most subject areas, this isn’t particularly problematic. Very few people will become ultra-marathoners, even if you recommend them relevant videos.
On certain topics, it can be wildly problematic though, when exposure to more extreme content creates a path that leads to misinformation, incitement, and radicalization.
The issue isn’t that the system is designed from the outset to promote extreme content; it’s that the system learns from people’s behavior and people do indeed find extreme content engaging.
Is it avoidable? Well, maybe. Part of the issue is that extreme and problematic content exists on the platform in the first place. But at YouTube’s scale, human moderation of all videos is infeasible. (Aside from that, we know that moderation jobs are psychologically toxic.)
It’s a very tough problem, and I don’t envy the people tasked with addressing it. Like any new technology, recommender systems have a panoply of consequences, which are helpful, harmful, and everything in between.
(And many not entirely foreseeable at the outset, despite our hindsight bias looking back.)