“Why Are Deep Learning Models Not Consistently Winning Recommender Systems Competitions Yet?”

dl.acm.org/doi/abs/10.114…

My take is that we haven’t had the right model architectures. Here’s why I think that...
Going way back to the Netflix prize, multiplicative interactions have been a key component of successful modeling strategies. Matrix factorization did well on the Netflix data and became a classic approach to making recommendations.
Many further iterations on the key concept of factorizing matrices into low-rank approximations with vector embeddings per user/item/attribute have also been successful.
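
To make the basic idea concrete, here's a minimal sketch of MF-style scoring (toy NumPy with made-up sizes and variable names, not any particular library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 1000, 5000, 32

# One low-rank embedding per user and per item
user_emb = rng.normal(scale=0.1, size=(n_users, dim))
item_emb = rng.normal(scale=0.1, size=(n_items, dim))

def mf_score(user_id, item_id):
    # The predicted affinity is a multiplicative interaction:
    # the dot product between the user and item embeddings.
    return user_emb[user_id] @ item_emb[item_id]
```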
Drawing inspiration from word2vec, the CoFactor paper showed that you can improve the performance of MF by jointly factorizing an item-item mutual information matrix. Makes sense: there’s information in which pairs of items users like together that’s hard for MF to extract.
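
Schematically, the joint objective looks something like this (a simplified squared-loss sketch; the actual paper uses weighted MF and a shifted PPMI matrix, and these names are mine):

```python
import numpy as np

def cofactor_loss(R, M, U, V, W, alpha=1.0):
    # R: user-item interactions, M: item-item co-occurrence/PMI matrix
    # U: user embeddings, V: item embeddings (shared by both terms),
    # W: context embeddings for the item-item factorization
    rec_term = np.sum((R - U @ V.T) ** 2)    # standard MF term
    cooc_term = np.sum((M - V @ W.T) ** 2)   # item-item term, sharing V
    return rec_term + alpha * cooc_term
```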
Factorization machines demonstrated that you can extend the concept of multiplicative interactions between vector embeddings to tabular data with side information and get improved cold-start and resilience to sparsity by incorporating metadata as side information.
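
The second-order FM term is just pairwise dot products between feature embeddings, weighted by the feature values (toy sketch below; variable names are mine):

```python
import numpy as np

def fm_score(x, w0, w, V):
    # x: feature vector (user, item, and metadata features), shape (n,)
    # w0: bias, w: linear weights, V: one embedding per feature, shape (n, dim)
    linear = w0 + w @ x
    # Efficient form of sum_{i<j} <V[i], V[j]> * x[i] * x[j]
    xv = x @ V
    x2v2 = (x ** 2) @ (V ** 2)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise
```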
Field-aware FMs have been successful in many competitions, and supercharge that idea by using separate embeddings for interactions between different features. That increases model capacity, but provides a structured inductive bias beyond increasing embedding dimensions.
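
Sketching the field-aware version (again, toy code with my own names): each feature gets a separate embedding for every field it can interact with.

```python
import numpy as np

def ffm_score(active, fields, V):
    # active: indices of the active features
    # fields: fields[i] is the field that feature i belongs to
    # V: V[i, f] is feature i's embedding for interactions with field f,
    #    shape (n_features, n_fields, dim)
    score = 0.0
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            i, j = active[a], active[b]
            # Each pair uses field-specific embeddings on both sides
            score += V[i, fields[j]] @ V[j, fields[i]]
    return score
```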
What all these have in common is that the models incorporate explicit pairwise multiplicative interactions directly into their structure.
Early attempts to apply deep learning to recommendations largely did away with explicit pairwise multiplicative interactions, deciding instead to focus on data compression with autoencoders and MLPs.
Compressing a bunch of information into a low-dimensional embedding was part of the magic of matrix factorization, so it seemed like deep architectures should be able to do it better. And they can, but that wasn’t the only important element of previously successful models.
People tried concatenating user and item embeddings and feeding them through MLPs in so many different ways (Neural Matrix Factorization, Wide & Deep, DeepFM).
What we know now is that—despite theoretically being universal approximators—MLPs are horrendously inefficient at approximating multiplicative interactions.

(See the 2018 Latent Cross and 2020 NCF vs MF papers for details.)
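To see the structural difference, compare the two scoring functions (toy, untrained weights, just to show the shapes involved):

```python
import numpy as np

dim = 32
u = np.random.randn(dim)   # user embedding
v = np.random.randn(dim)   # item embedding

# Explicit multiplicative interaction: the structure computes it exactly.
dot_score = u @ v

# MLP over the concatenation: the network has to *learn* to approximate
# that product from [u; v], which turns out to be very inefficient.
W1 = np.random.randn(64, 2 * dim)
w2 = np.random.randn(64)
hidden = np.maximum(W1 @ np.concatenate([u, v]), 0.0)   # one ReLU layer
mlp_score = w2 @ hidden
```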
So that approach hasn’t really panned out.

What we’ve seen instead is success applying RNNs—which have multiplicative interactions—to sequential recommendations, and a shift toward two-tower networks with explicit multiplicative interactions (dot products) between the towers.
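
In sketch form (not any particular library's two-tower API, just the idea):

```python
import numpy as np

def tower(features, weights):
    # A tiny one-layer "tower" mapping raw features to an embedding
    return np.maximum(weights @ features, 0.0)

def two_tower_score(user_features, item_features, W_user, W_item):
    u = tower(user_features, W_user)   # user tower
    v = tower(item_features, W_item)   # item tower
    # The interaction between the towers is an explicit dot product,
    # not something an MLP has to learn to approximate.
    return u @ v
```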
What that suggests to me is that the key component of factorization-based approaches wasn’t data compression, it was multiplicative interactions.
When you look at models like FMs and FFMs, they use low-rank approximations with vector embeddings but make no attempt to compress all user/item info to one vector. They do almost the exact opposite and go wild with lots and lots of vectors!
Looking around the current DL for RecSys landscape, the only model family I know of that takes that general approach is Deep & Cross Networks, which concatenate a bunch of vectors and then form pairwise/higher-order interactions by multiplying the result by itself repeatedly.
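A single cross layer looks roughly like this (my own simplified sketch of the DCN-style cross operation):

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    # Multiply the original input x0 by a learned projection of the current
    # representation xl, then add a residual connection back to xl.
    return x0 * (xl @ w) + b + xl

# Stacking layers builds up explicit higher-order multiplicative interactions
rng = np.random.default_rng(0)
dim = 16
x0 = rng.normal(size=dim)   # concatenated embeddings and dense features
x = x0
for _ in range(3):
    x = cross_layer(x0, x, rng.normal(size=dim), rng.normal(size=dim))
```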
That architecture actually makes sense, given the history of successful recommender models.

The early hype didn’t really pan out, and a lot of naive applications of DL to RecSys haven’t really worked, but I’m optimistic about deep models that build on approaches we know work.
And what do we know works?

Explicit multiplicative interactions incorporated directly into the structure of the model.

