Matthew Hindman @MattHindman
SCOOP: I have learned how #CambridgeAnalytica built their Facebook targeting model.

How did I find out, you ask? Funny story.

I, um, emailed Aleksandr Kogan, and he told me.

theconversation.com/how-cambridge-…
Link to the story should be live shortly. But if you missed it, here’s the story so far. I was on the right track, it seems:
Big point: this was *not* a personality targeting model per se.

It was a soak-up-all-the-correlation-and-call-it-personality model.

Not quite what it said on the tin, but it does look like it worked well.
I asked Kogan if they’d used SVD, like in the Netflix prize.

Not exactly, says Kogan: SVD has a “particularly strong pull towards centrality as a function of number of likes a person has.” True.
Instead, Kogan and Chancellor rolled their own dimensionality reduction model, based on a “multi-step co-occurrence approach.” The model is not in the public domain.
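To make the SVD point concrete, here is a minimal sketch on made-up data. This is plain truncated SVD, not Kogan and Chancellor's unpublished model, and the matrix sizes, like propensities, and variable names are all assumptions chosen for illustration. It shows one way to read the "pull towards centrality": a user's position in factor space can sit no farther from the origin than the square root of their like count, so light users end up near the center.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 users, 50 pages. Each user has their own
# propensity to like things, so like counts vary widely.
n_users, n_pages, k = 200, 50, 5
propensity = rng.uniform(0.02, 0.5, size=n_users)
likes = (rng.random((n_users, n_pages)) < propensity[:, None]).astype(float)

# Truncated SVD: keep only the top-k components.
U, s, Vt = np.linalg.svd(likes, full_matrices=False)
user_factors = U[:, :k] * s[:k]          # same as likes @ Vt[:k].T

like_counts = likes.sum(axis=1)
factor_norms = np.linalg.norm(user_factors, axis=1)

# Projecting a binary row onto orthonormal directions cannot lengthen it,
# so each user's factor norm is capped at sqrt(number of likes).
r = np.corrcoef(like_counts, factor_norms)[0, 1]
print(f"corr(like count, factor norm) = {r:.2f}")
```

On data like this, how far a user sits from the center is driven largely by how much they like, not what they like.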
How accurate was it? According to Kogan, “the correlation between predicted and actual scores... was around .3 for all the personality dimension.”

Best possible accuracy for these models is about .7-.8.
Correlation of “around .3” seems a bit low, but plausible. Kosinski-Stillwell-Graepel’s 2013 PNAS piece and other research have done -- slightly -- better. Not clear why they’d build their own model when an off-the-shelf version was supposedly comparable in overall accuracy.
Even if all this model did was give noisy Big Five scores, and personality was only modestly related to politics, this would still be useful for targeting *low-cost* actions.

Like -- just throwing this out there -- targeting Facebook ads.
Here’s the rub though: similar FB matrix factorization models that can predict personality at 0.3 are much, MUCH better at predicting traditional political variables.
Kosinski-Stillwell-Graepel had 95% accuracy for race. 93% for gender. And 85% for partisanship. All with ZERO demographics or social information added, so that’s a lower bound. pnas.org/content/110/15…
That’s nearly voter file-level accuracy even without the voter file. Match rates vary, but easily tens of millions of Facebook users can't be matched to the voter file.

And scores are surely most accurate for heaviest users -- the ones you want to target.
Dimension reduction models boil everything down into factors / components -- essentially artificial categories.

Demographics, social influences, personality all get smelted down into a big correlated lump.
These models can give estimates for *every* citizen on *any* variable available on any substantial group of users. They fill out the matrix, even with missing cells.

That means they can estimate the Big Five personality scores for every voter.
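A rough sketch of how "filling out the matrix" can work, under stated assumptions: synthetic likes, a made-up trait score observed for only the first 100 users, and a generic SVD-plus-linear-regression pipeline. It is not the Cambridge Analytica model, whose details are not public. Factors are learned from everyone's likes, the trait is fitted on the surveyed users, and then a score is produced for every user.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_pages, k = 300, 40, 3

# Hypothetical latent structure that drives both likes and the trait.
latent = rng.normal(size=(n_users, k))
loadings = rng.normal(size=(k, n_pages))
noise = rng.normal(size=(n_users, n_pages))
likes = (latent @ loadings + noise > 1.0).astype(float)
trait = latent @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n_users)

# Only the first 100 users "took the survey"; the rest are missing cells.
surveyed = np.zeros(n_users, dtype=bool)
surveyed[:100] = True

# Step 1: dimension reduction on the full user-by-page matrix.
centered = likes - likes.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
factors = U[:, :k] * s[:k]

# Step 2: fit trait ~ factors on surveyed users only (least squares).
X = np.column_stack([np.ones(n_users), factors])
coef, *_ = np.linalg.lstsq(X[surveyed], trait[surveyed], rcond=None)

# Step 3: predict the trait for every user, surveyed or not.
predicted = X @ coef
r = np.corrcoef(predicted[~surveyed], trait[~surveyed])[0, 1]
print(f"held-out correlation between predicted and actual = {r:.2f}")
```

The held-out correlation printed at the end is the same kind of number as the "around .3" figure quoted earlier: predicted versus actual scores on users the regression never saw.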
But these personality scores are the output of the model, not the input. All the model knows is that certain Facebook likes, and certain users, tend to be grouped together.
Cambridge Analytica could say that it was identifying people with low openness to experience and high neuroticism.

But the same model, with the exact same predictions for every user, could just as accurately claim to be identifying less educated older Republican men.
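A toy illustration of that last point, with everything invented for the purpose: one hypothetical factor score per user and two synthetic labels that both happen to load on it. The targeted list is built once, from the factor score alone. Which description you attach to it is a choice made afterward.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users = 1000

# Hypothetical: a single model-produced factor score per user.
factor_score = rng.normal(size=n_users)

# Two made-up labels that both correlate with that factor.
labels = {
    "low openness": 0.3 * factor_score + rng.normal(size=n_users),
    "older Republican": 0.8 * factor_score + rng.normal(size=n_users),
}

# Target the top 10% of users by factor score: one list, two descriptions.
targeted = factor_score >= np.quantile(factor_score, 0.9)

lifts = {}
for name, label in labels.items():
    # How much higher the targeted group scores than the average user.
    lifts[name] = label[targeted].mean() - label.mean()
    print(f"{name}: targeted mean minus overall mean = {lifts[name]:.2f}")
```

In this made-up setup the demographic label loads more heavily on the factor than the personality label does, so the same list looks better described by demographics than by personality.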
Lots more to unpack. Obviously this wasn't a crystal ball.

But it does look like it was an effective political tool, even if personality was only a modest part of what made it work.
One more speculative point.

I suspect that targeting by factors was *especially* good with Facebook's lookalike audiences. Models like other models that "speak their language" as it were. That is probably worth digging into.