Sol Messing
Aug 14, 2019 · 17 tweets · 7 min read
Thread: After @icsjournal's "APIcalypse" issue & #IC2S2 2019, it's clear many are asking what the future holds for social media data. After working on privacy & data access at FB for > a year, I have thoughts. The thread ends w/ a little-known source for FB page data, so read to the end.
In 2011, @seanjwestwood & I ran an (IRB-approved) study using Facebook's graph API to analyze participants' entire ego networks on the fly, have strong vs. weak ties endorse an experimental post (the stimulus), then re-render participants' News Feeds. Those days are over.
The API was meant for developers to build on top of the social graph, but approvals were friendly to researchers. The scope of the data was startling & thankfully Sean had the foresight to delete data beyond what was necessary for analysis & publication.
Times have changed - OPM, MyLife, Equifax, Cambridge Analytica. Privacy issues are difficult to anticipate & secure, especially for complex & relational data. & CA showed the world that APIs are open to nefarious schemes and basic research alike.
Data privacy is in the air in newsrooms & capitals across the world. As privacy advocates devise strategies and work to protect people from identity theft, scams, information abuse & other harms, social scientists advocating for research often aren't at the table.
In the wake of these cross-currents, privacy legislation & regulation present challenges for research communities. No company can shrug off a $5 bn fine for being too permissive w/ data, and the data at issue were collected via APIs in the name of social science research.
What's more, under GDPR, meaningful, socially beneficial independent social science research often maps to the generally prohibited processing of "3rd-party sensitive-category data collected without consent." iapp.org/news/a/how-gdp…
Now, the GDPR's research exemption carves out a protected legal space for exactly that! BUT only if the data were originally *collected for research purposes.* The data everyone cares about were collected for business. What's left? Case-by-case opt-in, with severe confounds, overhead, paperwork, etc.
Another path is if the data in question are anonymous. BUT GDPR does not define anonymity the way that, say, HIPAA does (PII removed). Instead, data are anonymous if they cannot "reasonably" be re-identified. Sounds great, until you get into the details pdpjournals.com/docs/88197.pdf
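To see why "PII removed" is a weak standard, consider how unique combinations of innocuous fields can single people out even after names and IDs are stripped. A minimal stdlib-only sketch (the records and field names below are invented for illustration, not real data):

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Fraction of records that are unique on their quasi-identifiers.

    A record that is unique on, say, (zip, age, sex) can often be
    linked back to a named person via an outside dataset, even though
    the names and IDs (the "PII") were removed.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

# Toy "anonymized" data: names stripped, quasi-identifiers intact.
records = [
    {"zip": "94301", "age": 34, "sex": "F"},
    {"zip": "94301", "age": 34, "sex": "F"},  # shares a key: safer
    {"zip": "94301", "age": 71, "sex": "M"},  # unique: linkable
    {"zip": "10027", "age": 29, "sex": "F"},  # unique: linkable
]
print(reidentification_risk(records, ["zip", "age", "sex"]))  # 0.5
```

Under GDPR's "reasonably re-identified" test, the question becomes how plausible such a linkage attack is, which is far murkier than a checklist of removed fields.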
Differential privacy provides a theoretical framework in which firm, provable guarantees can be made. BUT those guarantees are probabilistic. What does "reasonably protected from re-identification" mean if re-identification is always possible to some degree?
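The canonical construction is the Laplace mechanism: add noise scaled to a query's sensitivity divided by epsilon. A minimal stdlib-only sketch for a counting query (the epsilon values are illustrative, not recommendations):

```python
import random

def laplace_noise(scale, rng):
    # The difference of two iid Exponential(mean=scale) draws is
    # distributed Laplace(0, scale), so the stdlib suffices.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes it by at most 1), so Laplace noise with scale 1/epsilon
    gives the epsilon-DP guarantee.
    """
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(42)
for eps in (0.1, 1.0, 10.0):
    print(eps, round(dp_count(100, eps, rng), 1))
# Smaller epsilon -> more noise -> stronger protection, but the
# guarantee only bounds how much any release can shift an adversary's
# beliefs; it never makes re-identification flatly impossible.
```

That's exactly the tension here: the guarantee is a probabilistic bound parameterized by epsilon, not a binary "anonymous / not anonymous" verdict of the kind the legal text seems to contemplate.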
GDPR was not drafted to make things difficult for social media researchers. Rather, it may not have been crafted with this kind of research in mind. We are now grappling with likely unintended consequences.
What can be done? Global companies follow applicable law & regulation. Researchers need a seat at the policymaking table to incentivize corporate sharing for basic research. It may even be as simple as removing some of the downside risk.
People making decisions need to understand the societal value of social science. So keep sending work to @monkeycageblog, @MisOfFact, weigh in on Twitter, talk to journalists & present to policy-makers.
And learn about privacy - as social scientists we don't have the training to deeply understand the issues & participate in the debate. So read up on differential privacy.

Briefly: johndcook.com/blog/2018/11/0…,
more here privacytools.seas.harvard.edu/files/privacyt…
and finally cis.upenn.edu/~aaroth/Papers….
Unfortunately, DP introduces noise, requires hard-to-come-by expertise & is suited to answering only a limited set of questions. The US Census Bureau has worked on DP for nearly a decade & assumes it is not currently feasible for the ACS census.gov/content/dam/Ce…
And if you’ve read this far, to the bitter end, here’s a tip: if you've been negatively affected by the pages API restrictions, apply for CrowdTangle access via @SocSciOne here socialscience.one/rfp-crowdtangle. It doesn't have everything, but it has a lot.


More from @SolomonMg

Sep 30, 2023
1/ Many said Science went overboard on its cover for those Meta studies.

Science published my take yesterday.

It shows Meta’s algorithms actually tended NOT to increase ideological segregation in general, at least in 2020 🧵
2/ Key point 1: González-Bailón et al 2023 claim newsfeed ranking increases ideological segregation (Fig 2B). BUT that’s based on domain-level analysis. URL-level analysis (Fig 2C) shows *no difference* in ideological segregation before and after feed-ranking.
3/ So what? We should strongly prefer URL-level analysis. Domain-level analysis effectively mislabels highly partisan content as “moderate/mixed,” especially on websites like YouTube, Reddit, and Twitter (aggregation bias/ecological fallacy).
Apr 3, 2023
I wrote about The Algorithm: using Musk's metrics in ship decisions, what the Republican/Democrat code means for democracy, how Twitter's API $ increase undermines transparency efforts, & on the tech bros claiming to analyze it 'so you can go viral.'

solomonmg.github.io/post/twitter-t…
What does this mean for transparency? You need algorithmic audits to really understand what's happening on Twitter, and the recent API price increase, to $500k/yr for meaningful access, has made it incredibly difficult for researchers to audit this code.
Also, Twitter is not downranking tweets about Ukraine. That label relates only to crisis misinformation, per Twitter’s Crisis Misinformation Policy, and the code in question specifically governs Spaces, not ordinary tweets in the home timeline.
Jan 12, 2023
🚨MASSIVE NEW STUDY ON DIGITAL/FB POLITICAL AD EFFECTS 🚨 in @NatureHumBehav from Minali Aggarwal, @_JenAllen, @aecoppock, @dfrankow, Kelly Zhang, @jimmyeatcarbs, Andrew Beasly, Harry Hantman, Sylvan Zheng!

nature.com/articles/s4156…

Ungated: solomonmg.github.io/pdf/acronymNHB…
Pundits and media commentators often assume large campaign effects, while many past studies find extremely small effects, often indistinguishable from zero. Measuring effects of the billions spent on political ads is one of the most significant challenges in the social sciences.
A huge, recent meta-analysis from @dbroockman & @j_kalla found advertising effects close to zero cambridge.org/core/journals/…
Oct 5, 2021
Important proposal from @persily. I was “the other side of the table” at Facebook when Nate was working on SS1, though we almost always agreed about the right way to share data. There’s a lot to like here for policymakers, researchers *and platforms* (brief thread)
First, this straight up exempts university affiliated researchers from liability for scraping data for IRB-blessed projects. It's important to enshrine this into law to (1) protect researchers and (2) make it crystal clear that this work is normatively *good* for the world.
This kind of research ought to be given safe harbor and platforms ought not to discourage it
Sep 11, 2021
I have a few things to say about the @nytimes @daveyalba story on the @SSOne FB Condor data error. It's a shame I wasn't interviewed.
First, it sucks if you’ve written about the data. My heart goes out to you, and I have a sense of the work you have to do as a result. I understand there are engineers working to fix this ASAP.
But missing from the NYT piece + public conversation: *every single public data set has errors.* And, this one took so long to find largely due to privacy protections—researchers couldn't see raw data.
Jun 10, 2021
Thread: @kmmunger et al's replication of work from @seanjwestwood & me brings up a key point for scholarship on tech: in this universe, we need to care not just about the person, but even more about the way the person interacts with the interface.
This is exactly why Facebook’s algorithmically sorted News Feed has prompted such fierce public debate—what we read and how we read it are governed by the interface. Most obviously: we usually see what’s at the top of our feed and almost never see what’s at the bottom.
How does this relate to @kmmunger et al's replication? Let me explain.
