Julia Angwin
Investigative journalist. Founder, @proof__news, @NYTOpinion writer. Founded @themarkup. Priors @ProPublica @WSJ. Author: Dragnet Nation, Stealing MySpace.
Jul 16 5 tweets 2 min read
Huge investigation from @proof__news today: We reveal the trove of YouTube videos that are being used to train AI models (including Anthropic's Claude).

Yes, it includes all your favorite YouTubers - from @hankgreen to @MrBeast to @khanacademy.

proofnews.org/apple-nvidia-a… Were your favorite YouTubers' videos secretly used to train AI? Search the dataset that we compiled here:
proofnews.org/youtube-ai-sea…
Jun 13, 2023 15 tweets 9 min read
🧵Amidst rampant surveillance, one bastion of privacy remains – end-to-end encrypted messaging apps like Signal and WhatsApp. But dangerous laws are being proposed in US, UK, EU & beyond to force those apps to scan your messages.

My latest for @nytopinion nytimes.com/2023/06/13/opi… Feeling a sense of deja vu? Yes, you have heard this story before. But it’s worse this time.

Previously, the FBI sought a “master key” that could unlock encrypted content with a search warrant. But they lost after a showdown with Apple in 2016. /2

apple.com/customer-lette…
Feb 4, 2023 15 tweets 6 min read
I’m sad to report that I am leaving @themarkup to pursue other projects, which I will announce soon. It was an honor and a privilege to found @themarkup five years ago to create an investigative newsroom that integrated engineers and journalists. /1 My goal was to use the best of tech – computation, automation, machine learning – to investigate the human impacts of tech. And to do it using the scientific method as our compass rather than the fuzzy concept of “objectivity.” /2
Jan 14, 2023 8 tweets 4 min read
Let’s talk about consent. Do you feel like you ever properly consented to being surveilled online constantly, having a profile built of your interests and having that profile made available to anyone who could pay for it?

EU regulators don’t think so either. /1 Earlier this month @edpb fined Meta €390 million for not getting proper consent before profiling FB & IG users. It was hailed as a huge victory for the EU’s landmark privacy law, GDPR, but sadly it may not change how you are profiled. /2
Jul 30, 2022 4 tweets 2 min read
The U.S. is closer to passing a federal privacy law than ever. But there’s a catch: it sets a “ceiling” and not a “floor” for state & local privacy laws.

In this week’s newsletter @cam_kerry says that's “the price of getting strong protections.” /1

themarkup.org/newsletter/hel… But Ashkan Soltani, head of the new privacy agency in California, where a strong privacy law would go into effect next year, tells me the trade-off “is a trap.”

The federal bill “locks into amber” rules that prevent future innovation to protect privacy. /2
Jun 22, 2022 13 tweets 8 min read
In light of the recent US settlement with Facebook, I want to tell y’all a story about how hard it is to make change in our algorithmic world, why you need a village of researchers, and why law enforcement agencies need to get better at tech. /1

nytimes.com/2022/06/21/tec… Six years ago, @terryparrisjr & I bought an ad on Facebook, targeted to only white people looking for housing, using a drop-down menu blocking ads from being seen by different “ethnic affinity groups.” Experts said this violated the Fair Housing Act. /2

propublica.org/article/facebo…
Feb 10, 2022 6 tweets 2 min read
Today's journalism lesson: "On background" is a request for anonymity. To be honored, it must be agreed to by both parties. It cannot be unilaterally declared.

That's why today we are publishing an email from Amazon which they insist was on background:

documentcloud.org/documents/2120… If we didn't agree it was on background, then it was not.

Because it destroys trust with readers, we only grant anonymity when a source faces retaliation and we can't get the information any other way.

getrevue.co/profile/themar…
Dec 2, 2021 16 tweets 8 min read
Critics have long suspected that predictive policing software was racially biased.

Today, we have the answer: @themarkup & @gizmodo analyzed 5.9 million algorithmic crime predictions. We found they disproportionately target Black & Latino areas. /1

themarkup.org/prediction-bia… Police across the U.S. use software from a company called PredPol (recently renamed @Geolitica_PS ) that says it predicts future crime without racial bias.

But we found it rarely predicted crime in White areas & disproportionately predicted it in Black & Latino areas. /2
Nov 18, 2021 10 tweets 4 min read
Facebook says top content on its platform comes from reputable sources like UNICEF, ABC News & the CDC.

But our #CitizenBrowser data shows that sensational partisan posts from Daily Wire & The Western Journal are top performers. @corintxt reports:

themarkup.org/citizen-browse… Here’s the difference: Facebook measures “reach” - which is how many unique viewers saw each domain.

But we measured an equally, if not more, important metric, “impressions” – which is how many times users are bombarded with a piece of content. /2
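The gap between the two metrics can be sketched in a few lines. This is a minimal illustration, not The Markup's or Facebook's actual method: the view log, the user IDs, and the domain names below are all hypothetical.

```python
from collections import Counter

# Hypothetical view log: one (user_id, domain) pair per time a post was shown.
views = [
    ("u1", "example-news.org"),
    ("u2", "example-news.org"),
    ("u3", "example-news.org"),
    ("u1", "example-partisan.com"),
    ("u1", "example-partisan.com"),
    ("u1", "example-partisan.com"),
    ("u2", "example-partisan.com"),
    ("u2", "example-partisan.com"),
]


def reach(views):
    """Count unique viewers per domain (each user counted once)."""
    viewers = {}
    for user, domain in views:
        viewers.setdefault(domain, set()).add(user)
    return {domain: len(users) for domain, users in viewers.items()}


def impressions(views):
    """Count every time a domain's content was shown, repeats included."""
    return dict(Counter(domain for _, domain in views))


reach_by_domain = reach(views)
impressions_by_domain = impressions(views)
```

In this made-up log, the first domain wins on reach (three unique viewers versus two), while the second wins on impressions (five showings versus three), so the ranking depends on which metric is reported.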
Oct 21, 2021 4 tweets 5 min read
Nonprofits are exempt from many state privacy laws. But should they be?

We scanned 23,000+ nonprofit sites and found they were heavily tracking visitors. Planned Parenthood was even monitoring visitor keystrokes.

@alfredwkng & @tenuous report:

themarkup.org/blacklight/202… We scanned nonprofits using the real-time privacy forensics tool that @suryamattu built – Blacklight.

Try it yourself on your favorite websites. You might be surprised at what you find:

themarkup.org/blacklight
Apr 9, 2021 5 tweets 3 min read
🧵: Yesterday we revealed how YouTube enabled advertisers to build ad campaigns around hate terms.

Today we reveal how YouTube blocked advertisers from building ad campaigns around social justice terms such as “Black Lives Matter.”

themarkup.org/google-the-gia… 2/ @leonyin and @asankin found that YouTube’s ad portal blocked search results for one-third of the 62 racial & social justice phrases tested.

For example: All the phrases we tested containing the word “Muslim” were blocked, even innocuous ones like Muslim fashion.
Apr 8, 2021 6 tweets 4 min read
🧵: We found a secret blocklist on Google Ads that hides YouTube hate videos. But @leonyin and @asankin found it was full of holes.

Blocked: heil hitler
Not blocked: heilhitler

Blocked: white nationalist
Not blocked: white nationalists
themarkup.org/google-the-gia… 2/ Many hate terms weren’t blocked in YouTube’s ad buying portal at all. We tested 86 well-known hate terms and phrases, and only one-third were blocked. Unblocked terms included:

14 words
Blood and soil
Daily Stormer
Great replacement
Zionist occupation government
Jan 31, 2021 10 tweets 2 min read
There’s been a lot of chatter on my feed about the Facebook Oversight Board’s decision to release its ruling to some journalists and academics on embargo.

I thought it would be worth talking about how newsrooms can and should think about embargoes. /1 First, to address the obvious question: yes, embargoes are a PR manipulation tactic.

When you accept embargoed material, you usually cannot do what journalists normally do, which is consult experts about it. /2
Jan 5, 2021 13 tweets 8 min read
Facebook is a newsstand. But no one can see which news Facebook is pushing to the top.

So we built an app for that called #CitizenBrowser. Our first finding: the sharp impact of Facebook’s political ad ban reversal in the Georgia Senate elections.
themarkup.org/citizen-browse… /1 This is the first report from our #CitizenBrowser project. There will be many more to come. But first, I want to tell you a bit about how we did it because it’s the most ambitious thing we’ve ever done @themarkup – and we do a lot of ambitious projects. /2
Dec 7, 2020 6 tweets 3 min read
I’m excited to announce that we have assembled a fantastic team to help us get Citizen Browser launched! Citizen Browser is our ambitious effort to build a national panel to audit social media algorithms: themarkup.org/citizen-browser /1 @corintxt joins the team as Data Reporter - he will be digging through the data to help us find stories. Corin has long worked as a reporter covering technology news. I love this story outing pay-for-play crypto news outlets: breakermag.com/we-asked-crypt… /2
Oct 29, 2020 4 tweets 2 min read
Facebook’s black box algorithm charged the Biden campaign higher ad rates on average than it charged the Trump campaign.

A new investigation by @jeremybmerrill:

themarkup.org/election-2020/… In swing states during July and August, Biden paid ad rates of about $34 compared with $17 paid by Trump’s campaign.

The gap narrowed in the fall - but overall Biden has paid 11% more than Trump.

As always, we show our work:
themarkup.org/election-2020/…
Sep 22, 2020 7 tweets 6 min read
Journalism is supposed to afflict the comfortable.

So we built an app for that.

Introducing Blacklight – a privacy tool that lets you scan any website and see how you are being surveilled. Built by the incomparable @suryamattu.

themarkup.org/blacklight/ Blacklight was born from a conversation @suryamattu and I had about updating the privacy series “What They Know” that I led ten years ago at @wsj.

What did we find? The TL;DR: surveillance has become creepier and more difficult to stop.

themarkup.org/blacklight/202…
Jul 28, 2020 4 tweets 2 min read
Remember when a Google search used to lead you somewhere?

Now it increasingly just keeps you on Google. In fact, Google results take up 62.6% of the first screen of search results in a sample of 15,000 searches.

themarkup.org/google-the-gia… It wasn't easy to measure Google search results. @LeonYin wrote two custom scrapers and 68 parsers to identify elements on Google search result pages.

As always, all our data, code and an extensive (like REALLY extensive) methodology here:
themarkup.org/google-the-gia…
Jun 18, 2020 6 tweets 3 min read
Amazon bans dangerous listings. But we found nearly 100 listings for banned weapons, drug equipment, and spy gear.

Several were listed as “Amazon’s Choice.” Five were sold by Amazon itself.

Amazing reporting from @AnnieGilbertson and @jonkeegan themarkup.org/banned-bounty/… /1 These materials are not only dangerous - but deadly. In an interview from prison, Eric Falkowski told us that he bought pill presses on @amazon and used them to make counterfeit prescription opioids. His fake pills killed two people and sickened 20 others. /2
May 28, 2020 6 tweets 3 min read
Looking to rent an apartment? Prepare to be subjected to the unaccountable algorithms that landlords across the nation are using to screen tenants.

@lkirchner reports in a joint investigation with @MattGoldstein26 of @nytimes /1

themarkup.org/locked-out/202… They found that these screening companies often use the loosest possible standards for matching names, including so-called “wild-card” searches, in which the records of anyone whose name shares the same first three letters as yours can be included in your report. /2
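To see why a three-letter “wild-card” match is so loose, here is a minimal sketch. It assumes the match is made on the first three letters of the first name, which the thread does not specify. The names and helper functions are hypothetical and do not reflect any screening company's actual code.

```python
def first_name(full_name: str) -> str:
    """Return the lowercased first token of a full name."""
    return full_name.strip().split()[0].lower()


def exact_match(applicant: str, record_name: str) -> bool:
    """Strict comparison: the whole name must be identical (ignoring case)."""
    return applicant.strip().lower() == record_name.strip().lower()


def wildcard_match(applicant: str, record_name: str, prefix_len: int = 3) -> bool:
    """Loose comparison (assumed): only the first few letters of the first name must agree."""
    return first_name(applicant)[:prefix_len] == first_name(record_name)[:prefix_len]


# Hypothetical applicant and hypothetical names attached to court records.
applicant = "Antonio Garcia"
records = ["Anthony Smith", "Antoinette Jones", "Ann Lee", "Maria Lopez"]

exact_hits = [name for name in records if exact_match(applicant, name)]
loose_hits = [name for name in records if wildcard_match(applicant, name)]
```

With these made-up names, the strict comparison returns nothing, while the three-letter prefix comparison pulls in the records of two unrelated people.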
Apr 16, 2020 4 tweets 2 min read
Results of our 50-state FOIA for COVID testing algorithms are in - and the differences are stark.

Just one example: If you’re a senior with a fever, you qualify for a test in Utah but not Wisconsin.

themarkup.org/coronavirus/20… At @themarkup we always show our work: here’s all the data we have from 20 jurisdictions.

Thanks to @tenuous and Emmanuel Martinez for the hard work of classifying data and @SamMorrisDesign for the hard work of beautifying it.

themarkup.org/coronavirus/20…