Huge investigation from @proof__news today: We reveal the trove of YouTube videos that are being used to train AI models (including Anthropic's Claude).
Yes, it includes all your favorite YouTubers, from @hankgreen to @MrBeast to @khanacademy.
🧵Amidst rampant surveillance, one bastion of privacy remains – end-to-end encrypted messaging apps like Signal and WhatsApp. But dangerous laws are being proposed in the US, UK, EU & beyond to force those apps to scan your messages.
Previously, the FBI sought a “master key” that could unlock encrypted content with a search warrant. But they lost after a showdown with Apple in 2016. /2
I’m sad to report that I am leaving @themarkup to pursue other projects, which I will announce soon. It was an honor and a privilege to found @themarkup five years ago to create an investigative newsroom that integrated engineers and journalists. /1
My goal was to use the best of tech – computation, automation, machine learning – to investigate the human impacts of tech. And to do it using the scientific method as our compass rather than the fuzzy concept of “objectivity.” /2
Jan 14, 2023 • 8 tweets • 4 min read
Let’s talk about consent. Do you feel like you ever properly consented to being surveilled online constantly, having a profile built of your interests and having that profile made available to anyone who could pay for it?
EU regulators don’t think so either. /1
Earlier this month @edpb fined Meta €390 million for not getting proper consent before profiling FB & IG users. It was hailed as a huge victory for the EU's landmark privacy law, GDPR, but sadly it may not change how you are profiled. /2
Jul 30, 2022 • 4 tweets • 2 min read
The U.S. is closer to passing a federal privacy law than ever. But there’s a catch: it sets a “ceiling” and not a “floor” for state & local privacy laws.
In this week’s newsletter @cam_kerry says that's “the price of getting strong protections.” /1
themarkup.org/newsletter/hel…
But Ashkan Soltani, head of the new privacy agency in California, where a strong privacy law would go into effect next year, tells me the trade-off “is a trap.”
The federal bill “locks into amber” its own rules, preventing future innovation to protect privacy. /2
Jun 22, 2022 • 13 tweets • 8 min read
In light of the recent US settlement with Facebook, I want to tell y’all a story about how hard it is to make change in our algorithmic world, why you need a village of researchers, and why law enforcement agencies need to get better at tech. /1
nytimes.com/2022/06/21/tec…
Six years ago, @terryparrisjr & I bought an ad on Facebook targeted only to white people looking for housing, using a drop-down menu that blocked the ad from being seen by different “ethnic affinity groups.” Experts said this violated the Fair Housing Act. /2
Today's journalism lesson: "On background" is a request for anonymity. To be honored, it must be agreed to by both parties. It cannot be unilaterally declared.
That's why today we are publishing an email from Amazon which they insist was on background:
Critics have long suspected that predictive policing software was racially biased.
Today, we have the answer: @themarkup & @gizmodo analyzed 5.9 million algorithmic crime predictions. We found they disproportionately target Black & Latino areas. /1
themarkup.org/prediction-bia…
Police across the U.S. use software from a company called PredPol (recently renamed @Geolitica_PS) that says it predicts future crime without racial bias.
But we found it rarely predicted crime in White areas & disproportionately predicted it in Black & Latino areas. /2
Nov 18, 2021 • 10 tweets • 4 min read
Facebook says top content on its platform comes from reputable sources like UNICEF, ABC News & the CDC.
But our #CitizenBrowser data shows that Daily Wire & The Western Journal, with their sensational partisan content, are top performers. @corintxt reports:
themarkup.org/citizen-browse…
Here’s the difference: Facebook measures “reach,” which is how many unique viewers saw each domain.
But we measured an equally, if not more, important metric: “impressions,” which is how many times users are bombarded with a piece of content. /2
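To make the reach vs. impressions distinction concrete, here is a minimal sketch. It assumes a hypothetical log of (user, domain) view events, one row per time a user saw a post linking to a domain; this is not Facebook's or Citizen Browser's actual data format, and the domains and numbers are toy values, not findings.

```python
from collections import Counter, defaultdict


def reach_and_impressions(events):
    """Return per-domain reach (unique viewers) and impressions (total views).

    `events` is a list of (user_id, domain) pairs. This event format is a
    hypothetical simplification used only for illustration.
    """
    # Impressions: count every view, including repeat views by the same user.
    impressions = Counter(domain for _, domain in events)

    # Reach: count each user at most once per domain.
    viewers = defaultdict(set)
    for user, domain in events:
        viewers[domain].add(user)
    reach = {domain: len(users) for domain, users in viewers.items()}
    return reach, dict(impressions)


# Toy data: site_a is seen once each by four users;
# site_b is seen three times each by two users.
events = [
    ("u1", "site_a"), ("u2", "site_a"), ("u3", "site_a"), ("u4", "site_a"),
    ("u1", "site_b"), ("u1", "site_b"), ("u1", "site_b"),
    ("u2", "site_b"), ("u2", "site_b"), ("u2", "site_b"),
]
reach, impressions = reach_and_impressions(events)
print(reach)        # {'site_a': 4, 'site_b': 2}
print(impressions)  # {'site_a': 4, 'site_b': 6}
```

By reach, site_a looks twice as big; by impressions, site_b wins, because the same users keep seeing it.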
Oct 21, 2021 • 4 tweets • 5 min read
Nonprofits are exempt from many state privacy laws. But should they be?
We scanned 23,000+ nonprofit sites and found they were heavily tracking visitors. Planned Parenthood was even monitoring visitor keystrokes.
For example: All the phrases we tested containing the word “Muslim” were blocked, even innocuous ones like Muslim fashion.
Apr 8, 2021 • 6 tweets • 4 min read
🧵: We found a secret blocklist on Google Ads that hides YouTube hate videos from advertisers. But @leonyin and @asankin found it was full of holes.
Blocked: heil hitler
Not blocked: heilhitler
Blocked: white nationalist
Not blocked: white nationalists
themarkup.org/google-the-gia…
2/ Many hate terms weren’t blocked in YouTube’s ad buying portal at all. We tested 86 well-known hate terms and phrases, and only one-third were blocked. Unblocked terms included:
14 words
Blood and soil
Daily Stormer
Great replacement
Zionist occupation government
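How Google's blocklist works internally is not public. The blocked/not-blocked pairs above are consistent with matching exact phrases, so here is a hypothetical sketch contrasting exact matching with a crude normalization step (lowercase, drop spaces, strip one trailing "s"). Both functions and the normalization rule are assumptions for illustration; crude plural stripping like this can also over-block unrelated terms.

```python
# Two phrases the thread reports as blocked. The set itself is illustrative.
BLOCKLIST = {"heil hitler", "white nationalist"}


def exact_match_blocked(query: str) -> bool:
    """Hypothetical exact matching: block only a listed phrase, verbatim."""
    return query.lower() in BLOCKLIST


def normalize(term: str) -> str:
    """Hypothetical crude normalization: lowercase, remove spaces,
    and strip a single trailing 's' as a rough plural fold."""
    t = term.lower().replace(" ", "")
    return t[:-1] if t.endswith("s") else t


NORMALIZED_BLOCKLIST = {normalize(t) for t in BLOCKLIST}


def normalized_blocked(query: str) -> bool:
    """Block a query if its normalized form matches a normalized entry."""
    return normalize(query) in NORMALIZED_BLOCKLIST


for q in ["heil hitler", "heilhitler", "white nationalist", "white nationalists"]:
    print(f"{q!r}: exact={exact_match_blocked(q)}, normalized={normalized_blocked(q)}")
```

Under exact matching, "heilhitler" and "white nationalists" slip through; the normalized version catches both variants.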
Jan 31, 2021 • 10 tweets • 2 min read
There’s been a lot of chatter on my feed about the Facebook Oversight Board’s decision to release its ruling to some journalists and academics under embargo.
I thought it would be worth talking about how newsrooms can and should think about embargoes. /1
First, to address the obvious question: yes, embargoes are a PR manipulation tactic.
When you accept embargoed material, you usually cannot do what journalists normally do, which is consult experts about it. /2
Jan 5, 2021 • 13 tweets • 8 min read
Facebook is a newsstand. But no one can see which news Facebook is pushing to the top.
So we built an app for that called #CitizenBrowser. Our first finding: the sharp impact of Facebook’s political ad ban reversal in the Georgia Senate elections. themarkup.org/citizen-browse… /1
This is the first report from our #CitizenBrowser project. There will be many more to come. But first, I want to tell you a bit about how we did it, because it’s the most ambitious thing we’ve ever done at @themarkup, and we do a lot of ambitious projects. /2
Dec 7, 2020 • 6 tweets • 3 min read
I’m excited to announce that we have assembled a fantastic team to help us get Citizen Browser launched! Citizen Browser is our ambitious effort to build a national panel to audit social media algorithms: themarkup.org/citizen-browser /1
@corintxt joins the team as Data Reporter. He will be digging through the data to help us find stories. Corin has long worked as a reporter covering technology news. I love this story outing pay-for-play crypto news outlets: breakermag.com/we-asked-crypt… /2
Oct 29, 2020 • 4 tweets • 2 min read
Facebook’s black box algorithm charged the Biden campaign higher ad rates on average than it charged the Trump campaign.
Remember when a Google search used to lead you somewhere?
Now it increasingly just keeps you on Google. In fact, Google’s own results took up 62.6% of the first screen of search results in our sample of 15,000 searches.
themarkup.org/google-the-gia…
It wasn't easy to measure Google search results. @LeonYin wrote two custom scrapers and 68 parsers to identify elements on Google search result pages.
As always, all our data, code, and an extensive (like REALLY extensive) methodology are here: themarkup.org/google-the-gia…
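A "share of the first screen" figure is an area ratio. The sketch below is a hypothetical illustration, not The Markup's parsers or methodology: it assumes each parsed page element has already been labeled with a category and a pixel bounding box, that elements don't overlap, and that the first screen is the top viewport rectangle. The `Element` type, category labels, and page layout are made up.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical parsed search-result element with a pixel bounding box."""
    category: str  # illustrative labels, e.g. "google" or "organic"
    x: int
    y: int
    width: int
    height: int


def share_of_first_screen(elements, viewport_w, viewport_h, category="google"):
    """Fraction of the first screen's area covered by elements of `category`.

    Assumes elements do not overlap. Each box is clipped to the viewport,
    so content below the fold does not count.
    """
    covered = 0
    for el in elements:
        if el.category != category:
            continue
        visible_w = max(0, min(el.x + el.width, viewport_w) - max(el.x, 0))
        visible_h = max(0, min(el.y + el.height, viewport_h) - max(el.y, 0))
        covered += visible_w * visible_h
    return covered / (viewport_w * viewport_h)


# Made-up page on a 400x700 first screen.
page = [
    Element("google", 0, 0, 400, 300),
    Element("organic", 0, 300, 400, 200),
    Element("google", 0, 500, 400, 400),  # only the top 200px are visible
]
print(share_of_first_screen(page, 400, 700))  # ≈ 0.714, i.e. 5/7
```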
Jun 18, 2020 • 6 tweets • 3 min read
Amazon bans dangerous listings. But we found nearly 100 listings for banned weapons, drug equipment, and spy gear.
Several were listed as “Amazon’s Choice.” Five were sold by Amazon itself.
Amazing reporting from @AnnieGilbertson and @jonkeegan themarkup.org/banned-bounty/… /1
These materials are not only dangerous - but deadly. In an interview from prison, Eric Falkowski told us that he bought pill presses on @amazon and used them to make counterfeit prescription opioids. His fake pills killed two people and sickened 20 others. /2
May 28, 2020 • 6 tweets • 3 min read
Looking to rent an apartment? Prepare to be subjected to the unaccountable algorithms that landlords across the nation are using to screen tenants.
themarkup.org/locked-out/202…
They found that these screening companies often use the loosest possible standards for matching names, including so-called “wild-card” searches, in which the records of anyone whose name shares its first three letters with yours can be included in your report. /2
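Here is a hypothetical sketch of why a wild-card match sweeps in other people's records. It assumes names are "First Last" strings and that the wild-card rule compares only the first three letters of the first name while requiring the same last name. Screening companies' actual matching rules vary and are not shown in the thread, and the names are made up.

```python
def strict_match(applicant: str, record: str) -> bool:
    """Full, case-insensitive name equality."""
    return applicant.strip().lower() == record.strip().lower()


def wildcard_match(applicant: str, record: str, prefix_len: int = 3) -> bool:
    """Hypothetical 'wild-card' rule: same last name, and the first names
    share only their first `prefix_len` letters. Assumes 'First Last' format."""
    a_first, a_last = applicant.lower().split(maxsplit=1)
    r_first, r_last = record.lower().split(maxsplit=1)
    return a_first[:prefix_len] == r_first[:prefix_len] and a_last == r_last


records = ["Jonathan Smith", "Jonah Smith", "Jonas Smith", "Joanna Smith"]
applicant = "Jonathan Smith"
print([r for r in records if strict_match(applicant, r)])
# ['Jonathan Smith']
print([r for r in records if wildcard_match(applicant, r)])
# ['Jonathan Smith', 'Jonah Smith', 'Jonas Smith']
```

Under the three-letter rule, records for Jonah Smith and Jonas Smith would land in Jonathan Smith's report.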
Apr 16, 2020 • 4 tweets • 2 min read
Results of our 50-state FOIA requests for COVID testing algorithms are in, and the differences are stark.
Just one example: If you’re a senior with a fever, you qualify for a test in Utah but not Wisconsin.