The Markup
30 Nov, 17 tweets, 6 min read
🧵 We think that the Van Buren v. United States case before the Supreme Court today is a threat to data journalism. So much so that we filed an amicus brief. This is why:
The case deals with the Computer Fraud and Abuse Act (CFAA) and how its definition of "exceeds authorized access" applies to someone who intentionally accesses a computer system they are authorized to use.
Van Buren was a police officer arrested by the FBI and convicted of computer fraud in Georgia after he used his access to work databases for personal financial gain.
The CFAA is broadly used as an anti-hacking law. The problem is in how it gets interpreted. Depending on how this case goes, other data-driven activities could be considered a crime, including web scraping.
Web scraping is the act of extracting data from websites. Website data, even data you provide to a site, is controlled by the government or company that made the website, through rules called affordances.
Extracting or scraping that data for analysis flips the power balance so that the audience decides what can be done with the data.
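To make "extracting data from websites" concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in any investigation mentioned in this thread. The sample markup, the `LinkScraper` class, and the `scrape_links` function are all hypothetical, and the page is an inline string rather than a live site, so nothing is fetched over the network.

```python
from html.parser import HTMLParser

# Hypothetical page markup, standing in for HTML downloaded from a real site.
SAMPLE_HTML = """
<ul>
  <li class="result"><a href="https://example.com/a">First result</a></li>
  <li class="result"><a href="https://example.org/b">Second result</a></li>
</ul>
"""


class LinkScraper(HTMLParser):
    """Collects (link text, href) pairs from each <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) rows
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_links(html):
    """Turn raw page markup into a list of (text, href) rows."""
    parser = LinkScraper()
    parser.feed(html)
    return parser.links


rows = scrape_links(SAMPLE_HTML)
for text, href in rows:
    print(text, href)
```

The point of the sketch is the shift it shows: the page presents its links one way, and the scraper turns them into rows that can be counted, compared, or analyzed however the person running it chooses.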
Computer science researchers use web scraping to monitor advertisements on Facebook (@LauraEdelson2 and NYU Ad Observatory).
Search engines use it to power algorithms that personalize your results.
The Internet Archive uses it to ensure that knowledge and culture don’t get lost when a webpage goes offline.
Data journalists like the team at @TheMarkup use web scraping to investigate Big Tech. It is vital to the work we do.
Most recently, The Markup’s @leonyin and @adrjeffries scraped Google Trends and Google Search when investigating Google’s top search results for bias. themarkup.org/google-the-gia…
Creating these new datasets allowed them to directly answer questions they were asking rather than using existing datasets that were created for other purposes.
Other great examples of journalistic scraping in the public interest include @TheAtlantic & @alexismadrigal’s essential COVID Tracking Project. covidtracking.com
@JuliaAngwin and @suryamattu’s investigation for @ProPublica into the Amazon algorithm giving itself a boost. propublica.org/article/amazon…
@dmehro and @dellcam’s investigation for @gizmodo into Ring’s hidden data map. gizmodo.com/ring-s-hidden-…
@NYTimes’ results tracking during the 2020 election. alex.github.io/nyt-2020-elect…
So when we say #ScrapingIsNotACrime, this is what we mean. Web scraping is a foundational activity that allows data scientists, journalists, and others to hold powerful tech companies accountable.

