🧵 We think that the Van Buren v. United States case before the Supreme Court today is a threat to data journalism. So much so that we filed an amicus brief. This is why:
The case deals with the Computer Fraud and Abuse Act (CFAA) and what it means to "exceed authorized access" to a computer system that one is otherwise authorized to use.
Van Buren was a police officer arrested by the FBI and convicted of computer fraud in Georgia after he used his access to work databases for personal financial gain.
The CFAA is broadly used as an anti-hacking law. The problem is in how it gets interpreted. Depending on how this case goes, other data-driven activities, including web scraping, could be considered crimes.
Web scraping is the act of extracting data from websites. Website data, even data you provide to a site, is controlled by the government or company that made the website, through rules built into the site's design called affordances.
Extracting or scraping that data for analysis flips the power balance so that the audience decides what can be done with the data.
Computer science researchers use web scraping to monitor advertisements on Facebook (@LauraEdelson2 and NYU's Ad Observatory).
Search engines use it to power algorithms that personalize your results.
The Internet Archive uses it to ensure that knowledge and culture don’t get lost when a webpage goes offline.
Data journalists like the team at @TheMarkup use web scraping to investigate Big Tech. It is vital to the work we do.
Creating new datasets this way lets journalists directly answer the questions they are asking, rather than relying on existing datasets that were created for other purposes.
So when we say #ScrapingIsNotACrime, this is what we mean. Web scraping is a foundational activity that allows data scientists, journalists, and others to hold powerful tech companies accountable.
• • •
1/ The House Judiciary Committee released a report Tuesday urging the breakup of Big Tech. It caps a 16-month investigation. A short thread with some context from our previous reporting, including one of our investigations cited in the report.
2/ In July, the heads of Apple, Google, Amazon, and Facebook testified before the committee together for the first time. Lawmakers grilled the CEOs, alleging the companies have abused their monopoly power. themarkup.org/2020/07/30/con…
3/ A day before the hearing, we published a months-long investigation into Google Search. @leonyin and @adrjeffries found Google gave 41 percent of the first page of search results to the company’s own properties and products—a lot of it at the top. themarkup.org/google-the-gia…
🚨Job alert🚨
The Markup is hiring! Applications are open for an editor, beat reporters, chief of staff and a tech coordinator.
⬇️ More info in the thread
✉️ DMs open if you have questions
🔗 Please share with your networks
The news editor will oversee a team of reporters and freelancers who will persistently monitor and uncover the ways that tech affects people. Apply here: boards.greenhouse.io/themarkup/jobs…
🧵New investigation: Do you know who’s informed when you visit government websites? Sites for abortion providers? Those serving LGBTQ people? We found online tracking is common, even where privacy would seem paramount. themarkup.org/blacklight/202…
2/ We spent 18 months developing Blacklight, a one-of-a-kind instant privacy inspection tool. It’s free for anyone to use: themarkup.org/blacklight
👉 Enter any URL
👉 Hit “scan site”
👉 See the results of seven different privacy tests
👉😱
3/ Using Blacklight, we found more than 100 sites serving undocumented immigrants, domestic and sexual abuse survivors, sex workers, and LGBTQ people sent data about their visitors to advertising companies. themarkup.org/blacklight/202…
Kids headed back to school this fall? Check out our short back-to-school tech reading list⬇️
Millions of children are now using the internet for daily schoolwork. Learn more about the laws that are supposed to protect kids online—and what the laws don’t do. themarkup.org/ask-the-markup…
The pandemic didn’t create educational disparities, but it has exacerbated them, Phyllis Jordan at @FutureEdGU said. Students might not have equipment to log in from home, or working parents may be unable to monitor attendance. themarkup.org/coronavirus/20…
📰 What we’re reading 📰
A thread of some of our favorite tech stories from this week by our peers:
According to internal discussions, Facebook removed "strikes" so that conservative pages were not penalized for violations of misinformation policies. By @oliviasolon/@NBCNews. nbcnews.com/tech/tech-news…