One major caveat: we are not committed to maintaining this script should the federal data pages undergo material changes. This is simply a set of instructions for interested data users (and an example of what's possible with federal data).
For inexperienced data users, this process is no more than 2 clicks. For users familiar with Python and pandas, feel free to take this code as a starting point for further exploration.
Once you've downloaded your CSV, you can create visualizations in nearly identical fashion to our prior daily work.
Please read the notes section carefully to get a handle on these data sources before you viz. We hope this makes federal data more approachable for a wider audience.
We have also updated our list of federal data 101 posts to point to new and relocated federal datasets.
Lastly, a big thank you to @zachlipton and @anthropoco for streamlining these federal data into a single CSV. They have been tireless data wranglers behind the scenes and we deeply appreciate their work.
As we’ve seen with many COVID-19 metrics, there’s often a veneer of uniformity obscuring quiet data discrepancies. In this piece, we look at probable COVID-19 case definitions and the decisions states made about how and when to adopt federal guidance. covidtracking.com/analysis-updat…
When states started updating their probable case definitions, as per guidance from the feds, the share of probable cases in their total case counts grew. Our research set out to explore the extent to which this growth was fueled by antigen tests alone.
What we found was that a state’s testing strategy played an important role in shaping probable case counts. States with strong antigen testing programs no longer had to rely on contact tracing and symptom tracking, both difficult to perform at scale.
During the worst moments of the pandemic, the US public health data infrastructure could not keep up with COVID-19 death counts. Our new analysis looks at the effect of reporting lags on death data reported by states and the CDC: covidtracking.com/analysis-updat…
To understand the effect of reporting lags on state COVID-19 death counts, we compared data compiled by the CDC from state dashboards to retrospective data published by some states that charts deaths on the day they actually occurred.
Real-time death counts were affected both by slowness in reporting, which made peaks in deaths appear to happen later than they did, and by reporting capacity limits, which made deaths appear to peak lower than they did.
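To make those two effects concrete, here is a toy Python sketch. It is not the method used in the analysis: the death counts, the delay weights, and the daily capacity are all made-up numbers, and `reported_series` and `peak` are hypothetical helpers written only for this illustration. It spreads each day's deaths over later reporting days, then optionally caps how many reports can be processed per day and carries the rest forward as a backlog.

```python
def reported_series(true_deaths, delay_weights, daily_capacity=None):
    """Turn deaths by date of occurrence into deaths by date of report.

    delay_weights[k] is the (assumed) share of a day's deaths that is
    reported k days later; the weights should sum to 1.
    daily_capacity, if given, is an assumed cap on reports per day;
    anything above it waits in a backlog.
    """
    n_days = len(true_deaths) + len(delay_weights) - 1
    arriving = [0.0] * n_days
    for day, count in enumerate(true_deaths):
        for lag, weight in enumerate(delay_weights):
            arriving[day + lag] += count * weight

    if daily_capacity is None:
        return arriving

    reported, backlog = [], 0.0
    for amount in arriving:
        backlog += amount
        processed = min(backlog, daily_capacity)
        reported.append(processed)
        backlog -= processed
    # Keep reporting until any remaining backlog is cleared.
    while backlog > 1e-9:
        processed = min(backlog, daily_capacity)
        reported.append(processed)
        backlog -= processed
    return reported


def peak(series):
    """Return (day index, value) of the first maximum in the series."""
    value = max(series)
    return series.index(value), value


# Made-up example: deaths by the day they occurred, peaking on day 4.
true_deaths = [10, 20, 40, 80, 100, 80, 40, 20, 10]
delay_weights = [0.2, 0.3, 0.3, 0.2]  # assumed reporting delays of 0-3 days

lagged = reported_series(true_deaths, delay_weights)
capped = reported_series(true_deaths, delay_weights, daily_capacity=50)

print("true peak   (day, deaths):", peak(true_deaths))
print("lagged peak (day, deaths):", peak(lagged))
print("capped peak (day, deaths):", peak(capped))
```

In this toy setup, every death is eventually counted in both reported series, but the lagged series peaks after the true peak day, and the capped series can never show more than the assumed daily capacity. Note that spreading reports over several days also flattens the lagged peak somewhat, so the two effects are not fully separate here.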
State by state, federal COVID-19 testing data is getting better. Over the last few months, we have observed federal efforts to address many of the dataset’s most pressing problems. covidtracking.com/analysis-updat…
Throughout the pandemic, the federal government has struggled to count COVID-19 tests. Its dataset has long shown signs of infrastructural problems: covidtracking.com/analysis-updat…
Since we last looked at it in February 2021, the federal government has changed its data sourcing for six jurisdictions and corrected submission problems in three, often improving the data greatly.
For most of the project, we’ve been laser-focused on gathering data. We recently began a process to understand *how* our data has been used. Here’s what we found. covidtracking.com/analysis-updat…
Our largely volunteer-run effort became a definitive and trustworthy source for US COVID-19 data and analysis: for media, scientists and medical professionals, academics, and the government.
We became a major data source used by U.S. and international outlets across the political spectrum with more than 7,700 press mentions. We also responded to thousands of media and citizen requests for information on COVID-19 data.
With little guidance from the feds, states have had to make their own decisions about compiling and presenting COVID-19 data, leading to sweeping inconsistencies. Here are some of the key reporting problems we found. covidtracking.com/analysis-updat…
First up: data definitions. More than one year in, many facets of COVID-19 data are still not standardized. States have defined metrics inconsistently, making summaries and comparisons difficult if not impossible.
Next, we look at how states make their data available. Some states report metrics in percentages, without raw numbers. Some provide granular data while others report only summaries. And some states don’t report some metrics at all.
Today we’re releasing research on hospitalization data definitions. Hospitalizations are one of the most critical metrics for understanding the state of the pandemic. covidtracking.com/analysis-updat…
We found that states differ in how they track patients with confirmed COVID-19, suspected COVID-19, or both. We also observed some states lumping metrics together, making comparisons difficult.