Over the past few weeks, we’ve noticed that newsrooms of all sizes—and even some government agencies—have fallen into some of the data potholes that we’ve become familiar with in our year of wrangling public COVID-19 data.
So today, we’re offering a brief cheat sheet on avoiding some of the most common errors we’ve seen. covidtracking.com/analysis-updat…
Tip 1: If you see dramatic movement in the data, look for contextual clues before interpreting it as a change in the pandemic. Day-of-week effects in data arranged by date of report produce predictable reporting swings over the course of each week.
Tip 2: Data backlogs—and the “data dumps” that occur when those backlogs are resolved—can mimic major declines and then jumps, especially in cases, tests, and deaths. Look for explanations on state dashboards and call public health officials.
Tip 3: Holiday and weather-related reporting issues happen when national or natural events occur across many states at once, and can mimic shifts in the pandemic. Look for holidays or major disruptions that might have artificially depressed—and then inflated—the data.
Tip 4: Watch out for definitional mismatches and alternate dating schemes. Be aware that different jurisdictions chose different ways of defining and reporting their metrics.
Tip 5: Get familiar with caveats. The most recent dates in epidemiological datasets are always incomplete—because, for example, the data points for people who died today won’t finish being reported for many days, weeks, or even months in the future.
Tip 6: Be cautious about what the data can say. If you’re trying to extract insights from the data itself, it can be very easy—especially within a headline—to make causal claims when only correlative evidence is available.
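A minimal sketch of how Tips 1, 2, and 5 might look in practice, using plain Python. It is an illustration, not our actual tooling: the function names, the 7-day window, the 3x threshold, and the daily counts are all assumptions made up for this example. The idea is to smooth out day-of-week swings with a trailing average, flag days that tower over the recent baseline as possible backlog dumps worth checking against state dashboards, and set aside the newest days because they are likely still incomplete.

```python
from statistics import median


def trailing_average(values, window=7):
    """Mean of each full `window`-day span; None until enough days exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            span = values[i + 1 - window : i + 1]
            out.append(sum(span) / window)
    return out


def flag_possible_dumps(values, window=7, ratio=3.0):
    """Indices of days exceeding `ratio` times the median of the prior `window` days.

    A flag is only a prompt to look for an explanation, not proof of a backlog.
    """
    flagged = []
    for i in range(window, len(values)):
        baseline = median(values[i - window : i])
        if baseline > 0 and values[i] > ratio * baseline:
            flagged.append(i)
    return flagged


def drop_recent(values, days=7):
    """Exclude the newest `days` entries, which are likely still being filled in."""
    return list(values[:-days]) if days > 0 else list(values)


# Made-up daily counts: a weekly rhythm with low weekend reporting,
# plus one artificial spike standing in for a backlog dump.
daily = [100, 120, 110, 105, 115, 60, 50] * 2 + [100, 900, 110, 105, 115, 60, 50]

smoothed = trailing_average(daily)
dumps = flag_possible_dumps(daily)
settled = drop_recent(daily, days=7)

print("possible dump days (index):", dumps)
print("days kept after trimming:", len(settled))
```

The threshold and the number of days to trim are judgment calls that depend on the metric and the jurisdiction, so treat both as starting points.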
Over the past 10 months, we've tried to determine how COVID-19 affected some of the people most vulnerable to the virus: residents of long-term-care facilities. covidtracking.com/analysis-updat…
Based on official state figures compiled by our team, as of March 4, 2021, at least 174,474 people had died of COVID-19 in long-term-care facilities. This represents 34% of the total deaths due to COVID-19 in the US. covidtracking.com/ltc-topline-es…
We estimate that as of March 2021, about 8 percent of people who live in US long-term-care facilities have died of COVID-19: Nearly one in 12. covidtracking.com/analysis-updat…
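The arithmetic behind the two figures above can be checked with a short sketch. The variable names are ours for illustration, and because the 34% share is a rounded number, the implied national total is only a range, not an exact count.

```python
# Figures quoted in the thread.
ltc_deaths = 174_474   # deaths in long-term-care facilities as of March 4, 2021
ltc_share = 0.34       # stated share of all US COVID-19 deaths (rounded)

# Total US deaths implied by the stated share.
implied_total = ltc_deaths / ltc_share

# The share is rounded to a whole percent, so the true value could sit
# anywhere from 33.5% to 34.5%; that bounds the implied total.
implied_low = ltc_deaths / 0.345
implied_high = ltc_deaths / 0.335

# "About 8 percent" versus "nearly one in 12".
resident_death_rate = 0.08
one_in_twelve = 1 / 12

print(f"implied total deaths: about {implied_total:,.0f}")
print(f"plausible range: {implied_low:,.0f} to {implied_high:,.0f}")
print(f"one in 12 as a percentage: {one_in_twelve:.1%}")
```

Since one in 12 works out to a little over 8 percent, "nearly one in 12" is a fair plain-language reading of the estimate.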
On Friday we published our latest guide, this one on federal race and ethnicity data. We explain where you can find it, and what you need to know about its limitations.
Our Federal Data 101 about race and ethnicity data is published.
Publicly available federal race and ethnicity COVID-19 data is currently usable and improving, although it shares many of the problems we’ve found in state-reported data.
Federal race and ethnicity COVID-19 data is not comprehensive enough to represent people’s experience of the pandemic in the United States. Most data is only available nationally, not by state.
The federal data can be improved: collect and publish race and ethnicity data more consistently and comprehensively, present the data in clear, accessible ways, and be transparent about data sources and contexts.
For many weeks now, the number of cases and hospitalizations has been going down across the country. Unfortunately, that trend has now reversed in the state of Michigan. Cases *and* hospitalizations are both on the rise there.
There had been some hopes that if we did see cases rise somewhere, hospitalizations would not follow because many vulnerable people have been vaccinated. But Michigan hospitalizations have increased 45% from their February low.
Two important pieces of context: Statewide, just 28% of Black residents 65+ are known to have received a first dose of vaccine. Though that data is incomplete, CDC numbers show that 66% of the U.S. population aged 65+ has received at least one dose of vaccine.
One major caveat—we are not committed to maintaining this script should the federal data pages undergo material changes. This is simply a set of instructions for interested data users (and an example of what's possible with federal data).
For inexperienced data users, this process is no more than 2 clicks. For users familiar with Python and pandas, feel free to take this code as a starting point for further exploration.