Tyler Dukes
Investigative reporter, @newsobserver. Data & public records. @niemanfdn fellow '17. @Duke_DeWitt adjunct. Powered by gas station coffee & Eastern NC barbecue.

Jul 28, 2021, 15 tweets

CDC now says residents of counties with "high" or "substantial" community spread should mask up indoors – even if they're vaccinated.

For NC, that's all but 21 counties.

CDC publishes a US map showing "level of community transmission," but no way to download the data (that I can see at least).

covid.cdc.gov/covid-data-tra…

But what if we REALLY WANT the data? Follow along for a few tips on prying it loose.

With some exceptions, interactives like these pipe in structured data from another source.

Maybe it's a CSV (comma-separated values).

Or a JSON (JavaScript Object Notation).

However it's structured, we can work with it – if we know where to find it.
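
To make the two formats concrete, here's a minimal Python sketch that reads the same two records from a CSV string and a JSON string. The county names and the "level" column are made-up placeholders, not CDC data.

```python
import csv
import io
import json

# Made-up sample: the same two records, once as CSV and once as JSON.
csv_text = "county,level\nCounty A,high\nCounty B,substantial\n"
json_text = (
    '[{"county": "County A", "level": "high"},'
    ' {"county": "County B", "level": "substantial"}]'
)

# CSV: the first line is the header, every later line is one row.
rows_from_csv = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: a list of objects, each with named fields.
rows_from_json = json.loads(json_text)

# Both parse into the same list of dictionaries.
print(rows_from_csv == rows_from_json)
```

Either way you end up with rows and named fields, which is all a spreadsheet needs.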

Luckily, most modern browsers (I'm using Chrome) can help us track it down.

Right-click on the page and click "Inspect" to pull up a panel that allows you to look under the hood.

What we're looking for is the "Network" tab, which doesn't look very interesting right now.

If we refresh, the Network tab shows us all kinds of external files pulled in to render the page, from images to basic styling.

But let's narrow the field.

Near the top of the Network tab, you'll see a row of options allowing you to filter. What we want is the "Fetch/XHR" filter.

That's going to be all manner of structured data loaded into the page, some of it not terribly interesting.

But if we refresh again...

We can see, for example, that the page is loading a JSON called "colors."

Click on a row to see a preview, and – more importantly – the URL of the data itself.

Not everything looks like a "static" file per se. Sometimes you'll find a "call" to an Application Programming Interface (or API), which requests data by passing certain choices (or parameters) to a specific URL.

Kind of like ordering from a menu.

You can get a clue about what the CDC page is "ordering up" by looking at the parameter, in this case what comes after "id=".

I'm particularly interested in the menu request for "integrated_county_latest_external_data". Sounds delicious.
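
If you'd rather read the "order" with code than by eye, here's a small Python sketch that splits a URL into its parameters. The real Request URL is truncated in this thread, so the example.com address and path below are stand-ins I made up; only the "id" value comes from the page.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL: example.com and the path are placeholders.
# Only the value after "id=" is taken from the CDC page.
request_url = (
    "https://example.com/api/data"
    "?id=integrated_county_latest_external_data"
)

# Everything after the "?" is the query string; parse_qs turns it
# into a dictionary mapping each parameter name to a list of values.
params = parse_qs(urlsplit(request_url).query)

print(params["id"][0])
```

Swap in a different "id" and you're ordering a different dish from the same menu.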

If we click through and take a closer look, we can preview and expand the items to see there is a LOT of data in here for every US county.

And it just so happens to contain exactly the variable we want: "community_transmission_level"
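
Here's a rough Python sketch of plucking that variable out once the JSON is loaded. I'm assuming the records sit in a list under a key named after the "id" parameter; the "county" field and the two sample rows are invented for illustration, so check the preview pane for the real layout.

```python
import json

# Hypothetical payload: the nesting, the "county" field and both rows
# are assumptions. Only "community_transmission_level" and the key name
# come from what the Network tab showed.
payload = json.loads("""
{
  "integrated_county_latest_external_data": [
    {"county": "County A", "community_transmission_level": "high"},
    {"county": "County B", "community_transmission_level": "low"}
  ]
}
""")

records = payload["integrated_county_latest_external_data"]

# Map each county to its transmission level.
levels = {
    record["county"]: record["community_transmission_level"]
    for record in records
}

print(levels)
```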

In the "Headers" tab, the Request URL leads us directly to the data source: covid.cdc.gov/covid-data-tra…

It might look like gibberish, but it's actually sweet, sweet structured data! And if you have a browser extension like JSONView, reading it is a little easier: chrome.google.com/webstore/detai…

From there, you can hit CMD+S or Ctrl+S to save your JSON file locally. There are a few free converters out there to turn it into CSV, which you can open in a spreadsheet program (or Google Sheets).

I tried this one from @konklone, and it worked great! konklone.io/json/
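
If you'd rather skip the converter, a few lines of Python can do roughly the same job. This is a sketch that assumes you already have a flat list of records (no nested objects); the function name and sample rows are my own.

```python
import csv
import io


def records_to_csv(records):
    """Turn a list of flat dictionaries into CSV text."""
    # Collect every field name that appears, in first-seen order,
    # so records with missing fields still line up under one header.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up rows; the second one is missing a field on purpose.
sample = [
    {"county": "County A", "community_transmission_level": "high"},
    {"county": "County B"},
]
print(records_to_csv(sample))
```

Write the returned text to a .csv file and it should open in any spreadsheet program.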

Or if you're an R fan, you can pipe the JSON directly from the URL using the jsonlite and tidyverse packages with a few lines of hastily written code that might look like this: gist.github.com/mtdukes/54d198…
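
Not an R fan? Here's a rough Python analogue using only the standard library. To be clear, this is not the code in the gist, just a sketch of the same idea; fetch_json is a name I made up, and you'd pass it the Request URL you copied from the "Headers" tab.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (paste in the full Request URL from the Headers tab):
# data = fetch_json("<request URL>")
```

Some servers may turn away requests that don't look like they come from a browser, so this won't work everywhere.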

Either way, now you've got a handy dataset you can sort, filter and do some quick analysis on.
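
For example, here's a quick Python sketch that counts counties at each level and lists the ones that fall under the CDC's mask guidance ("high" or "substantial"). The three rows are invented, and the "county" field name is an assumption.

```python
from collections import Counter

# Made-up rows standing in for the downloaded county records.
records = [
    {"county": "County C", "community_transmission_level": "substantial"},
    {"county": "County A", "community_transmission_level": "high"},
    {"county": "County B", "community_transmission_level": "low"},
]

# Count how many counties sit at each level.
counts = Counter(r["community_transmission_level"] for r in records)

# Filter to the levels covered by the guidance, then sort by name.
masked = sorted(
    r["county"]
    for r in records
    if r["community_transmission_level"] in {"high", "substantial"}
)

print(counts)
print(masked)
```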

And you can REPEAT this every time the CDC updates.

Better yet, you can listen to @simonw tell you how to automate that repetition. simonwillison.net/2021/Mar/5/git…
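
If you just want a bare-bones version of "repeat" on your own machine, one option is to save a fresh copy only when the data has actually changed. This is my own minimal sketch, not the approach from the linked post; save_if_changed and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def save_if_changed(content: bytes, path: Path) -> bool:
    """Write content to path only if it differs from what is there.

    Returns True when the file was written, False when nothing changed.
    """
    if path.exists() and path.read_bytes() == content:
        return False
    path.write_bytes(content)
    return True


# Demo in a throwaway folder with made-up content.
with tempfile.TemporaryDirectory() as tmp_dir:
    snapshot = Path(tmp_dir) / "county_data.json"
    first = save_if_changed(b'{"level": "high"}', snapshot)
    second = save_if_changed(b'{"level": "high"}', snapshot)
    third = save_if_changed(b'{"level": "low"}', snapshot)
    print(first, second, third)
```

Run something like this on a schedule and a True tells you the CDC has posted an update.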

This technique is worth a shot if you're stymied by a site that won't fork over the raw data.

If you're lucky, you can pry loose in a few minutes what it might take days/weeks to request.

If you're unlucky, it's probably because it's Tableau.

But that's a separate thread.
