My statistician/data-science friends all tell me that if you're using a spreadsheet, you're not doing science, you're courting disaster. Real analysis requires Python, or, possibly, #julialang.

arstechnica.com/science/2020/1…

1/
Despite these warnings, plenty of mission-critical work gets done in spreadsheets, and (in support of these warnings), it can go horribly, horribly wrong.

It's not just the UK losing 16,000 covid cases:

popularmechanics.com/technology/a34…

2/
It's years of destructive, crushing austerity - costing real human lives; trashing cities, regions, whole countries - due to spreadsheet formula errors:

theconversation.com/the-reinhart-r…

3/
But still, we keep using spreadsheets to do real work. I did it YESTERDAY. And made a stupid mistake.

The abstinence-only approach to spreadsheets has been a failure. Clearly, we need harm-reduction.

4/
That's where "Data Organization in Spreadsheets" - @kara_woo and @kwbroman's 2017 paper in @AmstatNews - comes in. It lays out a crisp set of best practices for avoiding common errors, upping your CSV catastrophe game to really powerful mistakes!

tandfonline.com/doi/full/10.10…

5/
Here's how to spreadsheet:

* Be consistent: Don't use "Male," "male" and "m" as labels. Pick one

* Don't let trailing spaces creep in

* Use a consistent code for missing values (not a blank space and ESPECIALLY not a number like -99999)

6/
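
The three tips above can be checked mechanically. Here's a minimal Python sketch, assuming a hypothetical "sex" column and "NA" as the missing-value code. The label map, the sentinel list and the function names are all illustrative; none of them come from the paper.

```python
# Hypothetical label map: every variant collapses to one canonical label.
SEX_LABELS = {"male": "male", "m": "male", "female": "female", "f": "female"}

MISSING = "NA"  # assumed missing-value code; pick one and use it everywhere
# Assumed list of numbers that are standing in for "missing" in this dataset.
SUSPECT_SENTINELS = {"-999", "-9999", "-99999"}


def clean_cell(value):
    """Strip stray whitespace and map blanks/sentinels to the missing code."""
    value = value.strip()
    if value == "" or value in SUSPECT_SENTINELS:
        return MISSING
    return value


def normalize_sex(value):
    """Collapse 'Male', 'male', 'm ' and friends to a single label."""
    value = clean_cell(value)
    if value == MISSING:
        return MISSING
    try:
        return SEX_LABELS[value.lower()]
    except KeyError:
        raise ValueError(f"unrecognized sex label: {value!r}") from None
```

Raising on an unknown label, rather than passing it through, is a deliberate choice here: a typo should stop the pipeline, not become a new category.
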
* Have a column for explanations about missing data (don't fill empty cells with explanations for their emptiness)

* Use consistent variable names and subject identifiers; treat as case-sensitive. No spaces!

* Lay out all your data consistently, in every file

7/
* Have a consistent (case-sensitive, no-spaces) filename convention. Do not tempt fate by calling a file "final" lest you have to pay penance with files named "final_ver2"

* Use YYYY-MM-DD for dates. No exceptions!

* Guard against trailing spaces in data!

8/
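
Filename and date rules are easy to enforce with a script. A rough sketch, assuming a made-up convention of lowercase letters, digits, "_" and "-" with a .csv extension (the regex and the function names are illustrative only):

```python
import re
from datetime import date

# Assumed convention: lowercase letters, digits, "_" and "-", ending in .csv
FILENAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*\.csv")
ISO_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_filename(name):
    """True if the name fits the assumed convention and avoids 'final'."""
    return FILENAME_RE.fullmatch(name) is not None and "final" not in name


def is_iso_date(text):
    """True only for real calendar dates written as YYYY-MM-DD."""
    if ISO_DATE_RE.fullmatch(text) is None:
        return False
    try:
        date.fromisoformat(text)  # rejects impossible dates like 2020-02-30
    except ValueError:
        return False
    return True
```

For example, `is_iso_date("10/14/2020")` and `is_iso_date("2020-02-30")` are both False, while `is_iso_date("2020-10-14")` is True.
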
* Don't use special characters apart from _ and - in variables (avoid $, @, %, #, &, *, (, ), !, /, and other chars that have special meanings in some programming languages)

* Format cells as "Text" to keep Excel from turning things like gene-names ("Oct-4") into dates

9/
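
Both of these tips have a scripted counterpart. The sketch below assumes a naming rule of "starts with a letter, then letters, digits, _ or -" (my assumption, not the paper's exact wording) and uses Python's standard csv module, which hands every cell back as a string and so never reinterprets "Oct-4" as a date.

```python
import csv
import io
import re

# Assumed rule: start with a letter, then letters, digits, "_" or "-".
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def bad_variable_names(names):
    """Return the names that contain spaces or special characters."""
    return [n for n in names if NAME_RE.fullmatch(n) is None]


def read_as_text(csv_text):
    """Parse CSV so that every cell stays a string; nothing is auto-converted."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

Usage: `bad_variable_names(["glucose_6_weeks", "weight (g)", "cost$"])` flags the last two, and `read_as_text("gene,count\nOct-4,12\n")[0]["gene"]` is still the string "Oct-4".
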
* Consider giving year, month and day their own columns to prevent Excel from munging them (or write as an integer: 20201014)

* Only put one piece of data in each cell; use column labels to indicate units (eg "45" not "45g")

* Only one row of variable names per sheet

10/
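
Splitting dates and units apart can be sketched in a few lines. The function names are hypothetical, and the unit pattern (letters or a percent sign straight after a number) is an assumption that won't cover every real-world measurement.

```python
import re
from datetime import date

# Assumed shape of a measurement: optional sign, a number, then an optional unit.
MEASUREMENT_RE = re.compile(r"\s*([-+]?[0-9]*\.?[0-9]+)\s*([A-Za-z%]*)\s*")


def split_date(iso_text):
    """Turn 'YYYY-MM-DD' into separate year, month and day integers."""
    d = date.fromisoformat(iso_text)
    return d.year, d.month, d.day


def date_as_integer(iso_text):
    """Write a date as a plain integer such as 20201014."""
    year, month, day = split_date(iso_text)
    return year * 10000 + month * 100 + day


def split_value_and_unit(cell):
    """Split a cell like '45g' into ('45', 'g'); the unit belongs in the header."""
    match = MEASUREMENT_RE.fullmatch(cell)
    if match is None:
        raise ValueError(f"cannot parse measurement: {cell!r}")
    return match.group(1), match.group(2)
```
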
* Maintain a separate "Data dictionary" file that defines every variable

* Datasets should not contain calculations; minimize how much typing you do in your dataset files lest you contaminate them inadvertently (calculations go in separate files)

11/
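
A data dictionary earns its keep when a script can check the dataset against it. A minimal sketch, assuming the dictionary is itself a CSV with "name" and "description" columns (that layout is my assumption; use whatever columns your dictionary actually has):

```python
import csv
import io


def load_dictionary(dictionary_csv):
    """Map each variable name to its description (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(dictionary_csv))
    return {row["name"]: row["description"] for row in reader}


def undocumented_columns(data_csv, dictionary):
    """List dataset columns that the data dictionary does not define."""
    header = next(csv.reader(io.StringIO(data_csv)))
    return [col for col in header if col not in dictionary]
```

An empty list from `undocumented_columns` means every variable in the dataset has a definition.
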
* Font colors and highlights are not data - put data in cells, not formatting (this gets lost in transitions)

* Back up multiple versions of your files, onsite and offsite

* Develop data validation tactics and regularly validate your data

12/
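
"Validate your data" can start very small. This sketch is one possible set of checks (ragged rows, empty cells, stray whitespace, leftover formulas, duplicate IDs), with a hypothetical `validate` function and an ID column you name yourself. It is not the paper's list, just an illustration of the idea.

```python
import csv
import io


def validate(csv_text, id_column):
    """Return a list of human-readable problems found in the data."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    id_index = header.index(id_column)
    problems = []
    seen_ids = set()
    # Data starts on line 2 of the file; line 1 is the header.
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, got {len(row)}"
            )
            continue
        for name, cell in zip(header, row):
            if cell == "":
                problems.append(f"line {line_no}: empty cell in {name}")
            elif cell != cell.strip():
                problems.append(f"line {line_no}: stray whitespace in {name}")
            elif cell.startswith("="):
                problems.append(f"line {line_no}: formula in {name}")
        ident = row[id_index]
        if ident in seen_ids:
            problems.append(f"line {line_no}: duplicate {id_column} {ident}")
        seen_ids.add(ident)
    return problems
```

Run it every time the file changes; an empty list is the only acceptable answer.
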
* Use CSV, not xlsx, as your canonical file-format - good old hard-to-corrupt text, flensed of all the foofaraw that Microsoft likes to insert at random intervals
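
To see what "good old text" means in practice, here's a small sketch that writes rows out with Python's standard csv module. The `to_csv_text` helper is hypothetical; the point is that the result is plain text you can read, diff and version.

```python
import csv
import io


def to_csv_text(header, rows):
    """Serialize a header and rows to CSV text with Unix line endings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Cells that contain commas get quoted automatically, so a note like "sick, removed" survives a round trip through `csv.reader` as one cell.
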

Lurking behind every one of these tips is a postmortem on a data-tragedy. Ignore them at your peril.

eof/
