📓 Am rereading my class notes from grad school, as well as from mentoring students for @Coursera and @EdX courses on statistics - and thought I'd share the most common mistakes when doing data analysis.

✨Have counted 8 of 'em, with examples - please feel free to add your own!
MISTAKE #1:
Garbage in, garbage out.

🤦‍♀️Failing to investigate your input for data entry or recording errors.

📊Failing to graph data and calculate basic descriptive statistics (mean, median, mode, outliers, etc.) before analyzing it in-depth.
👉EXAMPLE #1:
It's easy to make bad decisions on shoddy input! Here you see an outlier's impact on descriptive statistics.

Also: always consider the uncertainty in your measuring instruments. Just because you've gotten a *precise* value doesn't mean it's *accurate*.
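A quick sketch of the outlier effect in Python, using made-up readings (the numbers and the `readings` / `describe` names are hypothetical, not the data from the original example). One data-entry typo drags the mean and standard deviation a long way, while the median barely moves:

```python
import statistics

# Hypothetical measurements; not data from the thread.
readings = [62, 64, 65, 66, 68, 70, 71]
# Same data with one data-entry error: 66 typed as 660.
readings_with_typo = [62, 64, 65, 660, 68, 70, 71]

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

clean = describe(readings)
dirty = describe(readings_with_typo)

print("clean:    ", clean)
print("with typo:", dirty)
```

Running something like `describe` (and plotting the data) before any modeling is usually enough to make a typo of this size jump out.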
MISTAKE #2:
Know your tools, and when (or when not!) to use them. 🔧

📉Using the wrong statistical procedure when analyzing your data.

🤔Included in this: failing to check that necessary assumptions are met.

Consider these athletes' pulse rates from before and after a race:
👉EXAMPLE #2
Paired t-test vs. Two-sample t-test

This is a paired data design, so it makes sense to analyze with the paired t-test; but if you used a different procedure (ex: the two-sample t-test), you would not conclude that mean pulse rate is different post-exercise. 🏃‍♀️
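To see why the choice of test matters, here is a minimal sketch with invented pulse rates for five athletes (the data, function names, and hard-coded critical values are assumptions for illustration, not the figures from the original example). It computes both t statistics by hand and compares each against the two-sided 5% critical value from a standard t table:

```python
import math
import statistics

# Hypothetical pulse rates (beats per minute) for five athletes.
before = [60, 72, 65, 80, 58]
after = [66, 79, 70, 88, 63]

def paired_t(x, y):
    """t statistic for the paired t-test (df = n - 1)."""
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic (df = n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(y) - statistics.mean(x)) / se

# Two-sided critical values at alpha = 0.05 from a standard t table.
T_CRIT_DF4 = 2.776  # paired test, df = 5 - 1
T_CRIT_DF8 = 2.306  # two-sample test, df = 5 + 5 - 2

t_paired = paired_t(before, after)
t_two_sample = two_sample_t(before, after)
print(f"paired t     = {t_paired:.2f}, significant: {abs(t_paired) > T_CRIT_DF4}")
print(f"two-sample t = {t_two_sample:.2f}, significant: {abs(t_two_sample) > T_CRIT_DF8}")
```

In this made-up data every athlete's pulse rises by a similar amount, so the paired test sees a very consistent difference. The two-sample test ignores the pairing, and the large athlete-to-athlete variation swamps the same shift.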
MISTAKE #3
Implementing experiments that are poorly designed. 🔬

📊Study doesn't have enough power to call meaningful differences statistically significant.

👎Includes concluding that the null hypothesis is true - should be "not enough evidence to say that the null is false".
👉EXAMPLE #3
Testing to see if there is any impact of gender on whether a person recycles.

The hypothesis test for the experiment shown below fails to find a significant difference in percentages, which might look surprising. Lopsided group sizes cost you power, so make sure to have a similar number of test cases for each group!
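A rough sketch of how group sizes affect power, using invented counts (not the numbers from the original example) and a pooled two-proportion z-test built on the standard library. Both scenarios have the same 80% vs. 60% recycling rates and the same total of 1,020 people; only the split between groups changes. The normal approximation is shaky for a group of 20, so treat this as an illustration rather than a recommended analysis:

```python
import math
from statistics import NormalDist

def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: 80% vs. 60% recycle in both scenarios.
p_unbalanced = two_proportion_p_value(16, 20, 600, 1000)   # 20 vs. 1000 people
p_balanced = two_proportion_p_value(408, 510, 306, 510)    # 510 vs. 510 people

print(f"unbalanced groups: p = {p_unbalanced:.3f}")
print(f"balanced groups:   p = {p_balanced:.2e}")
```

With the lopsided split the test cannot reject the null at alpha=0.05, even though the observed gap is 20 percentage points. That is "not enough evidence", not proof of no difference.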
MISTAKE #4
Failing to report a confidence interval as well as the p-value. You need both!

👉The p-value tells you if something is statistically significant.

👬The confidence interval gives you a range of plausible values for the population value.

For example, if we look at gender and phone use:
👉EXAMPLE #4
After you remove the outliers and run a two-sample t-test for phone use between the two genders, you can see there is a significant difference for what is measured between M/F.

We are *95% confident* that the *difference in averages* is more than 35 minutes.
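As a sketch of how such an interval can be computed, here is a normal-approximation confidence interval for a difference in means. The summary statistics are invented (they are not the phone-use data from the example), and the approximation assumes fairly large samples; for small samples a t-based interval would be the usual choice:

```python
import math
from statistics import NormalDist

def diff_in_means_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """Normal-approximation CI for mean_a - mean_b (reasonable for large samples)."""
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = mean_a - mean_b
    return diff - z * se, diff + z * se

# Hypothetical summary statistics (minutes of daily phone use).
low, high = diff_in_means_ci(200, 40, 100, 150, 40, 100)
print(f"95% CI for the difference: ({low:.1f}, {high:.1f}) minutes")
```

An interval that excludes 0 agrees with a significant two-sided test at the matching alpha, and it also says how big the difference plausibly is, which the p-value alone does not.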
MISTAKE #5
🎣 Fishing for significant results.

In plain terms: performing several different hypothesis tests on the same data set, but only reporting on the ones that worked.

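Note, too: if alpha=0.05 and you perform 20 tests on the same data where no real effect exists, you should expect about 1 false positive (Type I error) by chance alone. If the tests are independent, the chance of at least one is about 64%.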
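The arithmetic behind that warning, as a small sketch. It assumes the tests are independent and every null hypothesis is true. The Bonferroni correction shown here is one common remedy and is an addition of this sketch, not something the thread prescribes:

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one Type I error) across independent tests when every null is true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests

alpha, n_tests = 0.05, 20
expected_false_positives = alpha * n_tests
fwer = familywise_error_rate(alpha, n_tests)
corrected = bonferroni_alpha(alpha, n_tests)

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {fwer:.2f}")
print(f"Bonferroni per-test alpha: {corrected:.4f}")
```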
MISTAKE #6
⚖Spurious correlations!

And heck, this one is so well-known it even has a website. 😁
tylervigen.com/spurious-corre…

In plain terms: overstating the results of an observational study. Saying that one variable "caused" another instead of saying it is "associated" or "correlated" with it. 📈
MISTAKE #7
👨‍👩‍👦‍👦Sample bias, in all of its many shapes and forms!

More specifically: using a non-random or unrepresentative sample in your experiment. This includes extending the results of that unrepresentative sample to an entire population.
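A toy simulation of the problem, with an entirely made-up population (the distribution, cutoff, and sample sizes are assumptions chosen for illustration). The "biased" sample only ever measures units scoring above 60, while the random sample draws from everyone:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores centered near 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Biased sample: only units scoring above 60 ever get measured.
biased_sample = [x for x in population if x > 60][:500]

# Simple random sample of up to the same size.
random_sample = rng.sample(population, 500)

pop_mean = statistics.mean(population)
biased_mean = statistics.mean(biased_sample)
random_mean = statistics.mean(random_sample)

print(f"population mean:    {pop_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
```

Collecting more data does not help the biased sample here: a bigger sample from the wrong slice of the population just gives a more confident wrong answer.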

A rather depressing example:
MISTAKE #8
Use known best practices when designing and implementing your experiments.

🎲Randomization = assign by chance, not by choice

⚖Blinding = mask information from participants+tester

👩‍🔬Controlling = minimize the effects of variables other than the independent variable
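Randomization is the easiest of these to sketch in code. Here is a minimal example that assigns participants to groups by shuffling rather than by hand-picking; the `randomize` helper and the participant IDs are hypothetical, and real studies often use more elaborate schemes (blocking, stratification) that this does not cover:

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into treatment and control groups."""
    shuffled = list(participants)          # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # a seed makes the assignment auditable
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs.
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomize(participants, seed=7)

print("treatment:", treatment)
print("control:  ", control)
```

Recording the seed lets someone else reproduce the exact assignment later, which helps show that nobody chose who went where.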
💕Hopefully this was useful! ☺

If it was, will try to do more of these tweet-storms for machine learning and machine-learning-adjacent topics (linear algebra, statistics, etc.).

Always remember to keep your course notes! And huge shout-outs to Dr. King, @hodgesse, et al. 📊