Michael Frasco @MichaelFrasco
Very relevant. Thanks for sharing. If I end up going (trying to figure out my schedule), I'll let you know what I learned.
Example: Google figuring out your genome
When Google did this, they solved a very specific problem and, along the way, made a lot of potentially suboptimal framing decisions
For example, you frame a problem in a particular way because it's familiar, or you use a particular piece of software because you used it in the past
Data science is mostly computational, but there are also human decisions that lead to the computation.
Theory: your estimates before you look at the data are different from your estimates after you look at the data
And this can be applied to every single step of the data science life cycle from developing the question to communicating the result
And how it relates to reproducibility of experiments
This is complicated because humans do crazy stuff to data. This might be the thesis of the talk
Explains the history of why we hate pie charts
And how can we apply that history to the entire data science process
For example, the 538 article where people estimated red card rates in soccer by race
Jeff used Coursera to get his data!
33,499 repos from the Getting and Cleaning Data course
Most use dplyr. Over time, dplyr has grown a lot. data.table is third
They also did randomized experiments
Makes fun of himself for not using ggplot2
They randomly asked people to only use ggplot2 or base R in their experiment
Then they used peer grading to evaluate the plots
Results: ggplot2 is a bit better but it's basically the same. Small effect sizes
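A minimal sketch of that design in Python, assuming two equal-sized arms and numeric peer grades. The function names and the grades in the usage example are made up for illustration; none of this comes from the talk.

```python
import random
import statistics


def assign_arms(participants, seed=0):
    """Randomly split participants into two arms of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


if __name__ == "__main__":
    # Hypothetical participant IDs, randomly assigned to the two arms.
    ggplot2_arm, base_r_arm = assign_arms(range(20))
    print(len(ggplot2_arm), len(base_r_arm))

    # Hypothetical peer grades on a 1-10 scale, NOT data from the talk.
    ggplot2_grades = [7, 8, 6, 7, 9, 7, 8, 6, 7, 8]
    base_r_grades = [7, 7, 6, 8, 8, 6, 7, 7, 6, 8]
    print(round(cohens_d(ggplot2_grades, base_r_grades), 2))
```

Random assignment is what lets you read the difference in grades as an effect of the plotting system rather than of who chose it. The standardized difference puts "a bit better" on a scale you can compare across studies.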
Let's apply this same technique to how they communicate results
When you do an inferential study but try to explain why the variables are related, people think it's a causal study
Another example: can people detect statistical significance in scatter plots? Answer: no, but they get better with practice
Now onto modeling: how often do people feel pressured to find a result in a data set?
Experiment: tell them that there is or isn't believed to be an effect, but give them a data set with or without a relationship
There were some weird results in this experiment
When you prime people to believe there is a relationship, they are more likely to fit a model on that relationship
Lastly, can you prevent people from p-hacking?
Answer: probably not
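To make the p-hacking worry concrete, here is a rough simulation sketch. It is my own illustration, not something shown in the talk. It assumes one common form of p-hacking: measure several unrelated outcomes with no true effect in any of them, then report only the smallest p-value. The test is a simple z-test with known variance 1, and all names and parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def p_value_two_sample(a, b):
    """Two-sided z-test p-value, assuming both groups have known variance 1."""
    z = (fmean(a) - fmean(b)) / (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=1000, n_per_group=30,
                        alpha=0.05, seed=0):
    """Share of null studies that look 'significant' when only the best
    of n_outcomes independent tests is reported."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: no true effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(p_value_two_sample(a, b))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print("1 outcome tested:  ", false_positive_rate(1))
    print("10 outcomes tested:", false_positive_rate(10))
```

With one outcome the rate should sit near the nominal 5%. With ten independent outcomes, the chance that at least one clears the threshold is 1 - 0.95^10, roughly 40%, so the simulated rate should land near that.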
Question: does this apply to PhD students?
Answer: we don't know
There's a ROC curve for science (in terms of false positives of experiments). I'm perfectly comfortable with where we land. - Jeff
This was a fascinating take that I had never considered. Main argument: false positives are building blocks; not every experiment should be expected to be a Phase III RCT with 10,000 people