Levi
Apr 10 · 9 tweets · 3 min read
Don't fall victim to the Texas Sharpshooter Fallacy!

Are you aiming for data-driven insights or just shooting in the dark?

Learn how to avoid cherry-picking data and arrive at more accurate conclusions.

1/9 [Image: The picture of the Texas sh...]
What is the Texas Sharpshooter Fallacy?

It comes from the idea of a sharpshooter who fires a gun at a barn and then paints a target around the bullet holes, making it look like they hit the bullseye.

How is it present in data science?

2/9
Consider this:

A medical team wants to prove that it "hit the bullseye" with a medical treatment.

The team looks only at patients who responded well to the treatment and ignores those who did not.

This leads to a biased conclusion.
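
A minimal sketch of the scenario above, using simulated data rather than anything from a real study. It assumes a hypothetical treatment with no true effect (outcomes drawn from a zero-mean normal distribution) and compares the average over all patients with the average over only those who improved. The function names and the threshold are illustrative assumptions.

```python
import random
import statistics


def simulate_outcomes(n_patients, seed=0):
    """Simulate symptom-score changes for a treatment with NO real effect.

    Positive values mean improvement. The data is synthetic (hypothetical).
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_patients)]


def mean_of_responders(outcomes, threshold=0.0):
    """Cherry-picked view: average only the patients who improved."""
    responders = [x for x in outcomes if x > threshold]
    return statistics.mean(responders)


outcomes = simulate_outcomes(1000)
all_patients_mean = statistics.mean(outcomes)      # uses every patient
cherry_picked_mean = mean_of_responders(outcomes)  # uses responders only

print(f"All patients:    {all_patients_mean:+.2f}")
print(f"Responders only: {cherry_picked_mean:+.2f}")
```

The responders-only average is positive by construction, so it looks like a clear effect even though none was simulated.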

3/9
First the team took its shot, then tried to prove it had hit the target.

They had the result in mind first and then found data to support it.

This is cherry-picking: selecting only the data that supports a particular hypothesis.

4/9
You can easily make this mistake if you look for patterns in data without first formulating a hypothesis.

This fallacy is similar to confirmation bias:

You seek out information that confirms your pre-existing beliefs.
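
The risk of hunting for patterns without a prior hypothesis can be sketched with pure noise. The example below is an illustration under assumed settings (30 samples, 200 unrelated features, a hand-written Pearson correlation); none of these numbers come from the thread. It compares the correlation of a feature chosen in advance with the best-looking feature picked after the fact.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
n_samples, n_features = 30, 200

# Outcome and features are independent noise: no real relationship exists.
outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
features = [[rng.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_features)]

correlations = [pearson_r(feature, outcome) for feature in features]

# Hypothesis stated up front: "feature 0 is related to the outcome".
prespecified_r = correlations[0]

# Target painted afterwards: pick whichever feature happens to look best.
best_index = max(range(n_features), key=lambda i: abs(correlations[i]))
best_r = correlations[best_index]

print(f"Pre-specified feature 0: r = {prespecified_r:+.2f}")
print(f"Best of {n_features} (feature {best_index}): r = {best_r:+.2f}")
```

With enough candidate features, some will correlate noticeably with the outcome by chance alone, which is exactly the target drawn around the bullet holes.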

What else can cause this trap?

5/9
- Lack of statistical significance:

A pattern that shows up in the data may still be due to chance rather than a real effect.

- Small sample sizes:

With only a few data points, random clusters are common and easy to mistake for a real pattern.
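
To make the small-sample point concrete, here is a short sketch that computes how often a fair coin shows at least 70% heads. The coin example and the 70% cutoff are assumptions chosen for illustration, not something from the thread.

```python
from math import comb


def prob_at_least(n_flips, min_heads):
    """Exact probability of at least `min_heads` heads in `n_flips` fair flips."""
    favorable = sum(comb(n_flips, k) for k in range(min_heads, n_flips + 1))
    return favorable / 2 ** n_flips


small = prob_at_least(10, 7)       # 70% heads or more in 10 flips
large = prob_at_least(1000, 700)   # 70% heads or more in 1000 flips

print(f"n=10:   {small:.3f}")
print(f"n=1000: {large:.1e}")
```

With 10 flips, a fair coin looks "biased" about 17% of the time. With 1000 flips, the same proportion is practically impossible by chance.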

How to avoid the fallacy then?

6/9
The fallacy involves cherry-picking data that supports a particular hypothesis, without considering all data.

So do this:

- Formulate a clear hypothesis

- Consider all of the data

- Use appropriate statistical methods

- Approach data analysis with a skeptical mindset
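
One way to put the steps above into practice is a permutation test. This is only a sketch: the measurements are hypothetical, and the permutation test is one of several suitable methods, not a method named in the thread. The hypothesis is stated first, every observation is kept, and the test asks how often random relabeling produces a difference at least as large as the observed one.

```python
import random
import statistics


def permutation_test(treated, control, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = list(treated) + list(control)  # every observation is kept
    n_treated = len(treated)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_treated])
                   - statistics.mean(pooled[n_treated:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical data. The hypothesis "treated and control means differ"
# is stated before looking at the numbers.
treated = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.5, 6.1]
control = [4.9, 5.2, 4.4, 5.8, 5.0, 4.7, 5.3, 5.1]

p_value = permutation_test(treated, control)
print(f"p-value: {p_value:.4f}")
```

A small p-value means the observed difference would be unusual if the group labels did not matter. A large one is a reason to stay skeptical.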

7/9
That's it for today.

I hope you've found this thread helpful.

Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.

Thanks 😉

8/9
You should also join our newsletter, DSBoost.

We share:

• Interviews

• Podcast notes

• Learning resources

• Interesting collections of content

dsboost.substack.com

9/9
