Vin Vashishta
May 8 13 tweets 7 min read
What is the difference between a predictive problem and a causal inference problem? This is an essential distinction for data scientists, and even very smart people botch the answer. Two very smart authors did just that. Let me explain.
1/13
#DataScience #MachineLearning
They proposed 2 questions:

1. Should I hire more college graduates?
2. Should I subsidize college degrees for my employees?
2/13
In the article, they said question 1 is a prediction problem and question 2 is a causal inference problem. They are both causal inference problems because they ask the data scientist to prescribe a policy.
3/13
By answering yes or no to either question, I tell the decision maker that I have evidentiary support for the decision.
4/13
Rephrase question 1 as “Are college graduates better employees (as defined by some set of metrics)?” and the question is now a predictive one. As a data scientist, I can use correlation to support my assertions. What’s the difference?
5/13
In the rephrased question, I must support a relationship between variables. In the original questions, I must support a relationship between a decision and an outcome. Those two claims have distinctly different reliability requirements.
6/13
Why? The relationship between variables implies my model accurately describes the data, and I have some level of confidence that the description will hold in the future.
7/13
The relationship between decision and outcome implies that my model is as accurate or more accurate than a person’s heuristics.
8/13
The relationship between variables is me proposing a set of KPIs or metrics and claiming, “If you make your decision based on these, your decision quality will be high.”
9/13
The relationship between decision and outcome is me claiming, “This decision is high quality.” To support that assertion, I must be confident that the model makes a better decision than a person using the best available KPIs.
10/13
That’s rarely true with simple correlation. The exception is when the number of variables involved in the decision is so high that people who are given that data make worse decisions than people who are not given it.
11/13
Otherwise, I need to establish causal relationships to support the claim that the model understands the system better than a decision maker does.
12/13
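To make the gap between the two kinds of claims concrete, here is a minimal simulation sketch. It is not from the article being discussed. It assumes a hypothetical unobserved trait (called `drive` here) that makes people both more likely to hold a degree and better performers, and it assumes the degree itself has zero causal effect (`true_effect=0.0`). Every name and number is made up for illustration. Under those assumptions, the correlation that answers "Are college graduates better employees?" is real, yet handing out degrees at random, a stand-in for a subsidy policy, changes nothing.

```python
import random
import statistics


def simulate_observed(n=20000, seed=0, true_effect=0.0):
    """Observational data: a hidden trait drives both degree and performance."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        drive = rng.gauss(0, 1)                    # hypothetical confounder
        has_degree = drive + rng.gauss(0, 1) > 0   # driven people get degrees more often
        performance = 2.0 * drive + true_effect * has_degree + rng.gauss(0, 1)
        rows.append((has_degree, performance))
    return rows


def observed_gap(rows):
    """Predictive question: how much better do graduates score on average?"""
    grads = [perf for degree, perf in rows if degree]
    others = [perf for degree, perf in rows if not degree]
    return statistics.mean(grads) - statistics.mean(others)


def intervention_gap(n=20000, seed=1, true_effect=0.0):
    """Causal question: assign degrees by coin flip, so drive no longer decides who gets one."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        drive = rng.gauss(0, 1)
        gets_degree = rng.random() < 0.5           # random assignment breaks the confounding
        performance = 2.0 * drive + true_effect * gets_degree + rng.gauss(0, 1)
        (treated if gets_degree else control).append(performance)
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    print("Observed graduate vs. non-graduate gap:", round(observed_gap(simulate_observed()), 2))
    print("Gap when degrees are assigned at random:", round(intervention_gap(), 2))
```

In this toy setup the observed gap is large and the intervention gap is close to zero. The correlation can support "graduates tend to score higher on these metrics," but it cannot by itself support "subsidize degrees." Raising `true_effect` above zero shows the intervention gap picking up only the causal part.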
My substack newsletter is filled with deep dives into advanced topics. Take a look and subscribe here: vinvashishta.substack.com
13/13