I’m not saying you need to be an expert in advanced calculus to do machine learning…
BUT, there is a big difference between someone who has a good foundation in stats and someone who does NOT when it comes to getting & explaining business results.
My thought process back in the day was to obtain a great foundation in stats and machine learning at the same time.
So here’s what helped me. I read a ton of books.
Here are the 3 books that helped me learn data science the most...
1. R for Data Science (Wickham & Grolemund) r4ds.had.co.nz
A new paper shows how you can predict real purchase intent without asking people.
The approach reaches ~90% of human test–retest reliability.
Here's what's inside the 28 page paper:
1. Problem with direct Likert from LLMs:
When you ask LLMs to output 1–5 ratings directly, the distributions are too narrow/skewed and don’t look like human survey data, limiting usefulness for concept testing.
Have the LLM write a short free-text purchase-intent statement, then map that text onto a 5-point Likert score using embedding cosine similarity to predefined anchor sentences (i.e., semantic matching instead of raw numbers).
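Here is a minimal sketch of the semantic-matching idea, not the paper's actual implementation. The vectors are made-up 3-dimensional stand-ins for real sentence embeddings, and the function name `likert_from_embedding` is hypothetical. Subtracting the minimum similarity before normalising is also an assumption, chosen only to turn the five similarities into a distribution over the 1–5 scale.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def likert_from_embedding(response_vec, anchor_vecs):
    """Turn similarities to 5 anchor embeddings into a Likert distribution.

    Returns (pmf over ratings 1..5, expected rating).
    """
    sims = [cosine(response_vec, anchor) for anchor in anchor_vecs]
    # Assumption: shift so the least similar anchor gets zero weight,
    # then normalise so the weights sum to 1.
    lowest = min(sims)
    shifted = [s - lowest for s in sims]
    total = sum(shifted)
    if total == 0:
        pmf = [1 / len(sims)] * len(sims)  # all anchors equally similar
    else:
        pmf = [s / total for s in shifted]
    expected = sum((i + 1) * p for i, p in enumerate(pmf))
    return pmf, expected

# Toy stand-ins for embeddings of the five anchor sentences
# (rating 1 = "definitely would not buy" ... rating 5 = "definitely would buy").
anchors = [
    [1.0, 0.0, 0.0],
    [0.8, 0.2, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 1.0, 0.0],
]
# Toy stand-in for the embedding of the LLM's free-text statement.
response = [0.1, 0.9, 0.0]

pmf, expected = likert_from_embedding(response, anchors)
print([round(p, 3) for p in pmf], round(expected, 2))
```

In practice the vectors would come from a real embedding model, and the anchor sentences would be written once per rating level and reused across respondents.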
Understanding p-values is essential for improving regression models.
In 2 minutes, I'll crush your confusion.
1. The p-value:
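A p-value is the probability of seeing a result at least as extreme as the one you observed, assuming the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis.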
2. Null Hypothesis (H₀):
The null hypothesis is the default position that there is no relationship between two measured phenomena or no association among groups. For example, under H₀, the regressor does not affect the outcome.
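To make the null hypothesis concrete, here is a rough sketch of a permutation test for a regression slope. Shuffling the outcome simulates a world where H₀ holds (the regressor does not affect the outcome), and the p-value is the share of shuffles that give a slope at least as extreme as the observed one. The data and the function names are made up for illustration. This is one way to get a p-value, not the t-test that regression software usually reports.

```python
import random

def slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for H0: the slope is zero."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any link between x and y, as H0 assumes.
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: ad spend (x) and sales (y) with a clear upward trend.
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

p_value = permutation_p_value(ad_spend, sales)
print(f"slope = {slope(ad_spend, sales):.2f}, p-value = {p_value:.4f}")
```

A small p-value here says that a slope this steep would rarely show up if ad spend had no effect on sales.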
Understanding probability is essential in data science.
In 4 minutes, I'll demolish your confusion.
Let's go!
1. Statistical Distributions:
There are hundreds of distributions to choose from when modeling data. Choices seem endless. Use this as a guide to simplify the choice.
2. Discrete Distributions:
Discrete distributions are used when the data can take on only specific, distinct values. These values are often integers, like the number of sales calls made or the number of customers that converted.
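As a quick illustration, the binomial distribution is one common discrete distribution for counts like "customers that converted". The sketch below assumes 20 sales calls and a 10% conversion rate per call. Both numbers are made up, and so is the assumption that each call converts independently.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical inputs: 20 sales calls, each converting with probability 0.10.
n_calls = 20
conversion_rate = 0.10

# The distribution only puts probability on the integers 0, 1, ..., n_calls.
pmf = [binomial_pmf(k, n_calls, conversion_rate) for k in range(n_calls + 1)]

prob_at_least_3 = sum(pmf[3:])
expected_conversions = sum(k * p for k, p in enumerate(pmf))

print(f"P(at least 3 conversions) = {prob_at_least_3:.3f}")
print(f"Expected conversions = {expected_conversions:.2f}")
```

The probabilities over all possible counts sum to 1, which is what makes this a valid discrete distribution.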