First, PRIORS. In Bayesian Statistics, we use probability distributions (like a normal, Cauchy, beta...) to represent uncertainty about the value of parameters.
Instead of choosing ONE number for a param, a distribution describes how likely each value in a range is
Bayesian Stats works by taking previously known, PRIOR information (this can be from prior data, domain expertise, regularization...) about the parameter
and combining it with data to make the POSTERIOR (the distribution of parameter values AFTER including data)
But that's all conceptual. Math-wise the way this works is by taking the function (the Probability Density Function for continuous, or Probability Mass Function for discrete) for the prior and multiplying it with the function for the likelihood of the data*
* we're ignoring the marginal here (again, this thread is aimed more at intuition than full rigor)🚨
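The "multiply the prior by the likelihood" step can be made concrete with a grid approximation. This is a hedged sketch with made-up numbers (a Beta(2, 2) prior on a coin's heads probability, 7 heads in 10 flips), not anything from the thread itself:

```python
import numpy as np
from scipy import stats

# Hypothetical example: grid approximation of a posterior.
# Prior: Beta(2, 2) on a coin's heads probability; data: 7 heads in 10 flips.
theta = np.linspace(0.001, 0.999, 999)        # grid of parameter values
prior = stats.beta.pdf(theta, 2, 2)           # prior density at each value
likelihood = stats.binom.pmf(7, 10, theta)    # likelihood of the data at each value

unnormalized = prior * likelihood             # prior x likelihood (the "messy" product)
posterior = unnormalized / unnormalized.sum() # normalizing handles the ignored marginal
```

Dividing by the sum at the end is exactly the marginal we hand-waved away: it just rescales the product into something that sums to 1.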
Sometimes multiplying the likelihood and the prior gets...messy (and luckily IRL we have MCMC algos to help us!)
But when doing things by hand, we like cases where the math is easy.
E.g. it would be REALLY NICE if the prior and posterior had the same distribution 💡
A conjugate prior is a prior distribution that, when paired with a specific likelihood function, makes the posterior distribution the SAME type of distribution as the prior!
For example,
a 💫beta prior💫 + a binomial likelihood = 💫beta💫 posterior
a ✨normal✨ prior + a normal likelihood = ✨normal✨ posterior
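The beta + binomial pairing above is the classic one, because the update is just addition. A minimal sketch with made-up numbers:

```python
# Hedged sketch of the beta-binomial conjugate update (numbers are made up).
# Prior: Beta(a, b); data: k successes in n trials.
a, b = 2, 2          # prior hyperparameters
k, n = 7, 10         # observed successes out of n trials

# Conjugacy means no integration at all -- the posterior is just
# Beta(a + k, b + n - k): add successes to a, failures to b.
post_a, post_b = a + k, b + (n - k)
posterior_mean = post_a / (post_a + post_b)   # 9 / 14
```

This is the "closed-form solution" mentioned below: two additions instead of thousands of MCMC samples.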
Now that a lot of Bayesian Stats is done via sampling algorithms like a Gibbs Sampler (JAGS, BUGS) or Hamiltonian Monte Carlo (Stan, PyMC3), we don't have to worry about conjugate priors as much, because we can estimate the posterior whether or not the prior is conjugate
But conjugate priors still provide a nice way to demonstrate concepts, and when appropriate are a LOT QUICKER than a solution found by an MCMC algo, because they have a "closed-form solution" (meaning we can solve for an exact answer)
FIN
again, critiques in GIF form only
( I HATE POSTING formulas and math on Twitter because inevitably I will have made a typo...🥲 But I did it for you #statsTwitter, I did it for you)
We want a confidence interval for a two-sample z test! We're interested in the difference between replication rates for projects with and without open data.
This question has answer choices, so let's talk about my FAVORITE WAY to save time on these Qs!
FIRST: take a look at what's different between the answer choices. Here, the differences are:
- the critical z value
- the formula for calculating the standard error
so we don't even need to look at the other parts, they're all the same!
Let's start with the critical z value. The question asks for a 95% confidence interval. Either using our memory, or the z table given to us on the AP test, we can figure out that the *correct* critical z value is: 1.96.
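Putting the two pieces together (the critical z value and the standard error formula), here's a sketch of the whole interval. The counts are hypothetical since the original question's numbers aren't shown here:

```python
import math

# Hypothetical replication rates (the AP question's actual counts aren't shown):
p1, n1 = 0.70, 100   # replication rate, projects WITH open data
p2, n2 = 0.45, 100   # replication rate, projects WITHOUT open data

z = 1.96  # critical z for a 95% confidence interval

# Unpooled standard error for a difference in two proportions:
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

diff = p1 - p2
ci = (diff - z * se, diff + z * se)   # (point estimate) +/- z * SE
```

Note the CI uses the *unpooled* SE; pooling the proportions is for the hypothesis test version, which is exactly the kind of difference the answer choices like to hide.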
📗The Elements of Statistical Learning
This book was the BIBLE in Grad School. It's incredibly in-depth and dense, but not so much that you can't get through it. It's comprehensive, well written, and is my go-to reference to understand a ML algo more deeply😍
📗Introduction to Statistical Learning with Applications in R
A gentler cousin of ESL, ISLR (and now ISLP!) is a great intro to ML algos. This book can be appreciated by undergrads, grads, and industry workers alike. The code examples are incredibly useful, and the text is clear
United healthcare’s student health insurance was a HUGE stressor during grad school🙃
I had to float large sums of money waiting for reimbursement (which they messed up ~40% of the time at first), spend hours on the phone with them, and had a hard time finding a therapist😤
Healthcare is a right, and shouldn’t be tied to school/employment. It should be accessible, largely free, and it should focus on helping people NOT MAKING PROFITS.
This stuff pisses me off.
And while I’m on the subject, MAKE MENTAL HEALTHCARE MORE ACCESSIBLE.
Therapists deserve to make a living wage, AND clients who can't afford it shouldn't have to spend $100s of dollars a month for therapy/meds.
1. Does the model ACTUALLY answer a question you have?
If you don’t have a question THEN THE BEST INFERENTIAL/PREDICTIVE MODEL IS NO MODEL. Do some EDA first!
Stop doing hypothesis testing if you don’t have a hypothesis to test. ✋
2. Is the model realistic?
We love prior predictive checks bc they allow u to generate data based on ur priors😻 but also ask whether you're leaving out important effects, whether the variables you use actually measure what you want, or if linear relationships are realistic…
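A prior predictive check really is just "simulate fake data from your priors before touching the real data." A minimal sketch for a made-up linear model (all priors here are illustrative assumptions, not recommendations):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model: y = a + b*x + noise, with made-up priors.
x = np.linspace(0, 10, 50)
n_draws = 1000

a = rng.normal(0, 10, n_draws)             # prior on the intercept
b = rng.normal(0, 5, n_draws)              # prior on the slope
sigma = np.abs(rng.normal(0, 2, n_draws))  # prior on the noise scale

# Simulate whole datasets from the priors alone -- no real data involved:
y_sim = a[:, None] + b[:, None] * x + rng.normal(0, sigma[:, None], (n_draws, len(x)))
```

Then you eyeball (or summarize) `y_sim`: if your priors routinely generate cats with negative fur shininess or effect sizes bigger than the whole scale, the model isn't realistic yet.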
SINCE @kierisi has threatened to sarcastically/chaotically say incorrect things about p-values during #sliced tonight 😱 just to annoy people 😉,
I thought I’d do a quick thread on what a 🚨p-value🚨actually is.
🧵
(1/n)
Computationally, a p-value is p(data as extreme as ours | null). Imagine a 🌎 where the null hypothesis is true (e.g. there is no difference in cat fur shininess for cats eating food A vs. food B for 2 weeks), and see how extreme your observed data would be in that 🌎 (2/n)
So, you can think of the p-value as representing our data’s compatibility with the null hypothesis. Low p-values mean our data is not very likely in a world where the null is true. High p-values mean it is relatively likely. (3/n)