This varies *a lot* by type of role, seniority, and institution.
I’m tenured at a research-intensive institution and I am not teaching this term.
I spend a fair amount of time meeting with students and collaborators. Today, Monday, I have 5 such meetings.
There are also lots of emails and administrative tasks all the time.
Each day this week I’ll drop a tweet in this thread adding unique things I haven’t mentioned yet, to demystify the life of this particular professor.
I just finished up my meetings for the day and there has been a lot of news in the world...
What I'll add to this thread is that today my cat Tobias joined 3 of my zooms.
Tuesday’s unique things: Conducting interviews and meeting with colleagues about our new summer diversity initiative!
(non-unique: Cat joining more research zooms…)
Wednesday's unique thing: Discussed the academic job market with a colleague and some senior graduate students! 🤓🧑‍💻🧑‍🏫
Tweetorial on going from regression to estimating causal effects with machine learning.
I get a lot of questions from students regarding how to think about this *conceptually*, so this is a beginner-friendly #causaltwitter high-level overview with additional references.
One thing to keep in mind is that a traditional parametric regression estimates a conditional mean E(Y|T,X).
The bias-variance tradeoff applies to that conditional mean, not to the coefficients in front of T and X.
The next step to think about conceptually is that this conditional mean E(Y|T,X) can be estimated with other tools. Yes, standard parametric regression, but also machine learning tools like random forests.
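To make that leap concrete, here is a minimal sketch with entirely made-up simulated data: the same conditional mean E(Y|T,X) can be estimated nonparametrically (here, a crude local-averaging estimator standing in for a fancier ML tool), not just by fitting a parametric regression. All numbers below are invented for illustration.

```python
import random

random.seed(42)

# Toy data: binary treatment T, covariate X, outcome Y with true
# conditional mean E(Y | T, X) = 1 + 2*T + 3*X (numbers made up).
n = 20000
data = []
for _ in range(n):
    T = random.randint(0, 1)
    X = random.random()
    Y = 1 + 2 * T + 3 * X + random.gauss(0, 1)
    data.append((T, X, Y))

def cond_mean_nonparametric(t, x, width=0.05):
    """Crude nonparametric estimate of E(Y | T=t, X near x):
    average Y over observations in a small window around x."""
    ys = [Y for (T, X, Y) in data if T == t and abs(X - x) < width]
    return sum(ys) / len(ys)

# True conditional mean at T=1, X=0.5 is 1 + 2 + 1.5 = 4.5;
# the local average recovers it without assuming a functional form.
est = cond_mean_nonparametric(1, 0.5)
print(round(est, 1))
```

A random forest or gradient-boosted model is doing a smarter version of this same local averaging, which is why it can serve as a drop-in estimator of E(Y|T,X).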
It’s OK if this is a big conceptual leap for you! It is for many people!
🧵 time! I’d love to talk about the responsibilities we have as data practitioners. In this ~~information age~~ I think it’s critical we use data, ML, stats, and algorithms fairly, and with an eye toward making the world better for people.
Lots of people have asked me if studying biostats has actually been relevant in my career as a software engineer, and I’ve found the answer to be a resounding yes! It's super relevant in lots of engineering problems and in understanding the world generally. 🧵 follows!
When I worked on payment fraud prevention, I was always talking about diagnostic testing for rare diseases!
Diagnostic testing was something we studied at length in our early biostat & epi classes in grad school and it turns out “fraud” behaves similarly to a “rare disease” in a lot of ways.
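The rare-disease intuition boils down to Bayes' rule: when the base rate is low, even a very accurate test (or fraud model) yields mostly false positives. A quick sketch with hypothetical numbers:

```python
# Hypothetical numbers: a "rare disease" (or rare fraud) with 0.1%
# prevalence, and a test with 99% sensitivity and 99% specificity.
prevalence = 0.001
sensitivity = 0.99
specificity = 0.99

# P(positive test), by the law of total probability
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Positive predictive value: P(disease | positive), by Bayes' rule
ppv = sensitivity * prevalence / p_pos
print(round(ppv, 3))  # ~0.09: most flagged cases are false positives
```

This is exactly why a fraud model's precision can look terrible even when its sensitivity and specificity are both excellent.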
Gerrymandering gets its name from one Elbridge Gerry, who in 1812 drew a voting district in Boston that looked like a salamander because it was politically expedient.
The practice persists today, from city council districts all the way up to (arguably) the Electoral College!
Math, statistics, and measurement have played a key role in several court cases related to the ongoing discussion and fight for fair and representative districts.
One more quick tweet, unrelated to the Gelman-Rubin diagnostic.
Someone asked, "I hear C++ is fast but a little hard to grasp. That true?"
Mostly yes. Like Python, R is generally easier to learn and often slower than C/C++.
I recommend you think about how your code will be used when you decide what language to code in. If you're coding for yourself and you probably just need to run it once, then R may be a good choice. Optimizing for speed may be overkill. (2/)
If you are writing a function/package for public consumption, then speed is much more of a concern. You can profile your code to see which parts are time-consuming. You can also just google what things R is slow at (e.g., loops). (3/)
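The profiling advice is language-agnostic. As a sketch (in Python here, but R's `system.time()` or `profvis` play the same role), you can time an explicit loop against a built-in before deciding anything needs rewriting:

```python
import timeit

data = list(range(100_000))

def loop_sum(xs):
    """Explicit loop: the kind of code interpreted languages run slowly."""
    total = 0
    for x in xs:
        total += x
    return total

# Time each approach; only optimize the parts measurement says are slow.
t_loop = timeit.timeit(lambda: loop_sum(data), number=50)
t_builtin = timeit.timeit(lambda: sum(data), number=50)
print(t_builtin < t_loop)  # the built-in is typically much faster
```

Measure first: the slow spot is often one hot loop, not the whole program.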
Let's extend the linear model (LM) in the direction of the GLM first. If you loosen up the normality assumption to instead allow Poisson, binomial, etc. (members of the "exponential family" of distributions), then you can model count, binary, and other response types. (4/)
You've probably heard of Poisson regression or logistic regression. These fall under the umbrella of GLM. (5/)
The LM regression equation is E(Y) = X Beta, where X is the model matrix, Beta is the vector of coefficients, Y is the response vector, and E(Y) is the expected value.
For Poisson regression, we have log(E(Y)) = X Beta.
For logistic regression, we have log(p/(1-p)) = X Beta, where p = E(Y) is the success probability. (6/)
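The three link functions above differ only in how the linear predictor eta = X Beta maps to the mean. A tiny sketch, with invented coefficients for a single observation:

```python
import math

# Hypothetical coefficients and one observation's linear predictor
# eta = X @ Beta (numbers invented for illustration).
beta0, beta1 = -1.0, 0.5
x = 2.0
eta = beta0 + beta1 * x  # = 0.0

# Linear model (identity link): E(Y) = eta directly
lm_mean = eta

# Poisson regression (log link): log(E(Y)) = eta, so E(Y) = exp(eta)
poisson_mean = math.exp(eta)

# Logistic regression (logit link): log(p/(1-p)) = eta,
# so p = 1 / (1 + exp(-eta))
p = 1 / (1 + math.exp(-eta))

print(lm_mean, poisson_mean, p)
```

Note how each inverse link keeps the mean on the right scale: exp() forces a positive count mean, and the inverse logit forces a probability between 0 and 1.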