A couple of days ago another team asked me to speak about Bayesian data analysis. I decided that instead of giving a nuts-and-bolts talk on how to fit and use Bayesian models, I would describe "Bayesian analysis: The Good Parts". <potentially controversial thread>
Good Part 1: Explicitly modeling your data generating process

Bayesian analysis means writing down complete models that could have plausibly generated your data. This forces you to think carefully about your assumptions, which are often implicit in other methods.
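To make this concrete, here's a minimal sketch (my illustration, not from the thread) of writing a data generating process down explicitly; the linear form, parameter values, and noise scale are all assumptions for the example:

```python
# Hypothetical data generating process: outcomes are a linear function
# of x plus Gaussian noise. Writing it down makes the assumptions explicit.
import numpy as np

rng = np.random.default_rng(1)
n = 100
alpha_true, beta_true, sigma_true = 1.0, 2.0, 0.5  # assumed "true" values

x = rng.normal(size=n)  # assumed covariate distribution
y = alpha_true + beta_true * x + rng.normal(scale=sigma_true, size=n)
```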
Good Part 2: No need to derive estimators

There are increasingly full-featured, high-quality tools that allow you to fit almost any model you can write down. Being able to treat model fitting as an abstraction is great for analytical productivity.
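Here's what that abstraction can look like, as a minimal sketch assuming PyMC (the thread doesn't name a specific tool). It fits the data generating process sketched under Good Part 1 with no hand-derived estimator:

```python
# Hedged sketch using PyMC, one of several probabilistic programming
# tools. The sampler does the estimation; reuses x and y from above.
import pymc as pm

with pm.Model() as model:
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta = pm.Normal("beta", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=1)
    pm.Normal("y_obs", mu=alpha + beta * x, sigma=sigma, observed=y)
    idata = pm.sample()  # MCMC; no estimator derived by hand
```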
Good Part 3: Estimating a distribution

Bayesian analyses produce distributions as estimates rather than specific statistics about distributions. That means you deeply understand uncertainty and get a full-featured input into any downstream decision/calculation you need to make.
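For example (a hedged sketch, with placeholder draws standing in for a real posterior), any downstream quantity is just a computation over the samples:

```python
# A posterior is a set of draws, so decision inputs are simple
# computations over them. These draws are simulated placeholders.
import numpy as np

rng = np.random.default_rng(3)
beta_draws = rng.normal(loc=0.3, scale=0.15, size=4000)  # stand-in samples

prob_positive = (beta_draws > 0).mean()                  # P(beta > 0 | data)
ci_95 = np.quantile(beta_draws, [0.025, 0.975])          # 95% credible interval
expected_downside = np.maximum(0.0, -beta_draws).mean()  # e.g., loss if shipped
```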
Good Part 4: Borrowing strength / sharing information

A common feature of Bayesian analysis is leveraging multiple sources of data (from different groups, times, or geographies) to share related parameters through a prior. This can help enormously with precision.
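A hedged sketch of partial pooling, again assuming PyMC; the groups and data here are simulated placeholders:

```python
# "Borrowing strength": group-level parameters share a common prior, so
# estimates for sparse groups are pulled toward the population mean.
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
n_groups, n_obs = 8, 200
group_idx = rng.integers(n_groups, size=n_obs)
y_groups = rng.normal(loc=group_idx * 0.1, scale=1.0, size=n_obs)  # fake data

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=1)    # shared population mean
    tau = pm.HalfNormal("tau", sigma=1)    # between-group spread
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=n_groups)
    pm.Normal("obs", mu=theta[group_idx], sigma=1.0, observed=y_groups)
    idata_hier = pm.sample()
```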
Good Part 5: Model checking as a core activity

Good Bayesian analyses consider a wide range of models that vary in assumptions and flexibility in order to see how they affect substantive results. There are principled, practical procedures for doing this.
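One concrete check, continuing the hypothetical PyMC sketch from Good Parts 1-2: simulate replicated data from the posterior and compare it to what you observed (a posterior predictive check):

```python
# Posterior predictive check: does data simulated from the fitted model
# look like the data we actually saw? Reuses `model`, `idata`, `y` above.
with model:
    idata.extend(pm.sample_posterior_predictive(idata))

y_rep = idata.posterior_predictive["y_obs"]
# compare a test statistic between replicated and observed data
print("observed sd:", y.std(), "replicated sd:", float(y_rep.std()))
```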
Good Part 6: Interpretability of posteriors

What a posterior *means* is more intuitive to people than most statistical tests. The validity of a posterior rests on an underlying assumption that the model is correct, which is not hard to reason about.


More from @seanjtaylor

21 Jul
Short thread on GPT-3.

I haven't worked on text models in a long time, because (TBH) I find them boring. I had been ignoring progress in that space because you could kind of see where it was heading. I don't feel *that* surprised by GPT-3 but it illustrates some useful ideas.
To me, what's big is challenging the current status quo of many specialized single-task models with one general multi-task model. Expensive pre-trained embeddings are common at large companies, but they're mostly used as features for specific learning tasks, and multi-task models typically have a small number of tasks.
As @sh_reya points out, the big challenge becomes "how do you explain to the model what task it should be working on?" There's probably a large design space here, and it may require an entirely new "meta-query" language. It's also challenging to formally evaluate a model like this, and hard to quantify its value.
5 Jun
I'm procrastinating tonight so I'll share a quick management tool I use. It's close to the end of H1 so performance reviews are coming. I tell this to my reports: "Your work is going to be distilled into a story, please help me tell a good one so I can represent your work well"
A good story must be easy to understand and compelling. At all times you should think about what story you'll tell about your work. It helps you do good work and it helps me (your manager) get you the credit you deserve. A story has three parts: a beginning, middle, end.
Beginning of the story:
- Your work is well motivated. It addresses a clear need that you were smart to identify.
- Help me by being deliberate with project choice, finding good opportunities, and not chasing shiny objects. Generate buy-in and excitement before starting.
28 Mar
I think I had a tough time communicating with @yudapearl today. It's worth sharing where I think we ended up misunderstanding each other. I don't think he is likely to agree with me, but it's useful for me to articulate it here.

Here’s the seed tweet: [embedded tweet not captured in the unroll]
I shared the Meng paper because it’s a nice discussion of how greater sample size doesn’t solve estimation problems. This is part of a strong opinion I have that collecting adequate data is the key challenge in most empirical problems. Some people will not agree with this.
Most folks thought I was talking about causal inference from the start. I was actually talking about the tool of *randomization*. IMO, Meng’s paper is an example of measuring the value of randomization for an estimation problem. Randomness is a complement to sample size.
19 Oct 19
I think this is an interesting topic but found this visualization hard to follow (no surprise if you've been reading my complaints about animated plots).

I have nothing to do tonight so I'm going to try to re-visualize this data. Starting a THREAD I'll keep updated as I go.
The original data is from the ACS. Nathan used a tool called IPUMS to download the data set: usa.ipums.org/usa/

Looks like there's a variable called TRANTIME that is "Travel time to work." The map uses PUMA as the geography, which are areas with ~100K people each.
IPUMS is pretty annoying to use. You need an account and you create a dataset to add to your "data cart"(!!!). But I was able to download a file with the 2017 ACS responses for TRANTIME, along with PUMA, and STATEFIP. The latter two fields uniquely identify the geographic region.
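A hedged sketch of the aggregation step in pandas (my code, not the thread's); the variable names come from the thread, but the file name is hypothetical and person weights (PERWT) are omitted for brevity:

```python
# Average commute time (TRANTIME) by PUMA, using STATEFIP + PUMA as the
# unique region key. The extract file name is hypothetical.
import pandas as pd

acs = pd.read_csv("usa_acs_2017.csv.gz")

acs["puma_id"] = (acs["STATEFIP"].astype(str).str.zfill(2)
                  + acs["PUMA"].astype(str).str.zfill(5))
commute_by_puma = acs.groupby("puma_id")["TRANTIME"].mean().sort_values()
```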
6 Sep 19
This is a pretty short and unpolished <thread> on launch criteria for experiments. Hoping for feedback!

Background: one heuristic people use to decide to "ship" in an A/B test setting is p-value < 0.05 (or maybe 0.01). How important is "stat sig" for maximizing expected value?
I simulated 10,000 A/B tests with effects drawn from Laplace(0, 0.05) (most effects are close to zero) with Normal(0, 1) noise and N=2000. I'm going to ignore the costs of "shipping" and assume effects are additive, both huge assumptions. Here's the distribution of effects: [plot not captured in the unroll]
Since the data are simulated, I know the true effects. I order the experiments left to right by one-sided p-value (H0: effect <= 0). The p < 0.05 criterion would catch a lot of good tests but ignore a lot of other positive ones. We have high precision but low recall.
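A hedged re-creation of that simulation; the one-sample z-test formulation (and the seed) are my assumptions, not necessarily the author's exact setup:

```python
# 10,000 simulated A/B tests: true effects ~ Laplace(0, 0.05), Normal(0, 1)
# observation noise, N=2000 per test, one-sided test of H0: effect <= 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n = 10_000, 2_000
effects = rng.laplace(loc=0.0, scale=0.05, size=n_tests)  # true effects

se = 1.0 / np.sqrt(n)  # sd of the sample mean of n Normal(effect, 1) draws
obs_means = effects + rng.normal(scale=se, size=n_tests)
p_one_sided = 1.0 - stats.norm.cdf(obs_means / se)

ship = p_one_sided < 0.05
print(f"shipped {ship.mean():.1%} of tests; "
      f"mean true effect among shipped: {effects[ship].mean():+.4f}")
```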
30 Apr 19
📈Long thread on how Prophet works📈

- Instead of sharing slides I'm transcribing my QCon.AI talk to tweets with lots of GIFs :)
- Thanks to @_bletham_ for all his help.
Prophet motivation:

- there are many forecasting tasks at companies
- they are not glamorous problems and most people aren't trained well to tackle them
- 80% of these applications can be handled by a relatively simple model that is easy to use
We approach time series forecasting as a *curve fitting problem.* This has some benefits:
- curves are easy to reason about and you can decompose them
- the parameters you fit have straightforward interpretations
- curve fitting is very fast so you can iterate quickly
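For reference, a minimal sketch of the basic Prophet API (the current `prophet` package; the talk predates the rename from fbprophet). The data file is hypothetical; Prophet expects columns `ds` (dates) and `y` (values):

```python
# Fit a daily series and forecast one year ahead with Prophet.
import pandas as pd
from prophet import Prophet

df = pd.read_csv("example_series.csv")  # hypothetical file with ds, y

m = Prophet()                           # trend + seasonality curve fit
m.fit(df)
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)            # includes yhat and uncertainty bands
```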