Hey #econtwitter, I'm excited to share a new WP with @conlon_chris about micro data (e.g. consumer surveys) in BLP estimation. Alongside it, we finally released PyBLP version 1.0!
So we dug through published/recent WPs to see how "micro BLP" has been used in practice. Turns out it's been used a lot. Let us know who we missed!
Most papers seem to incorporate micro data in problem-specific ways with different notation. This makes it challenging to evaluate different estimators and replicate results.
So we developed a standardized econometric framework that covers most existing cases (and more).
In our framework, "micro data" are generated by independent surveys of selected consumers (choice-based, stratified, etc.), conditional on product-level "aggregate data."
Researchers either have full data (demographics, choices) or summary stats (means, correlations, etc.).
Compatibility, interpretability, confidentiality, and cost are common reasons for using summary stats.
For those with/willing to use full survey results, we derive "optimal micro moments" that match scores. They're easy to compute in a second step with optimal weights/IVs.
We use Monte Carlo experiments to highlight some practical advice.
Micro data provide *within-market* variation, which is particularly useful with limited *cross-market* variation in demographics and choice sets. See @steventberry and @PhilHaile's nber.org/papers/w27704.
What does "limited cross-market variation" mean in practice?
We highlight a few standard micro moments: characteristic-demographic covariances for estimating how preferences vary with demographics, and second choices for unobserved preferences.
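To make the covariance moment concrete, here's a toy sketch (not the paper's estimator or PyBLP's API). Everything in it is an illustrative assumption: five products with a single characteristic `x`, a standardized "income" demographic, no outside good, and a hypothetical interaction parameter `pi`. It simulates a survey and computes the covariance between each respondent's income and the characteristic of their chosen product.

```python
import numpy as np


def simulate_covariance(pi, n_consumers=20_000, seed=0):
    """Covariance between income and the chosen product's characteristic.

    Toy logit: utility_ij = pi * income_i * x_j + Gumbel shock.
    All names and values here are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # product characteristic
    income = rng.standard_normal(n_consumers)  # standardized demographic
    shocks = rng.gumbel(size=(n_consumers, x.size))
    utility = pi * income[:, None] * x[None, :] + shocks
    choice = utility.argmax(axis=1)  # each consumer picks the best product
    return float(np.cov(income, x[choice])[0, 1])


cov_without_interaction = simulate_covariance(pi=0.0)
cov_with_interaction = simulate_covariance(pi=1.0)
```

When `pi` is zero the covariance should be close to zero, and it grows as `pi` increases, which is why matching this moment helps pin down how preferences vary with the demographic.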
When is it a bad idea to use all info in a micro dataset?
One case is when aggregate and micro data are incompatible.
We give an example where income is measured differently across datasets. Carefully chosen summary stats use compatible info and discard incompatible info.
When should micro moments be pooled across markets?
We recommend pooling markets that seem observably similar.
In general, there's a bias-variance trade-off. Market-specific moments may contain more info, but many moments can bias GMM (e.g. Han and Phillips, 2005).
What about numerical integration?
We mostly point to our earlier paper, but highlight one pitfall.
Quadrature is great, but it performs poorly for micro moments with demographic discontinuities. We recommend more continuous micro moments or Monte Carlo methods instead.
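A toy sketch of the pitfall, under simplifying assumptions that are ours and not from the paper: a single standard normal "demographic," and a moment that is just the share of consumers above a cutoff. The function names and the choice of 7 nodes are hypothetical. It compares Gauss-Hermite quadrature with plain Monte Carlo draws.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_tail_probability(cutoff):
    """Exact P(y > cutoff) for y ~ N(0, 1)."""
    return 0.5 * (1.0 - math.erf(cutoff / math.sqrt(2.0)))


def quadrature_tail(cutoff, n_nodes=7):
    """Gauss-Hermite approximation of E[1{y > cutoff}] for y ~ N(0, 1)."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()  # normalize so weights sum to one
    return float(np.sum(weights * (nodes > cutoff)))


def monte_carlo_tail(cutoff, n_draws=100_000, seed=0):
    """Monte Carlo approximation of the same expectation."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal(n_draws)
    return float(np.mean(draws > cutoff))


truth = normal_tail_probability(0.5)
quad_error = abs(quadrature_tail(0.5) - truth)
mc_error = abs(monte_carlo_tail(0.5) - truth)
```

With only a handful of nodes, the quadrature value only changes when the cutoff crosses a node, so it can't track a discontinuous integrand well. The same rule is exact for low-order polynomials, which is why smoother moments are a better fit for it.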
Will micro BLP work with different data sizes?
Yes! As long as aggregate and micro data are not too small.
We consider three important asymptotic thought experiments and find that micro BLP's desirable asymptotic properties seem to translate to finite samples.
How does this work in practice?
Our empirical example uses Nielsen scanner and household survey data.
We estimate pre-2017 soda demand in Seattle, predict effects of a 2018 tax, and compare with what happened. Micro data let us reject big differences by demographics.
A standard concern is that arbitrary market size assumptions drive results: a large assumed market size -> a large outside share -> large logit substitution to the outside option.
So we show how to estimate outside diversion with a quick/cheap second choice survey. We hope these types of surveys will become more common in empirical IO.
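A quick sketch of the market-size concern in a plain logit, where diversion from product j to the outside option is s_0 / (1 - s_j). The quantities and market sizes below are made up for illustration, and the function name is hypothetical.

```python
import numpy as np


def logit_outside_diversion(quantities, market_size):
    """Plain-logit diversion from each product to the outside option.

    Shares are quantities divided by the assumed market size, and the
    diversion ratio to the outside good is s_0 / (1 - s_j).
    """
    shares = np.asarray(quantities, dtype=float) / market_size
    outside_share = 1.0 - shares.sum()
    return outside_share / (1.0 - shares)


quantities = [100, 200, 300]  # hypothetical units sold
small = logit_outside_diversion(quantities, market_size=1_000)
large = logit_outside_diversion(quantities, market_size=10_000)
```

Same sales, different assumed market size: the larger assumption pushes outside diversion toward one for every product. A second choice survey gives a direct measure to discipline this instead.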
For those looking to use micro moments with PyBLP, a good place to start is our tutorial estimating Petrin's (2002) model in < 100 lines of code. Let us know what you think!
Hey #EconTwitter, I'm on the job market with a paper about open source software. OSS is a global public good, widely used and provided by the private sector, but the target of recent industrial policy.
2/ I build an empirical model to quantify the global effects of China expanding its OSS policies (and a US response).
Before the model, what's going on in China? Most OSS collaboration happens on GitHub. Less well known is its Chinese counterpart, Gitee, state-backed since 2020.
3/ Along with subsidies, China appears willing to restrict OSS contributions.
I use censorship watchdog data to show that in 2021, China made it harder for domestic developers to access GitHub directly.
Anyone who's coded up BLP themselves has probably spent time with the appendix of Nevo's (2000) practitioner's guide, which shows how to use the implicit function theorem.
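For reference, the standard implicit function theorem step looks roughly like this. The notation is an assumption on our part: s are model-predicted shares, S are observed shares, δ are mean utilities, and θ are the nonlinear parameters.

```latex
% Mean utilities \delta(\theta) are defined implicitly by matching shares:
s\bigl(\delta(\theta), \theta\bigr) = S
% Differentiating both sides with respect to \theta gives
\frac{\partial s}{\partial \delta'} \frac{\partial \delta}{\partial \theta'}
  + \frac{\partial s}{\partial \theta'} = 0
\quad\Longrightarrow\quad
\frac{\partial \delta}{\partial \theta'}
  = -\left[\frac{\partial s}{\partial \delta'}\right]^{-1}
    \frac{\partial s}{\partial \theta'}
```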
Hey #econtwitter, @conlon_chris and I just released version 0.13.0 of PyBLP, our Python package for BLP-style differentiated products demand estimation.
This is part of an ongoing project of ours: standardizing and developing best practices for how to incorporate all sorts of "micro data." No tutorials yet, but stay tuned!