I was asked recently about why the number of replicate weights is the way it is... 80 or 200 or whatever the number might be. Here's my thinking.
The numbers come from different methods, and the different methods in turn have different requirements.
The replicate variance estimation methods work by omitting some units from the sample, and possibly doubling or tripling (or 1.5-ing) the remaining observations to keep the total population at about the right level.
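All of these methods share the same computational skeleton: recompute the statistic on each set of replicate weights, and measure how much the replicate estimates scatter around the full-sample estimate. Here is a minimal sketch in Python (the function name and the toy numbers are mine; only the method-specific scale factor changes between the methods):

```python
import numpy as np

def replicate_variance(theta_full, theta_reps, scale):
    """Generic replicate variance estimator:
    V = scale * sum_r (theta_r - theta_full)^2.
    The scale is method-specific: (n-1)/n for the delete-one
    jackknife with n PSUs, 1/R for BRR and (commonly) for the
    survey bootstrap with R replicates.
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    return scale * np.sum((theta_reps - theta_full) ** 2)

# Toy example: a full-sample estimate and four replicate estimates
print(replicate_variance(10.2, [10.5, 9.8, 10.1, 10.4], scale=1 / 4))
```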
The most common methods are the jackknife, the balanced repeated replication, and the bootstrap. Self-promotion opportunity: an article with review and some Stata implementations journals.sagepub.com/doi/10.1177/15…
In its basic form, the jackknife omits one PSU at a time; hence the number of replicates must equal the number of PSUs.
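As a toy illustration of how delete-one jackknife weights are built (my own sketch, ignoring stratification and finite population corrections): replicate r zeroes out PSU r and scales the surviving PSUs up by n/(n-1).

```python
import numpy as np

def jackknife_weights(base_weights, psu_ids):
    """Delete-one-PSU jackknife: one replicate per PSU.
    Replicate r drops PSU r and scales the surviving PSUs by
    n/(n-1), keeping the weighted total at about the right level.
    """
    base_weights = np.asarray(base_weights, dtype=float)
    psu_ids = np.asarray(psu_ids)
    psus = np.unique(psu_ids)
    n = len(psus)
    reps = []
    for dropped in psus:
        w = base_weights.copy()
        w[psu_ids == dropped] = 0.0            # omit this PSU
        w[psu_ids != dropped] *= n / (n - 1)   # rescale the rest
        reps.append(w)
    return np.column_stack(reps)               # one column per replicate

# 6 observations in 3 PSUs -> exactly 3 replicate weight columns
print(jackknife_weights([1, 1, 1, 1, 1, 1], [1, 1, 2, 2, 3, 3]).shape)
```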
While there are methods to combine the individual cases into fake PSUs and fake strata, I personally find them unappealing: I don’t understand whether the conditions for validity of the standard errors are satisfied, nor how to check these conditions. amstat.tandfonline.com/doi/abs/10.119…
(A PSU is a unit or a collection of units that you get the very first time you take a random draw. In some designs, it is a person if you have a list of people, e.g. members of a professional society, participants in a program, etc. ...
... In most face-to-face survey designs, it is a geographic area, so inevitably there are multiple people in that area.)
Balanced repeated replication is applicable to only one design: stratified samples with 2 PSUs per stratum. BRR omits one of the two PSUs and doubles up the other. It is a fairly effective method: you need about as many replicates as there are strata in your design (the nearest greater multiple of 4, to be precise).
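Here is my own sketch of the BRR construction, with the keep/drop pattern taken from the rows of a Hadamard matrix so the replicates are balanced. (One caveat: scipy.linalg.hadamard only builds matrices of order 2^k, so this sketch rounds the replicate count up to a power of two rather than the nearest greater multiple of 4 that a production implementation would use.)

```python
import numpy as np
from scipy.linalg import hadamard

def brr_weights(base_weights, stratum, psu_in_stratum):
    """BRR for stratified designs with exactly 2 PSUs per stratum.
    Each replicate keeps one PSU per stratum (doubling its weight)
    and drops the other; which PSU survives in each stratum comes
    from a row of a Hadamard matrix, giving balanced replicates.
    `psu_in_stratum` codes the two PSUs within a stratum as 0 and 1.
    """
    base_weights = np.asarray(base_weights, dtype=float)
    stratum, psu = np.asarray(stratum), np.asarray(psu_in_stratum)
    strata = np.unique(stratum)
    n_reps = 1
    while n_reps < len(strata):   # power of two >= number of strata
        n_reps *= 2
    had = hadamard(n_reps)        # entries are +1 / -1
    reps = []
    for r in range(n_reps):
        w = base_weights.copy()
        for h, s in enumerate(strata):
            keep = 0 if had[r, h] == 1 else 1
            in_s = stratum == s
            w[in_s & (psu == keep)] *= 2.0   # double the kept PSU
            w[in_s & (psu != keep)] = 0.0    # drop the other PSU
        reps.append(w)
    return np.column_stack(reps)
```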
The bootstrap is a method that takes samples with replacement. In the complex survey sampling world, that means samples of whole PSUs. amstat.tandfonline.com/doi/abs/10.108…
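A minimal sketch of the survey bootstrap along Rao-Wu rescaling lines (my own simplification: one unstratified pool of PSUs; the stratified version does the same thing within each stratum). Resampling n-1 of the n PSUs with replacement and multiplying each PSU's weight by (times drawn) * n/(n-1) reproduces the omit/double/triple behavior described above.

```python
import numpy as np

def bootstrap_weights(base_weights, psu_ids, n_reps=500, seed=1):
    """Rao-Wu rescaled bootstrap of whole PSUs (unstratified sketch).
    Each replicate draws n-1 of the n PSUs with replacement; a PSU
    drawn m times gets its weight multiplied by m * n/(n-1), so an
    omitted PSU gets 0 and a repeated PSU gets doubled or tripled.
    """
    rng = np.random.default_rng(seed)
    base_weights = np.asarray(base_weights, dtype=float)
    psu_ids = np.asarray(psu_ids)
    psus = np.unique(psu_ids)
    n = len(psus)
    reps = []
    for _ in range(n_reps):
        draws = rng.choice(psus, size=n - 1, replace=True)
        times_drawn = {p: int(np.sum(draws == p)) for p in psus}
        mult = np.array([times_drawn[p] * n / (n - 1) for p in psu_ids])
        reps.append(base_weights * mult)
    return np.column_stack(reps)
```

On average the multiplier is 1, so each replicate keeps the estimated population total at about the right level.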
What are the magic numbers that you would see? (In software development, "magic number" refers to something written as an actual number in the code, as opposed to a variable. en.wikipedia.org/wiki/Magic_num…)
(Usually, the use of magic numbers in code is a pretty bad smell. If that number ever has to change, or if the developer who scattered those unexplained numbers through the code leaves the org, they are hard to change properly.) en.wikipedia.org/wiki/Code_smell
So the sequence of magic numbers, from smallest to largest, is 80, 160, 200, and 500.
The number 80 smells like balanced repeated replication (BRR), a method commonly used with geographically clustered surveys that have ~80 PSUs (or fake PSUs made by collapsing some actual PSUs) -- mostly because it is so easily divisible by 4.
The number 80 shows up in the American Community Survey, which is a clustered design. Its strata are county-level collections of Census blocks. census.gov/programs-surve…
ACS is a systematic sample within these strata, and the variance estimation method, successive difference replication (SDR), reflects the systematic nature of the sample. www150.statcan.gc.ca/n1/en/pub/12-0…
(Although technically it cannot even be established that these estimates are unbiased, that's the best we can do with the data that we have; some survey statisticians love the method because it provides implicit stratification with associated precision gains.)
See @IPUMS tech documentation: usa.ipums.org/usa/repwt.shtml
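For flavor, here is my own toy rendering of the SDR replicate factors (a sketch of the Fay-Train construction, not the Census Bureau's production code; the row-assignment details vary by survey). Each unit, taken in its systematic sort order, gets the factor 1 + 2^(-3/2) * (difference of two consecutive Hadamard rows), which works out to factors of about 0.29, 1, or 1.71.

```python
import numpy as np
from scipy.linalg import hadamard

def sdr_factors(n_units, n_reps=8):
    """Successive difference replication factors (sketch).
    Unit i, taken in its systematic sort order, gets the factor
    1 + 2**-1.5 * (H[i] - H[i+1]) for consecutive rows of a
    Hadamard matrix H, cycling through the rows; the factors come
    out as ~0.293, 1, or ~1.707.  Note scipy.linalg.hadamard only
    builds orders 2^k, so this sketch cannot produce the 4k orders
    (like 80) that production systems use.
    """
    rows = np.arange(n_units) % n_reps     # cycle through the rows
    nxt = (rows + 1) % n_reps
    H = hadamard(n_reps)
    return 1 + 2 ** -1.5 * (H[rows, :] - H[nxt, :])

f = sdr_factors(10, n_reps=8)              # (10 units, 8 replicates)
print(np.unique(np.round(f, 3)))           # [0.293 1.    1.707]
```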
Another survey that uses 80 replicate weights is PISA (Programme for International Student Assessment), coming from the BRR perspective this time. Their methodological documentation is somewhat less convincing to me, so I take it with a grain of salt. oecd-ilibrary.org/docserver/9789…
Another important demographic and economic survey, the Current Population Survey, has 160 replicate weights for variance estimation.
It is a geographically stratified design with about 800 strata (so the 160 replicate weights represent a simplification of the design), and its geographic clusters are much larger than those in the ACS. The variance estimation method is again SDR. census.gov/prod/2006pubs/…
The @IPUMS technical documentation is very comparable to that concerning #ACSdata: cps.ipums.org/cps/repwt.shtml
The American Housing Survey also uses 160 replicate weights: huduser.gov/portal/dataset…
The next magic number up is 200. It appears in some of the @EdNCES data products (nces.ed.gov/training/datau…). This is tied to BRR, as with PISA; apparently the method has become popular in the education survey world.
When I worked on @pewresearch Survey of American Jews, we ended up with 256 bootstrap replicate weights pewresearch.org/wp-content/upl….
The magic number of 500 happens north of the border (for me). Statistics Canada uses bootstrap for everything, and their code invariably uses 500 bootstrap replicate weights.
My guess is that they developed a SAS macro in the early 1990s, hard-coded that number, and -- since you never touch production code -- are stuck with it forever. www150.statcan.gc.ca/n1/pub/12-002-…
My understanding is that they landed on that number based on guidance from JNK Rao, the original developer of the theory behind basically all replicate variance estimation methods, to ensure stability of variance estimates and percentiles.
So at the end of the day, it boils down to the personal preferences of survey statisticians. My preference is the bootstrap, which I think is a far more straightforward method.
It needs fewer mathematical assumptions to justify it, and it is applicable to nearly all kinds of statistics, unlike the jackknife, which does not always work with non-differentiable statistics (CDFs, percentiles, the Gini coefficient).
Also, while jackknife, BRR, or SDR weights have to be used together as a system -- the different replicates must complement one another to ensure the mathematical properties -- the bootstrap weights are independent of one another.
So for pilot runs of tables, one can use maybe 20 or 50 replicates to develop the syntax, and then return to the full 200 or 500 for the production run.
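Concretely, continuing the bootstrap sketch from earlier in the thread (bootstrap_weights is my hypothetical helper, not a real package function): since each bootstrap column was drawn independently, any subset of columns is itself a valid, just noisier, set of replicates -- something that is not true of jackknife, BRR, or SDR columns.

```python
import numpy as np

# Toy data: 6 observations in 3 PSUs (uses bootstrap_weights above)
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
psus = [1, 1, 2, 2, 3, 3]

w_all = bootstrap_weights(weights, psus, n_reps=500)
w_pilot = w_all[:, :50]   # develop the table syntax on 50 replicates
# ... then rerun the production tables with all 500 columns of w_all
```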
Thanks for coming to my TED talk on variance estimation.