Responses indicate that even statistical professionals have zero clue as to what it takes to run a survey of 1,000 randomly selected Americans every week. Proposals for 50,000 every week would put the sample size on par with the American Community Survey ($250M / year).
I'll expand on this a little bit.
1. The sample size. The rate of newly reported cases in the U.S. right now is about 20 per day per 100K, and true infections run roughly an order of magnitude higher than what gets reported. Thus a sample of n=1000 would capture infections at a Poisson rate of roughly (200 infections / 100K pop * 7 days * 1000 in sample) = 14 per week. The prediction interval around that is...
> qpois(c(0.025,0.975), 14)
[1] 7 22
So you can only claim reasonable evidence that the rate is changing when the weekly sample count falls outside that zone.
To be informative, the survey will probably need a much larger sample, and/or proper statistical support in which the published estimate combines the previous week's results with the new tests and their outcomes (including metadata such as positivity).
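For a sense of scale, here is a minimal R sketch (my illustration, not part of the original thread) of how the relative width of that Poisson noise band shrinks as the weekly sample grows, assuming the same back-of-envelope expectation of about 14 positives per 1,000 sampled per week:

expected_per_1000 <- 14                       # back-of-envelope expected positives per 1,000 per week
for (n in c(1000, 5000, 10000, 50000)) {
  lambda <- expected_per_1000 * n / 1000      # expected weekly positives at sample size n
  band <- qpois(c(0.025, 0.975), lambda)      # 95% prediction interval for the weekly count
  cat(sprintf("n = %5d: expect %4.0f positives, band [%.0f, %.0f], half-width ~%.0f%% of the mean\n",
              as.integer(n), lambda, band[1], band[2], 50 * diff(band) / lambda))
}

Even at n = 50,000 the week-to-week noise is still several percent of the level, which is why pooling information across weeks matters as much as raw sample size.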
2. The U.S. lacks a sampling frame for this exercise, at least an easily available one. (@DrTomEmery alluded to that; NL and the Nordic countries have population registers with stable, linkable IDs, so you can analyze everything from education to work to health.)
The Census Bureau has the Master Address File, and is still in the process of compiling the list of people from the recent decennial census. But I don't think it is legal to use that list for sampling purposes.
The @CensusBureau has already done its best in setting up a quick-turnaround survey of #COVID19 impacts: census.gov/programs-surve…
Lacking that, one has to go with commercial registers, which are based on the data that USPS provides (you cannot technically say that they sell that data... so "provides"), combined with credit agency data and all those cases where you forgot to check "don't share my data".
The technical term for that is address-based sampling (ABS), which was mentioned a couple of times as a somewhat viable option in the original big tree of responses.
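As a side note, here is a rough sketch (mine, with an entirely made-up frame; a real ABS vendor file would carry geography, vacancy flags, and much more) of the easy part, releasing non-overlapping weekly replicates once a frame exists:

set.seed(20200801)
frame <- data.frame(address_id = seq_len(3e6),
                    state = sample(state.abb, 3e6, replace = TRUE))   # toy stand-in for an ABS vendor file
weeks <- 52
n_per_week <- 1000
picked <- sample(nrow(frame), weeks * n_per_week)                     # one year's worth of addresses, no overlap
weekly_release <- split(frame[picked, ], rep(seq_len(weeks), each = n_per_week))
nrow(weekly_release[["1"]])                                           # 1,000 addresses released for week 1

The hard part is everything upstream of that frame object: building, deduplicating, and maintaining the list itself.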
3. The next question, though, is: who is going to respond? The previous tweet indicated a response rate of just under 3%, and that does not surprise me in the least; that is about what one would reasonably expect to achieve with an intrusive survey.
How much public trust a government survey with a 3% response rate would command is an open question.
In practice, there are groups of people who systematically respond less frequently to most surveys: anti-government people (who call themselves "libertarian" and think that government should be limited to operation of the courts and the military), ...
... folks of lower education; blacks (who, let's just say, have reasons not to trust the medical world that much); Hispanics (for both the language barrier and, let's just say, not as much desire to interact with anything federal).
All of these groups will have outcomes correlated with response propensities, which spells bias in the survey results. The freedom-loving people will have far riskier behaviors, and the lower-SES and racial/ethnic minority groups are far more likely to be frontline workers...
... and/or live in crowded conditions, and/or have multigenerational households, all conducive to COVID19 transmission. But I digress; this is a sub-topic on nonresponse, and I am generally done with it.
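To make the bias mechanism concrete, here is a toy simulation (mine, with made-up numbers) in which the higher-risk group is both more likely to be infected and less likely to respond; the naive respondent-based prevalence comes out noticeably below the truth:

set.seed(1)
N <- 1e6
risk_group <- rbinom(N, 1, 0.3)                       # e.g., frontline work, crowded housing
infected   <- rbinom(N, 1, ifelse(risk_group == 1, 0.004, 0.001))
p_respond  <- ifelse(risk_group == 1, 0.015, 0.040)   # higher-risk people respond less often
responds   <- rbinom(N, 1, p_respond)
c(true_prevalence     = mean(infected),
  respondent_estimate = mean(infected[responds == 1]))

Weighting adjustments can repair part of this, but only to the extent that the variables driving response are actually observed on the frame.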
4. How much will it cost, who will pay for it, and how quickly can it be rolled out? You can roughly track the cost of the American Community Survey at about $250M per year, and that buys about 4M people, i.e., on the order of $60 per case.
That $60 per case is *MIND BLOWINGLY* low by the standards of the survey field because of (1) the scale of the operation and (2) it being mostly on the Internet, so the expensive $1000-per-case in-person interviews are limited to single-digit fractions of the sample.
A fairer comparison might be the National Health and Nutrition Examination Survey (NHANES), which does collect biospecimen data: cdc.gov/nchs/nhanes/bi…. A two-year cycle collects data on about 8K people. My semi-educated guess is that the budget is in the eight digits.
All of NCHS is about $160M per year, so NHANES can't be more than $20 or $30M, which puts the cost per case somewhere north of $4K. And that does not surprise me in the least.
The hypothetical COVID survey would fall somewhere in between, so let's just say it is $500 per case... $0.5M per week... $26M per year. That's more than all of the Bureau of Transportation Statistics, and 1/6 of NCHS.
That's somewhere between a very sizeable and a huge federal program to build. (Now, if you indulge the possibility of n=50,000 per week, multiply everything by 50 and you land north of $1B, which is roughly 1/3 of the entire federal statistical system.)
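The arithmetic behind those figures, as a quick R check (the $500-per-case figure is the thread's stipulation, not an actual cost estimate):

cost_per_case <- 500          # assumed: between ACS (~$60/case) and NHANES (>$4K/case)
n_per_week    <- 1000
annual_cost   <- cost_per_case * n_per_week * 52
annual_cost                   # 26e6: about $26M per year at n = 1,000 per week
annual_cost * 50              # about $1.3B per year at n = 50,000 per week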
The budget figures are at magazine.amstat.org/blog/2020/04/0… courtesy of @ASA_SciPol; he can comment on how likely Congress is to carve out another 8-to-9-digit federal statistical program.
5. Now there are really only minor issues left. How do you ensure 1000 WEEKLY?
The only semi-reliable frame capable of producing results within days is phone, but you can't take a COVID test over the phone. (As @DDutwin, the @AAPOR president at the time, put it, phone surveys are not quite dead, but on life support now.)
ABS mail surveys (or mail push-to-web surveys) operate on a timescale of 3-4 months. Even if you mail the test kit, that's several days in the mail one way, and a day or so by express mail (which isn't freaking cheap; that's $50 per prepaid envelope already) the other way...
Under an absolutely ideal scenario, you mail something on Monday the week before, the recipient takes a test the following Monday, the test result is ready Wednesday, and the lab that did the test somehow reports it right away (through a secure portal that does not yet exist)
and it is all processed by Friday to get published as the statistic for this week. Hooray. Any delay, like the mail getting opened a day late, or the selected respondent not being able to take the test right away, and the whole schedule breaks down.
Realistically, the fastest turnaround survey that is not a survey-lite like Household Pulse is the Current Population Survey, and it is only able to produce results monthly based on (1) in-person recruitment in Month 1 and (2) phone follow-ups in subsequent months.
CPS has a respectable sample size of 60K per month... and I'd be curious to track how much it costs. (It nominally sits in BLS, which has a budget of $600M per year, so CPS alone may be $100M or so.)
tl;dr -- a weekly survey of 1,000 randomly selected Americans is as much pie in the sky as vaccinating the whole country next week or completing that LA-to-SF high-speed rail this year. /fin
P.S. (of course there will be a P.S.) -- as @patricksturg indicated, running a survey like that is not *ENTIRELY* impossible. But the format is different: it is a panel of the same households rather than a fresh weekly sample. ons.gov.uk/peoplepopulati…
And @ONS is a centralized agency that does not have the only-in-America issues of passing a list of names and addresses from one agency to another.
