By how much do response rates differ between population groups? Today, we will explore this with the @BRFSS data (cdc.gov/brfss/annual_d…) 🧵👇 1/11
The data set makes it possible to study these differentials, as it is one of the relatively rare data sets with both the design weights and the calibrated weights... in our case, _WT2RAKE and _LLCPWT. 2/11
(It's not me who likes to YELL, it is CDC. If you don't like the variable names, janitor them the way you like.) 3/11
# libraries
library(tidyverse)
library(zip)
library(haven)
library(magrittr)
library(sjlabelled)
# 4/11
# get the data -- dumped in the current directory getwd()
temp_zip <- tempfile()
download.file("cdc.gov/brfss/annual_d…", temp_zip, mode = "wb")
unzip(temp_zip)
read_xpt("LLCP2020.XPT") -> brfss2020
unlink(temp_zip)
# 5/11
# compute the raking ratio
brfss2020 %<>% mutate(raking_ratio = `_LLCPWT` / `_WT2RAKE`)
# and yes, I do like the magrittr::`%<>%` two-way pipe
# 6/11
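Why the inverse of the raking ratio can be read as a response rate: a minimal sketch on made-up numbers, assuming the calibrated weight only corrects for nonresponse within a group. In the real BRFSS, raking also absorbs coverage and sampling variation, so 1/raking_ratio is a proxy at best. The toy data frame and its column names are hypothetical.

```r
# Toy illustration (hypothetical numbers, not BRFSS data).
# Each group has N people in the population, n sampled, r responding.
toy <- data.frame(
  group = c("A", "B"),
  N     = c(10000, 10000),  # population size
  n     = c(500, 500),      # sampled cases
  r     = c(100, 250)       # responding cases
)

# Design weight: inverse probability of selection.
toy$design_wt <- toy$N / toy$n
# Calibrated weight: respondents are weighted up to the population total.
toy$final_wt <- toy$N / toy$r
# Raking ratio as in the thread: calibrated weight over design weight.
toy$raking_ratio <- toy$final_wt / toy$design_wt
# Under the assumption above, its inverse equals the response rate r / n.
toy$implied_rr <- 1 / toy$raking_ratio

print(toy[, c("group", "raking_ratio", "implied_rr")])
```

In this simplified setting the group with the larger raking ratio is the one that responded less often, which is the logic behind averaging 1/raking_ratio by group.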
# label values of education
brfss2020 %<>% mutate(EDUCA=labelled(EDUCA, label="Highest level of education",
labels=c("No school"=1, "Grade 1-8"=2, "Grade 9-11"=3,
"HS/GED"=4, "Some college"=5, "College or more"=6, "REF"=9)))
# 7/11
... waiting for another five or so years until #rstats supports labeled integers as a data structure properly... ah well... 8/11
# summary by education
brfss2020 %>% group_by(EDUCA) %>% summarize(mean_rr=mean(1/raking_ratio))

# 9/11
# difference between the extremes -- college-educated vs. HS dropouts
brfss2020 %>% group_by(EDUCA) %>% summarize(mean_rr=mean(1/raking_ratio)) %>%
ungroup() %>% summarize(diff=max(mean_rr)/min(mean_rr))

# 10/11
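One caveat worth checking: max()/min() picks the extremes over every EDUCA category, including refusals (9) and any missing values, so the ratio is not necessarily college vs. grades 9-11. A sketch of a comparison pinned to two named levels, run on a made-up stand-in for the summary table (the mean_rr values are invented, not BRFSS results):

```r
library(dplyr)

# Hypothetical stand-in for the by-education summary; values are invented.
by_educ <- data.frame(
  EDUCA   = c(1, 2, 3, 4, 5, 6, 9),
  mean_rr = c(0.30, 0.35, 0.40, 0.55, 0.70, 0.80, 0.25)
)

# Compare two named levels rather than whatever max() and min() return.
college  <- by_educ %>% filter(EDUCA == 6) %>% pull(mean_rr)
dropouts <- by_educ %>% filter(EDUCA == 3) %>% pull(mean_rr)
named_diff <- college / dropouts

# The max/min version, which also looks at the refusal category.
extreme_diff <- with(by_educ, max(mean_rr) / min(mean_rr))

c(named = named_diff, extremes = extreme_diff)
```

If the two numbers disagree on the real data, the extremes are coming from a category other than the two being discussed.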
So the differences in response rates may be more than twofold. /fin
