Stas Kolenikov
Survey statistician. Views are not my employer's (@NORCnews). Looking forward to opportunities to collect data for you. He/him.
Jun 15, 2023 18 tweets 26 min read
@rnishimura @awmercer @jon_m_rob @kwcollins @bradytwest +1 to Andrew, +1 to Brady. @JerryTimbrook had this Lickert scale vs. Like-ert scale @AAPOR-award-winning meme. It is time to produce an "entropy balancing is Case 2 of Deville and Sarndal (1992)" meme. @rnishimura @awmercer @jon_m_rob @kwcollins @bradytwest @JerryTimbrook @AAPOR It is unfortunate that economists continue to refuse to read the survey stats literature, and overblow the significance and originality of their work, but that's an uphill battle that myself and @bradytwest and @rnishimura and @MdtvieiraS are not in a position to fight.
Oct 12, 2021 18 tweets 13 min read
Webinar on @ipums data @popdatatech @nhgis @ipumsi and I am probably forgetting their other accounts - human population data… … geographic data @dcvanriper
Sep 29, 2021 12 tweets 3 min read
By how much do response rates differ between the different population groups? Today, we will explore this with the @BRFSS data (cdc.gov/brfss/annual_d…) 🧵👇 1/11 The data set makes it possible to study these differentials as it is one of the relatively rare data sets with both the design weights and the calibrated weights... in our case, _WT2RAKE and _LLCPTWT. 2/11
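A minimal sketch of the idea, not the analysis from the thread: within each group, the ratio of summed calibrated weights to summed design weights shows how much calibration had to scale the respondents up. A larger ratio is consistent with lower response (or coverage) in that group. The records, the `age_group` variable, and the helper name `weight_adjustment_by_group` are hypothetical; only the weight names _WT2RAKE and _LLCPTWT come from the thread.

```python
from collections import defaultdict


def weight_adjustment_by_group(records, group_key,
                               design_key="_WT2RAKE", final_key="_LLCPTWT"):
    """Return sum(final weight) / sum(design weight) for each group."""
    design_total = defaultdict(float)
    final_total = defaultdict(float)
    for rec in records:
        group = rec[group_key]
        design_total[group] += rec[design_key]
        final_total[group] += rec[final_key]
    return {g: final_total[g] / design_total[g] for g in design_total}


# Toy records (made up for illustration, not BRFSS values).
records = [
    {"age_group": "18-34", "_WT2RAKE": 100.0, "_LLCPTWT": 300.0},
    {"age_group": "18-34", "_WT2RAKE": 100.0, "_LLCPTWT": 500.0},
    {"age_group": "65+", "_WT2RAKE": 100.0, "_LLCPTWT": 150.0},
    {"age_group": "65+", "_WT2RAKE": 200.0, "_LLCPTWT": 300.0},
]

ratios = weight_adjustment_by_group(records, "age_group")
for group, ratio in sorted(ratios.items()):
    print(f"{group}: calibrated/design weight ratio = {ratio:.2f}")
```

With real data, the inverse of this ratio is only a rough proxy for a relative response rate, since calibration also corrects for coverage and other adjustments.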
Sep 28, 2021 22 tweets 5 min read
Quality of the #2020Census data -- a session on race and ethnicity coding.

Census follows the OMB 1997 standards, does not set the standards: 2 categories of ethnicity, 5 categories for race (AIAN, Asian, Black/AA, Native Hawaiian/Pacific Islander, White) + "some other race" The race and ethnicity questions help understand how people self-identify, so research into these is necessary to understand how the U.S. population evolves (more multiracial, more diverse than measured in the past)
Aug 12, 2021 64 tweets 47 min read
#JSM2021 panel led by @minebocek on upskilling for a statistician -- how to learn?? @minebocek #JSM2021 @hglanz no shortage of stuff to learn. First identify what you don't know -- that comes from modern media (blogs, twitter, podcasts; groups, communities -- @RLadiesGlobal or local chapters; professional organizations -- @amstatnews ).
Aug 12, 2021 4 tweets 3 min read
#JSM2021 an exceptionally rare case of ACTUAL out of sample prediction in #MachineLearning #ML #AI: two rounds of the same health data collection by @CDCgov @CDCgov Yulei He @cdcgov #JSM2021 RANDS 1 (fall 2015) + 2 (spring 2016): Build models on RANDS1 and compare predictions for RANDS2

ridge, lasso, elastic net, PLS, KNN, bagging, RF, GBM, XGBoost, SVM, deep learning
Aug 11, 2021 4 tweets 1 min read
I have two general questions:

1. when will the survey statisticians in the U.S. move from weird variance estimation methods (grouped jackknife) to simple and straightforward (bootstrap) and

2. when will they move from weird imputation methods with limited dimensionality and limited ability to assess the implicit model fit (hotdeck) to those where you explicitly model and understand which variables matter for this particular outcome (ICE)?
Aug 10, 2021 6 tweets 5 min read
#JSM2021 @jameswagner254 Using Machine Learning and Statistical Models to Predict Survey Costs -- presentation on the several attempts to integrate cost models into responsive design systems #JSM2021 @jameswagner254 Responsive designs operate on indicators of errors and costs. Error indicators: R-indicator, balance indicators, FMI, sensitivity to ignorability assumptions (@bradytwest @Rodjlittle Andridge papers).
Aug 10, 2021 12 tweets 8 min read
Now let's see how @olson_km is going to live tweet while giving her own #JSM2021 talk @olson_km #JSM2021 @olson_km Decisions in survey design: questions of survey errors and questions of survey costs. Cost studies are hard: difficult to offer experimental variation of design features, with a possible exception of incentives. Observational examinations are more typical.
Aug 10, 2021 4 tweets 1 min read
#JSM2021 virtual vs. in-person: IMO there are exactly two activities at an average JSM that dictate in-person presence: cheering at the award ceremonies and browsing the new books. Confidential coffee (job search, editorial boards) can be done with burner phones. Committee meetings should be/must be zoom calls; nobody is going back to in-person on that one. Having the presentations/files in advance/right after the event is the level of awesomeness not ever achieved by the conferences of yesteryear.
Feb 26, 2021 35 tweets 9 min read
Responses indicate that even statistical professionals have zero clue as to what it takes to have a survey of 1000 randomly selected Americans every week. Proposals to have 50,000 every week would put the sample sizes on par with American Community Survey ($250M / year). I'll expand on this a little bit.
Jun 3, 2020 35 tweets 8 min read
I was asked recently about why the number of replicate weights is the way it is... 80 or 200 or whatever the number might be. Here's my thinking. The numbers come from different methods, and the different methods in turn have the different requirements.
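A hedged sketch of why the method matters: replicate-weight variance estimators share one form, a scale factor times the sum of squared deviations of the replicate estimates from the full-sample estimate, and the scale factor depends on the replication method. The function names and toy numbers below are illustrative assumptions, not taken from the thread; the scale factors are the textbook ones for delete-a-group jackknife (JK1), balanced repeated replication (BRR), and successive difference replication (SDR, e.g. 4/80 with 80 replicates).

```python
def replicate_variance(full_estimate, replicate_estimates, scale):
    """Variance = scale * sum over replicates of (theta_r - theta_full)^2."""
    return scale * sum((rep - full_estimate) ** 2 for rep in replicate_estimates)


def jk1_scale(n_reps):
    # Delete-a-group jackknife with n_reps groups.
    return (n_reps - 1) / n_reps


def brr_scale(n_reps):
    # Balanced repeated replication (no Fay adjustment).
    return 1 / n_reps


def sdr_scale(n_reps):
    # Successive difference replication.
    return 4 / n_reps


# Toy estimates (made up): one full-sample estimate and four replicate estimates.
full = 10.0
reps = [9.0, 11.0, 10.0, 10.0]

var_jk1 = replicate_variance(full, reps, jk1_scale(len(reps)))
var_brr = replicate_variance(full, reps, brr_scale(len(reps)))
var_sdr = replicate_variance(full, reps, sdr_scale(len(reps)))
print(var_jk1, var_brr, var_sdr)
```

The same replicate estimates give different variances under different scale factors, which is why the data documentation has to state which method produced the replicate weights.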
Jun 1, 2019 36 tweets 23 min read
#SDSS2019 @gdequeiroz do people need to be on Twitter to be a part of #datascience community? How do we include people who are not on twitter? #SDSS2019 @AmeliaMN joined twitter one day apart from first opening #rstats. She encourages people to at least open a twitter account and follow people.
May 31, 2019 55 tweets 35 min read
At #SDSS2019, I am chairing a session on workflows with @TiffanyTimbers @mikelove @stephaniehicks at 3:45 in Regency C - it is tucked away a bit in the corridor between AB and D @TiffanyTimbers @mikelove @stephaniehicks #SDSS2019 See materials at
May 21, 2018 17 tweets 24 min read
@MCLevenstein @MaryELosch @ICPSR @bradytwest @NAHDAP1 @DSDRdata Well in this case (as is the case with many other documents written by statisticians who assume that every researcher knows enough survey statistics to connect the dots), the documentation does not explain the use of complex weights. It just says, "weights should always be used". @MCLevenstein @MaryELosch @ICPSR @bradytwest @NAHDAP1 @DSDRdata A clear specification should be:
- in Stata, this is your -svyset-
- in SAS, this is your PROC SURVEY ; WEIGHTS = ; CLUSTER = ; STRATA = ; setup
- in R, here's your svydesign
so that researchers could pick and drop this into their analyses.
May 18, 2018 12 tweets 8 min read
#SDSS2018 Wendy Martinez #textmining #textanalysis #SDSS2018 source of data and summary stats