#JSM2021 panel led by @minebocek on upskilling for a statistician -- how to learn??
#JSM2021 @hglanz: no shortage of stuff to learn. First identify what you don't know, which comes from modern media (blogs, Twitter, podcasts), groups and communities (@RLadiesGlobal or local chapters), and professional organizations (@amstatnews).
#JSM2021 @hglanz: textbooks you can take to the beach (SK: hate those coastal elites with their... beach... stuff), i.e., the free online books (An Introduction to Statistical Learning on Hastie's webpage; #r4ds by @hadleywickham and @StatGarrett; etc.)
#JSM2021 @hglanz use case: how many terms do you recognize in this retweet? What is the context? What modeling do you already know? What goes into data prep? What is an ODBC connection? What needs to be permitted?
#JSM2021 I'll have to subtweet Chris Malone next as I don't know his handle
#JSM2021 @chris_j_malone: the program at Winona State had to split into two separate programs, #DS and Statistics. Chris shows how the definitions of both evolved on @wikipedia over time.
#JSM2021 @chris_j_malone: if 80% of your time is data prep, then undergraduates should spend 80% of their class time learning it! They will probably do Excel, data cleaning, and visualizing the data sets before they "graduate" to #randomforest.
#JSM2021 @chris_j_malone: ~4 weeks of teaching worksheets is absolutely enough; students are about done by then. #DS is interdisciplinary => put #DS students in classes in other disciplines (up to maybe 1/3 of credits), and have them tag-team with other majors.
#JSM2021 @chris_j_malone: in #DS the outcomes are data products, not reports. Teach how to produce data products and how to communicate them to stakeholders.
#JSM2021 @DebAtStat: common data science tools: computational statistics or statistical computing? The former is linear algebra, optimization, numerical issues; the latter is data structures, packages, regex, objects...
#JSM2021 @DebAtStat Advice 1: take the time to learn computing well. What are the paradigms? What are the data structures? What are the code structures?
#JSM2021 @DebAtStat Advice 2: learn how to learn new technologies (SK: my understanding is that proper developers learn an entirely new framework or programming language every 6 to 24 months; we have to copy that, too)
#JSM2021 @DebAtStat Advice 3: find friends, find partners, work on a project together, maybe start small.
#JSM2021 personal note now: I think overall the session is missing the whole point of "upskilling" and "keeping current". These are the issues for early/mid-career statisticians. The talks so far, except @hglanz's, are about undergraduate programs.
#JSM2021 personal note ctd: I am not going to go back and enroll in an undergraduate program; that makes zero sense. I need to patch the holes in my knowledge of computing, and what I know about stat methods and data cleaning exceeds the UG programs by two orders of magnitude.
#JSM2021 Joan Combs Durso @econoprof: it's not just statisticians; everybody else has to upskill in the age of data science #DS
#JSM2021 @econoprof, thinking as an economist: our human/intellectual capital depreciates over time... but there are also catastrophic losses: platforms change? budget cuts? software updates?
#JSM2021 @econoprof Adult learning: no all-nighters to learn the new #rstats package; hands-on learning; connect the new material to what you already know (and we know a lot); performance deadline pressure and technophobia; working with others
#JSM2021 @econoprof: the No More Feedback book by Carol Sanford (link please?): a personal development plan, not what HR tells you to do. Start from your essence, assume intrinsic motivation; you drive it, you audit it.
#JSM2021 @econoprof: start with a recipe, something you are familiar with, and adapt it; take an old project and reproduce it with new software, make it reproducible, get someone else to test it.
#JSM2021 @econoprof: contribute your beginner's mindset: ask stupid questions about features of a package, become a usability tester, contribute edits to docs. Write about the journey you are going through! That way, you will have your thoughts organized, and you will help others.
#JSM2021 @econoprof: you can learn statistics and #DS in the shower by listening to podcasts... her favorites are Data Skeptic, Pod of Asclepius, Stats + Stories, and Not So Standard Deviations (links maybe?)
#JSM2021 @DebAtStat: it's been an eye-opener to co-teach with computer scientists: "just throw everything in and regularize". Of course we need to do both.
So I will continue this thread as I think this #JSM2021 session was a partially missed opportunity. The session had two presentations about the existing undergraduate #datascience or #DS-statistics programs.
#JSM2021 That's great, given how difficult building programs is. I tried building an interdisciplinary program when I was a tenure-track assistant prof, and I was all but laughed at. Kudos to the people who did put these together.
However, I read the title of this #JSM2021 session as "what statisticians with their terminal graduate degrees should do to keep their chops up to date", and the answers to that were partial. Not all of the tips and tricks apply to those of us mid-career, in our 30s and 40s.
#JSM2021 in no particular order: (1) Time: we don't have extra time on our hands. It's work, often more than 40 hours a week. It's family, often the kids that you really want to be with on the weekend, and whom you need to taxicab to after-school activities.
#JSM2021 still in no order (2) projects on GitHub. Yes, I have 15 ongoing #datascience projects. No, I can't share them with you, dear hiring manager, because they are on clients' data, and my employer owns the code.
#JSM2021 (3) classes to take -- no, I don't need the basic data cleaning class, I've been doing this crap artisanally for 20 years. No, I don't need the general Python introduction where they write web apps. No, I don't need Julia to solve astrophysics diffeqs.
#JSM2021 what I need to learn about software development and version control is how to work on a project with 20 analysts where the procedural code sits with the data on a protected server, rather than being developed on local machines and compiled to an .exe file.
#JSM2021 upskilling is patching. You took LSI from CR Rao's book; now you need to learn just a tad more about lasso, and you don't need all the Gauss-Markov theory all over again. You learned BUGS in grad school; now you need to learn HMC and why divergences may happen with it.
#JSM2021 classes need to be modules. Not the full college-style course with 50 instructor-facing calendar hours. But rather 4 modules of 12 hours each.
#JSM2021 my experience so far is that academia is generally unable to deliver this. I have been trying to upskill folks at the company, and I was the education officer of @srmsasa trying to upskill that whole segment of the profession... so I think I know just a bit more than a little bit.
#JSM2021 I am more likely to get a Python for data analysis course at the annual #AAPOR conference, from a political scientist who figured it out for tasks similar to mine, than from a computer scientist at an average university.
#JSM2021 the system of sticks and carrots in academia just... isn't aligned. And that's why we have had a shortage of survey stats/methods professionals for the past couple of decades to begin with -- but I digress.
#JSM2021 I am aware of some programs that are explicitly aimed at patching -- @IPSDS1 led by @fraukolos is one of these, and I would have very much liked Frauke to have been in that session and chime in.
#JSM2021 I am asked from time to time "which Python for data science certificate should I take?", and frankly I can't say much; they all look alike from where I sit...
#JSM2021 ... "take the one at the local school so that somebody knowledgeable could tap something on your keyboard to find that missing package" I guess?
#JSM2021 going back to the random points, in line with what @econoprof said about human capital: (4) you very likely have highly productive capital in something you currently do well at your job.
#JSM2021 Ditching this for ground-zero #ds certification with no specialization will likely set you back, salary-wise, and the prospects of moving up in this (over)saturated entry-level data science job market are unclear.
#JSM2021 the best examples, as far as I can tell, are based on retaining your best, highly productive substantive skills that are not grounded in SAS or R or Excel but in your knowledge of how a particular subfield of medicine, public health, transportation, government, etc. ...
#JSM2021 ... how that sector operates, what the standards (and informal norms) are, what common standard databases exist in that world; and adding your #datascience things on top / as a productivity-boosting supplement.
#JSM2021 you've been producing 158 SAS cross-tabs to stare at? Great! Now let's make a Shiny dashboard with drop-down menus for row variables, column variables, and subpopulations so that you don't have to scroll up and down.
#JSM2021 you've been producing 158 SAS cross-tabs to stare at and see if you have zeroes in some offending cells? Great! Now let's just stopifnot(min(cell.n) > 0) and only look at what happens when it breaks.
#JSM2021 an exceptionally rare case of ACTUAL out-of-sample prediction in #MachineLearning #ML #AI: two rounds of the same health data collection by @CDCgov
#JSM2021 Yulei He @CDCgov: RANDS 1 (fall 2015) + RANDS 2 (spring 2016): build models on RANDS 1 and compare predictions for RANDS 2
#JSM2021 Yulei He: R-squared about 30%; random forests and gradient boosting reduce the prediction error by about 4%, shrinking towards the mean; standard errors are way too small (about 50% smaller than they should be)
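A minimal Python sketch of the same "stay silent, fail loudly only when it breaks" idea, assuming survey records are stored as a list of dicts. The helper `check_no_empty_cells` and the variable names are hypothetical, not from any package. It counts every row-by-column combination of observed levels and raises when one is zero, in the spirit of R's `stopifnot(min(cell.n) > 0)`.

```python
from collections import Counter
from itertools import product


def check_no_empty_cells(records, row_var, col_var):
    """Raise ValueError if any row x column cell of the cross-tab is empty.

    Hypothetical helper: `records` is assumed to be a list of dicts, one per
    respondent. Only levels observed somewhere in the data are checked.
    """
    counts = Counter((r[row_var], r[col_var]) for r in records)
    row_levels = sorted({r[row_var] for r in records})
    col_levels = sorted({r[col_var] for r in records})
    # Every combination of observed row and column levels, including empty ones
    empty = [cell for cell in product(row_levels, col_levels) if counts[cell] == 0]
    if empty:
        raise ValueError(f"empty cells in {row_var} x {col_var}: {empty}")
    return counts


# Instead of eyeballing 158 tables, loop over the (row, column) pairs
survey = [
    {"region": "NE", "age": "18-34", "sex": "F"},
    {"region": "NE", "age": "35+", "sex": "M"},
    {"region": "S", "age": "18-34", "sex": "M"},
    {"region": "S", "age": "35+", "sex": "F"},
]
for row_var, col_var in [("region", "age"), ("region", "sex")]:
    check_no_empty_cells(survey, row_var, col_var)  # silent when all cells are filled
```

One design caveat: a level that never appears anywhere in the data cannot be detected this way. For that, you'd compare against the full list of levels from the codebook.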
1. when will the survey statisticians in the U.S. move from weird variance estimation methods (grouped jackknife) to simple and straightforward ones (bootstrap)?
and
2. when will they move from weird imputation methods with limited dimensionality and limited ability to assess the implicit model fit (hotdeck) to those where you explicitly model and understand which variables matter for this particular outcome (ICE)?
Oh, and somebody reminded me of
3. when will we move from PROC STEPWISE to lasso, as the rest of the statistics world has?
#JSM2021 @jameswagner254: Using Machine Learning and Statistical Models to Predict Survey Costs, a presentation on several attempts to integrate cost models into responsive design systems
#JSM2021 @jameswagner254: responsive designs operate on indicators of errors and costs. Error indicators: R-indicator, balance indicators, FMI, sensitivity to ignorability assumptions (papers by @bradytwest, @Rodjlittle, and Andridge).
Some decisions are made at the sample level (launch a new replicate, switch to a new phase of the follow-up protocol), others at the case level (change the incentive amount, change the mode)
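For readers new to the R-indicator: it is defined from the spread of response propensities as R = 1 - 2 * S(rho), so R = 1 when every case is equally likely to respond. The sketch below is a simplification, assuming the propensities have already been estimated (e.g., from a response model on frame variables) and that all cases carry equal design weights. The function name is hypothetical.

```python
import statistics


def r_indicator(propensities):
    """R-indicator: R = 1 - 2 * S(rho), S = standard deviation of propensities.

    Simplified sketch: equal design weights, population (divide-by-n)
    standard deviation, propensities assumed to be already estimated.
    For propensities in [0, 1] this keeps R between 0 and 1.
    """
    if not propensities:
        raise ValueError("need at least one propensity")
    return 1 - 2 * statistics.pstdev(propensities)


print(r_indicator([0.5, 0.5, 0.5, 0.5]))  # equal propensities: fully representative
print(r_indicator([0.2, 0.8, 0.2, 0.8]))  # large spread: far from representative
```

In a responsive design, you would track this number as fieldwork progresses, next to the cost indicators.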
Now let's see how @olson_km is going to live tweet while giving her own #JSM2021 talk
#JSM2021 @olson_km: decisions in survey design involve questions of survey errors and questions of survey costs. Cost studies are hard: it is difficult to experimentally vary design features, with the possible exception of incentives. Observational examinations are more typical.
#JSM2021 @olson_km: when you have one (repeated) survey at a time, you can better study the impacts of design features that vary (but you can't learn anything about the features that do not vary).
#JSM2021 virtual vs. in-person: IMO there are exactly two activities at an average JSM that dictate in-person presence: cheering at the award ceremonies and browsing the new books. Confidential coffee (job search, editorial boards) can be done with burner phones.
Committee meetings should/must be Zoom calls; nobody is going back to in-person on that one. Having the presentations/files in advance or right after the event is a level of awesomeness never achieved by the conferences of yesteryear.
Found yourself in a session that’s a poor match? Just click “All agenda” and find something else.
Responses indicate that even statistical professionals have zero clue as to what it takes to have a survey of 1000 randomly selected Americans every week. Proposals to have 50,000 every week would put the sample sizes on par with the American Community Survey ($250M / year).
1. The sample size: the rate of new cases in the U.S. right now is about 20 new cases per day per 100K. Thus a sample of n=1000 would capture cases at the Poisson rate of (20 cases / 100K pop * 7 days * 1000 in sample) = 1.4 per week. The prediction interval around that is...
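A quick check of this arithmetic, and of what a Poisson count that small looks like. The sketch assumes, as the tweet does, that the cases captured in the sample are Poisson with mean 20 / 100,000 * 7 * 1,000 = 1.4, ignoring clustering and reporting lags. The helper names are hypothetical.

```python
import math


def expected_cases(rate_per_100k_per_day, sample_size, days=7):
    """Expected new cases among `sample_size` random people over `days` days."""
    return rate_per_100k_per_day / 100_000 * days * sample_size


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = expected_cases(rate_per_100k_per_day=20, sample_size=1000)
print(f"expected weekly cases in the sample: {lam:.1f}")
for k in range(6):
    print(k, f"{poisson_pmf(k, lam):.3f}")
```

With a mean of 1.4, the chance of catching zero cases in a given week is exp(-1.4), about 25%.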