#JSM2021 panel led by @minebocek on upskilling for a statistician -- how to learn??
#JSM2021 @hglanz no shortage of stuff to learn. First identify what you don't know -- that comes from modern media (blogs, twitter, podcasts; groups, communities -- @RLadiesGlobal or local chapters; professional organizations -- @amstatnews).
#JSM2021 @hglanz What do the job postings require these days? (This is how the content for the @CalPoly stat/data science program was developed.)
#JSM2021 @hglanz textbooks you can take to the beach (SK -- hate those coastal elites with their... beach... stuff) -- the free online books (Intro to Stat Learning on Hastie's webpage; #r4ds by @hadleywickham @StatGarrett; etc.)
#JSM2021 @hglanz see what people do for #TidyTuesday, @Kaggle competitions.
#JSM2021 @hglanz use case: how many terms do you recognize in this retweet? what is the context? what modeling do you already know? what goes into data prep? what is an ODBC connection? what needs to be permitted?
#JSM2021 @hglanz ask for resources / links to resources teachdatascience.com
#JSM2021 that was a joint blog project @hglanz with @askdrstats @jo_hardin47
#JSM2021 I'll have to subtweet Chris Malone next as I don't know his handle
#JSM2021 @chris_j_malone the program at Winona State had to evolve into two separate #DS vs Statistics programs. Chris shows how the definitions of both evolved on @wikipedia over time.
#JSM2021 @chris_j_malone for as much as it might hurt here, the most common data science tool is not #rstats. It would be some mix of #python, #scala, #julia. Sorry guys.
#JSM2021 @chris_j_malone if 80% of your time is data prep, then undergraduates should spend 80% of their class time learning that! They will probably do Excel and data cleaning and visualizing the data sets before they "graduate" to #randomforest.
#JSM2021 @chris_j_malone ~4 weeks of teaching worksheets is absolutely enough, students are about done by then. #DS is interdisciplinary => put #DS students in other discipline classes (up to maybe 1/3 of credits), have them tag-team with other majors
#JSM2021 @chris_j_malone speaks Yoda: "Try not learn data science, do data science"
#JSM2021 @chris_j_malone teach, learn and think in a technology-agnostic way. #DS does not start and does not end with #rstats or #python or #tableau or #sql. You need to understand the principles -- how you clean the data and why you do that.
#JSM2021 @chris_j_malone in #DS the outcomes are data products, not reports. Teach how to produce data products and how to communicate them to stakeholders.
#JSM2021 @DebAtStat Strategies for Staying Current docs.google.com/presentation/d…
#JSM2021 @DebAtStat common data science tools: computational statistics or statistical computing? the former is linear algebra, optimization, numerical issues; the latter, data structures, packages, regex, objects...
#JSM2021 @DebAtStat Advice 1: take the time to learn computing well. What are the paradigms? What are the data structures? What are the code structures?
#JSM2021 @DebAtStat Advice 2: learn how to learn new technologies (SK: my understanding is that proper developers learn an entirely new framework / programming language every 6 to 24 months; we have to copy that, too)
#JSM2021 @DebAtStat Advice 3: find friends, find partners, work on a project together, maybe start small.

Advice 4: start with a small case study
#JSM2021 @DebAtStat data science major at @BerkeleyDataSci data.berkeley.edu/academics/data… first math, then computing, then ethics, then lots of domain emphasis in the senior year; cf. statistics, which is more math, more stat, and ethics missing
#JSM2021 @DebAtStat for the statistics UG major, computing stops at the "program structures" course and does not go into "data structures" or "development"
#JSM2021 personal note now: I think overall the session is missing the whole point of "upskilling" and "keeping current". These are the issues for early/mid-career statisticians. The talks so far, except @hglanz, are about undergraduate programs.
#JSM2021 personal note ctd: I am not going to go back and enroll in an undergraduate program, that makes zero sense. I need to patch the holes in my knowledge of computing, and what I know on stat methods and data cleaning exceeds the UG programs by two orders of magnitude.
#JSM2021 Joan Combs Durso @econoprof it's not just statisticians, everybody else has to upskill in the age of data science #DS
#JSM2021 @econoprof thinking as an economist: our human/intellectual capital depreciates over time... but there are also catastrophic losses -- platforms change? budget cuts? software updates?
#JSM2021 @econoprof Adult learning -- no all-nighters to learn the new #rstats package; hands-on learning; connect the new material to what you already know (and we know a lot); performance deadline pressure and technophobia; working with others
#JSM2021 @econoprof No More Feedback book by Carol Sanford (link please?) -- personal development plan, not what HR tells you to do -- start from your essence, assume intrinsic motivation, you drive it, you audit it.
#JSM2021 @econoprof start with a recipe -- something you are familiar with -- and adapt it; take an old project and reproduce it with new software, make it reproducible, get someone else to test it.
#JSM2021 @econoprof find your gang -- @RLadiesGlobal, #meetup #DS groups, accountability partner.
#JSM2021 @econoprof contribute your beginner's mindset -- ask stupid questions about features of a package, become a usability tester, contribute edits to docs. Write about the journey you are going through! That way, you will have your thoughts organized, and you will help others
#JSM2021 @econoprof you can learn statistics and #DS in the shower -- listen to podcasts... her favorites are Data Skeptic or Pod of Asclepius or Stats + Stories or Not So Standard Deviations (links maybe?)
#JSM2021 @DebAtStat calls for using "non-standard" data sets in undergraduate classes, from the local air quality monitoring network data to federal websites data.gov (yay @NCHStats @BLS_gov @uscensusbureau #ACS_data)
#JSM2021 @DebAtStat you also need to deal way more with visualization - @chris_j_malone seconds and mentions that on a lot of government websites, the built-in visualizations are accompanied by the data, and you can redo the plot in the software of your choice the way you like it
#JSM2021 @minebocek lobs a question on predictive modeling vs. "traditional inference" -- is that a #datascience #DS issue, or is that still a part of the statistics curriculum?
#JSM2021 @DebAtStat it's been an eye-opener to co-teach with computer scientists -- "just throw everything in and regularize". Of course we need to do both.
#JSM2021 I still did not find Lisa Kay, the session organizer, on twitter to tag. Somebody please tag her @minebocek @chris_j_malone @hglanz @econoprof @DebAtStat
So I will continue this thread as I think this #JSM2021 session was a partially missed opportunity. The session had two presentations about the existing undergraduate #datascience or #DS-statistics programs.
#JSM2021 That's great to the extent that building programs is difficult. I tried building an interdisciplinary program when I was a tenure track assistant prof, and I was all but laughed at. Kudos to the people who did put it together.
However, I read the title of this #JSM2021 session as "what statisticians with their terminal graduate degrees should do to keep their chops up-to-date", and answers to that were partial. Not all of the tips and tricks apply to us mid-career folks in our 30s and 40s.
#JSM2021 in no particular order: (1) Time: we don't have extra time on our hands. It's work, often more than 40 hours. It's family -- often the kids that you really want to be with on the weekend, and whom you need to taxicab to the afterschool activities.
#JSM2021 still in no order (2) projects on GitHub. Yes, I have 15 ongoing #datascience projects. No, I can't share them with you, dear hiring manager, because they are on client's data, and my employer owns the code.
#JSM2021 (3) classes to take -- no, I don't need the basic data cleaning class, I've been doing this crap artisanally for 20 years. No, I don't need the general Python introduction where they write web apps. No, I don't need Julia to solve astrophysics diffeqs.
#JSM2021 what I need to learn about software development and version control is how to work on a project with 20 analysts where the procedural code sits with the data on a protected server, rather than the code being developed on local machines and compiled to an .exe file.
#JSM2021 upskilling is patching. You took LSI from CR Rao's book -- now you need to learn just a tad more about lasso, you don't need all the Gauss-Markov theory all over again. You learned BUGS in grad school -- now you need to learn HMC and why divergence may happen with it.
#JSM2021 classes need to be modules. Not the full college-style course with 50 instructor-facing calendar hours. But rather 4 modules of 12 hours each.
#JSM2021 my experience so far is that academia is generally unable to deliver it. I have been trying to upskill folks at the company, I was education officer of the @srmsasa trying to upskill that whole segment of profession... so I think I know just a bit more than a little bit.
#JSM2021 I am more likely to get a Python for data analysis course from a political scientist who figured it out for their tasks that are similar to mine at the annual #AAPOR conference than from a computer scientist at an average university.
#JSM2021 the system of sticks and carrots in academia just... isn't aligned. And that's why we have had a shortage of survey stats/methods professionals for the past couple of decades to begin with -- but I digress.
#JSM2021 I am aware of some programs that are explicitly aimed at patching -- @IPSDS1 led by @fraukolos is one of these, and I would have very much liked Frauke to have been in that session and chime in.
#JSM2021 I am being asked from time to time "which Python for data science certificate I should take", and frankly I can't say much -- they all look alike from where I sit...
#JSM2021 ... "take the one at the local school so that somebody knowledgeable could tap something on your keyboard to find that missing package" I guess?
#JSM2021 going back to the random points, in line with what @econoprof said about human capital -- (4) you very likely have highly productive capital in something you currently do well at your job.
#JSM2021 Ditching this for a ground-zero #ds certification with no specialization will likely set you back, salary-wise, and the prospects of moving up in this (over)saturated entry-level data science job market are unclear.
#JSM2021 the best examples, as far as I can tell, are based on retaining your best, highly productive substantive skills that are not grounded in SAS or R or Excel but in your knowledge of how a particular subfield of medicine, public health, transportation, government, etc. ...
#JSM2021 ... how that sector operates, what the standards (and informal norms) are, what common standard databases exist in that world; and adding your #datascience things on top / as a productivity-boosting supplement.
#JSM2021 you've been producing 158 SAS cross-tabs to stare at? Great! Now let's make a shiny dashboard with the drop-down menus for row variables and column variables and subpopulations so that you don't have to scroll up and down.
#JSM2021 you've been producing 158 SAS cross-tabs to stare at and see if you have zeroes in some offending cells? Great! Now let's just stopifnot( min( cell.n ) > 0 ) and only look at what happens when it breaks.
#JSM2021 and so on.
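A minimal sketch of the zero-cell check from the cross-tab example, written as a Python analogue of the R one-liner stopifnot( min( cell.n ) > 0 ). The helper names (crosstab, check_no_empty_cells) and the toy region/mode data are hypothetical, made up purely for illustration; real survey data would come from wherever your tabulations come from.

```python
from collections import Counter
from itertools import product


def crosstab(rows, cols):
    """Count observations in every (row, column) cell, including empty ones."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # Enumerate the full grid so that cells with no observations show up as 0.
    return {cell: counts.get(cell, 0) for cell in product(row_levels, col_levels)}


def check_no_empty_cells(table):
    """Raise ValueError listing the empty cells; stay silent if all are populated."""
    empty = [cell for cell, n in table.items() if n == 0]
    if empty:
        raise ValueError(f"empty cells: {empty}")


# Hypothetical toy data: one row variable, one column variable.
region = ["N", "N", "S", "S", "S"]
mode = ["web", "phone", "web", "web", "phone"]

table = crosstab(region, mode)
check_no_empty_cells(table)  # passes quietly; you only look when it breaks
```

The point is the same as in the tweet: instead of eyeballing each table, loop the check over all of them and read only the ones that raise.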

I am not sure where my train of thought is going at this point so I will tag back the session participants @minebocek @hglanz @chris_j_malone @DebAtStat @econoprof to see if there are any reactions... and @fraukolos please say something, too :)