Stas Kolenikov at #JSM2021

12 Aug, 64 tweets, 47 min read

#JSM2021 panel led by @minebocek on upskilling for a statistician -- how to learn??

#JSM2021 @hglanz no shortage of stuff to learn. First identify what you don't know -- that comes from modern media (blogs, Twitter, podcasts; groups, communities -- @RLadiesGlobal or local chapters; professional organizations -- @amstatnews).

#JSM2021 @hglanz What do the job postings require these days? (This is how the content for the @CalPoly stat/data science program was developed.)

#JSM2021 @hglanz textbooks you can take to the beach (SK -- hate those coastal elites with their... beach... stuff) -- the free online books (Intro to Stat Learning on Hastie's webpage; #r4ds by @hadleywickham @StatGarrett; etc.)

#JSM2021 @hglanz see what people do for #TidyTuesday, @Kaggle competitions.

#JSM2021 @hglanz use case: how many terms do you recognize in this retweet? What is the context? What modeling do you already know? What goes into data prep? What is an ODBC connection? What needs to be permitted?

https://twitter.com/PhDemetri/status/1421099552338239496?s=20

#JSM2021 @hglanz ask for resources / links to resources: teachdatascience.com

#JSM2021 that was a joint blog project @hglanz with @askdrstats @jo_hardin47

#JSM2021 I'll have to subtweet Chris Malone next as I don't know his handle

#JSM2021 @chris_j_malone the program at Winona State had to evolve into two separate #DS vs Statistics programs. Chris shows how the definitions of both evolved on @wikipedia over time.

#JSM2021 @chris_j_malone for as much as it might hurt here, the most common data science tool is not #rstats. It would be some mix of #python, #scala, #julia. Sorry guys.

#JSM2021 @chris_j_malone if 80% of your time is data prep, then undergraduates should spend 80% of their class time learning that! They will probably do Excel, data cleaning, and visualizing the data sets before they "graduate" to #randomforest.

#JSM2021 @chris_j_malone ~4 weeks of teaching worksheets is absolutely enough, students are about done by then. #DS is interdisciplinary => put #DS students in other discipline classes (up to maybe 1/3 credits), have them team-tag with other majors

#JSM2021 @chris_j_malone speaks Yoda: "Try not learn data science, do data science"

#JSM2021 @chris_j_malone teach, learn and think in a technology-agnostic way. #DS does not start and does not end with #rstats or #python or #tableau or #sql. You need to understand the principles -- how you clean the data and why you do that.

#JSM2021 @chris_j_malone in #DS the outcomes are data products, not reports. Teach how to produce data products and how to communicate them to stakeholders.

#JSM2021 @DebAtStat Strategies for Staying Current docs.google.com/presentation/d…

#JSM2021 @DebAtStat common data science tools: computational statistics or statistical computing? the former is linear algebra, optimization, numerical issues; the latter, data structures, packages, regex, objects...

#JSM2021 @DebAtStat Advice 1: take the time to learn computing well. What are the paradigms? What are the data structures? What are the code structures?

#JSM2021 @DebAtStat Advice 2: learn how to learn new technologies (SK: my understanding is that proper developers learn an entirely new framework / programming language every 6 to 24 months; we have to copy that, too)

#JSM2021 @DebAtStat Advice 3: find friends, find partners, work on a project together, maybe start small.

Advice 4: start with a small case study

#JSM2021 @DebAtStat data science major at @BerkeleyDataSci data.berkeley.edu/academics/data… first math, then computing, then ethics, then lots of domain emphasis in the senior year; cf. the statistics major, which has more math, more stat, and no ethics

#JSM2021 @DebAtStat for statistics UG major, computing stops at "program structures" course, does not go into "data structures" nor "development"

#JSM2021 personal note now: I think overall the session is missing the whole point of "upskilling" and "keeping current". These are the issues for early/mid-career statisticians. The talks so far, except @hglanz, are about undergraduate programs.

#JSM2021 personal note ctd: I am not going to go back and enroll in an undergraduate program, that makes zero sense. I need to patch the holes in my knowledge of computing, and what I know on stat methods and data cleaning exceeds the UG programs by two orders of magnitude.

#JSM2021 Joan Combs Durso @econoprof it's not just statisticians, everybody else has to upskill in the age of data science #DS

#JSM2021 @econoprof thinking as an economist: our human/intellectual capital depreciates over time... but there are also catastrophic losses -- platforms change? budget cuts? software updates?

#JSM2021 @econoprof Adult learning -- no all-nighters to learn the new #rstats package; hands-on learning; connect the new material to what you already know (and we know a lot), performance deadline pressure and technophobia; working with others

#JSM2021 @econoprof useful software for economists: maxbruche.net/useful_softwar….

#JSM2021 @econoprof No More Feedback book by Carol Sanford (link please?) -- personal development plan, not what HR tells you to do -- start from your essence, assume intrinsic motivation, you drive it, you audit it.

#JSM2021 @econoprof start with a recipe -- something you are familiar with -- and adapt it; take an old project and reproduce it with new software, make it reproducible, get someone else to test it.

#JSM2021 @econoprof find your gang -- @RLadiesGlobal, #meetup #DS groups, accountability partner.

#JSM2021 @econoprof contribute your beginning mindset -- ask stupid questions about features of a package, become a usability tester, contribute edits to docs. Write about the journey you are going through! That way, you will have your thoughts organized, and you will help others

#JSM2021 @econoprof you can learn statistics and #DS in the shower -- listen to podcasts... her favorites are Data Skeptic, Pod of Asclepius, Stats + Stories, and Not So Standard Deviations (links maybe?)

#JSM2021 @DebAtStat calls for using "non-standard" data sets in undergraduate classes, from the local air quality monitoring network data to federal websites data.gov (yay @NCHStats @BLS_gov @uscensusbureau #ACS_data)

#JSM2021 @DebAtStat you also need to deal way more with visualization -- @chris_j_malone seconds and mentions that on a lot of government websites, the built-in visualizations are accompanied by the data, and you can redo the plot in the software of your choice the way you like it

#JSM2021 @minebocek lobs a question on predictive modeling vs. "traditional inference" -- is that a #datascience #DS issue, or is that still a part of statistics curriculum?

#JSM2021 @DebAtStat it's been an eye-opener to co-teach with computer scientists -- "just throw everything in and regularize". Of course we need to do both.

#JSM2021 I still did not find Lisa Kay, the session organizer, on Twitter to tag. Somebody please tag her @minebocek @chris_j_malone @hglanz @econoprof @DebAtStat

So I will continue this thread as I think this #JSM2021 session was a partially missed opportunity. The session had two presentations about the existing undergraduate #datascience or #DS-statistics programs.

#JSM2021 That's great to the extent that building programs is difficult. I tried building an interdisciplinary program when I was a tenure track assistant prof, and I was all but laughed at. Kudos to the people who did put it together.

However, I read the title of this #JSM2021 session as "what statisticians with their terminal graduate degrees should do to keep their chops up-to-date", and the answers to that were partial. Not all of the tips and tricks apply to those of us who are mid-career, in our 30s and 40s.

#JSM2021 in no particular order: (1) Time: we don't have extra time on our hands. It's work, often more than 40 hours. It's family -- often the kids that you really want to be with on the weekend, and whom you need to taxicab to the afterschool activities.

#JSM2021 still in no order (2) projects on GitHub. Yes, I have 15 ongoing #datascience projects. No, I can't share them with you, dear hiring manager, because they are on clients' data, and my employer owns the code.

#JSM2021 (3) classes to take -- no, I don't need the basic data cleaning class, I've been doing this crap artisanally for 20 years. No, I don't need the general Python introduction where they write web apps. No, I don't need Julia to solve astrophysics diffeqs.

#JSM2021 what I need to learn about software development and version control is how to work on a project with 20 analysts where the procedural code sits with the data on a protected server, rather than being developed on local machines and compiled to an .exe file.

#JSM2021 upskilling is patching. You took LSI from CR Rao's book -- now you need to learn just a tad more about lasso, you don't need all the Gauss-Markov theory all over again. You learned BUGS in grad school -- now you need to learn HMC and why divergence may happen with it.

#JSM2021 classes need to be modules. Not the full college-style course with 50 instructor-facing calendar hours. But rather 4 modules of 12 hours each.

#JSM2021 my experience so far is that academia is generally unable to deliver it. I have been trying to upskill folks at the company, I was education officer of the @srmsasa trying to upskill that whole segment of profession... so I think I know just a bit more than a little bit.

#JSM2021 I am more likely to get a Python for data analysis course from a political scientist who figured it out for their tasks that are similar to mine at the annual #AAPOR conference than from a computer scientist at an average university.

#JSM2021 the system of sticks and carrots in academia just... isn't aligned. And that's why we have had a shortage of survey stats/methods professionals for the past couple of decades to begin with -- but I digress.

#JSM2021 I am aware of some programs that are explicitly aimed at patching -- @IPSDS1 led by @fraukolos is one of these, and I would have very much liked Frauke to have been in that session and chimed in.

#JSM2021 I am asked from time to time "which Python for data science certificate should I take", and frankly I can't say much -- they all look alike from where I sit...

#JSM2021 ... "take the one at the local school so that somebody knowledgeable could tap something on your keyboard to find that missing package" I guess?

#JSM2021 going back to the random points, in line with what @econoprof said about human capital -- (4) you very likely have highly productive capital in something you currently do well at your job.

#JSM2021 Ditching this for a ground-zero #ds certification with no specialization will likely set you back, salary-wise, and the prospects of moving up in this (over)saturated entry-level data science job market are unclear.

#JSM2021 the best examples, as far as I can tell, are based on retaining your best, highly productive substantive skills that are not grounded in SAS or R or Excel but in your knowledge of how a particular subfield of medicine, public health, transportation, government, etc. ...

#JSM2021 ... how that sector operates, what the standards (and informal norms) are, what common standard databases exist in that world; and adding your #datascience things on top / as a productivity-boosting supplement.

#JSM2021 you've been producing 158 SAS cross-tabs to stare at? Great! Now let's make a shiny dashboard with the drop-down menus for row variables and column variables and subpopulations so that you don't have to scroll up and down.

#JSM2021 you've been producing 158 SAS cross-tabs to stare at and see if you have zeroes in some offending cells? Great! Now let's just stopifnot( min( cell.n ) > 0 ) and only look at what happens when it breaks.
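
For those patching in Python rather than R, the same "check it in code, look only when it breaks" idea can be sketched as follows. This is a minimal illustration, not anything shown in the session: the records, the `region` and `sex` variables, and the helper names `cell_counts` and `check_no_empty_cells` are all hypothetical, and only the standard library is used.

```python
from collections import Counter
from itertools import product


def cell_counts(records, row_var, col_var):
    """Count records in every row x column cell, including empty cells."""
    counts = Counter((r[row_var], r[col_var]) for r in records)
    rows = sorted({r[row_var] for r in records})
    cols = sorted({r[col_var] for r in records})
    # Fill in combinations that never occur so that zeroes are visible.
    return {(a, b): counts.get((a, b), 0) for a, b in product(rows, cols)}


def check_no_empty_cells(cells):
    """Rough analogue of stopifnot(min(cell.n) > 0): fail if any cell is empty."""
    empty = [cell for cell, n in cells.items() if n == 0]
    if empty:
        raise ValueError(f"empty cells: {empty}")


# Hypothetical toy data, for illustration only.
records = [
    {"region": "north", "sex": "f"},
    {"region": "north", "sex": "m"},
    {"region": "south", "sex": "f"},
]

cells = cell_counts(records, "region", "sex")

try:
    check_no_empty_cells(cells)
except ValueError as err:
    # Only now is it worth opening the table and staring at it.
    print("cross-tab check failed:", err)
```

The explicit fill-in of unobserved combinations matters: a plain count of observed pairs never contains a zero, so the check would pass silently on exactly the cells it is meant to catch.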

#JSM2021 and so on.

I am not sure where my train of thought is going at this point so I will tag back the session participants @minebocek @hglanz @chris_j_malone @DebAtStat @econoprof to see if there are any reactions... and @fraukolos please say something, too :)

P.S.

https://twitter.com/lisalendway/status/1425804289788760066
