John Myles White
Oct 28, 2018 · 13 tweets · 3 min read
This post is pretty bizarre, but it manages to hit on so many false beliefs that I've seen hurt junior data scientists that it deserves some explicit corrections:
nanx.me/blog/post/why-…
(1) The notion that R is well-suited to "building web applications" seems totally out of left field. I don't feel like most R loyalists think this is a good idea, but it's worth calling out that no normal company will be glad you wrote your entire web app in R.
(2) It is true that Python historically had some issues with the 2-to-3 transition, but it's not such a big deal these days. On the flip side, I have found interesting R code that doesn't run in modern R interpreters because of changes in core operations (e.g. assignment syntax).
(3) "Most of the time we only need a latest, working interpreter with the latest packages to run the code": this is where things get real and the post reveals habits that hurt data scientists. If this sentence is true for you, it's likely because you don't share code with coworkers.
(3) This is really a broader issue in data science: people think only about what they would need to do their work if no one else existed and code never had to be maintained. Junior data scientists almost always work on projects they start from scratch and don't have to maintain for long.
(3) Especially astonishing is this claim, "The version incompatibility and package management issues would almost surely create technical, even political problems within large organizations." In reality, updating packages unnecessarily can itself be a source of problems.
(4) "To do this in R, we merely need to do b = a". The idea that assignment is intrinsically a copying operation seems to have just been made up. Making lots of copies is one of the things that slows R down, and all R loyalists seem to admit this. Copying != purity.
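The assignment point is easy to see in code. In Python, `b = a` binds a second name to the same object, so nothing is copied. R uses copy-on-modify semantics instead: `b <- a` shares the value, and a copy is made only when one of the two is modified. The sketch below is a simplified, hypothetical Python model of that idea; the `_Buffer` and `CowVector` classes are invented for illustration and do not reflect how R manages memory internally.

```python
class _Buffer:
    """Shared storage plus a count of how many vectors point at it."""

    def __init__(self, data):
        self.data = data
        self.refs = 1


class CowVector:
    """Hypothetical copy-on-modify vector: a simplified model of R's `b <- a`.

    Sharing is O(1); the O(n) copy is deferred until a shared buffer
    is about to be written.
    """

    copies_made = 0  # class-wide counter of real buffer copies

    def __init__(self, values):
        self._buf = _Buffer(list(values))

    def share(self):
        # Emulates `b <- a`: a second handle on the same buffer, no copy yet.
        other = CowVector.__new__(CowVector)
        other._buf = self._buf
        self._buf.refs += 1
        return other

    def __getitem__(self, i):
        return self._buf.data[i]

    def __setitem__(self, i, value):
        if self._buf.refs > 1:
            # Another handle still sees this buffer: detach with a real copy.
            self._buf.refs -= 1
            self._buf = _Buffer(list(self._buf.data))
            CowVector.copies_made += 1
        self._buf.data[i] = value

    def to_list(self):
        return list(self._buf.data)


# Plain Python assignment binds a second name to the same object.
a = [1, 2, 3]
b = a
b[0] = 99
print(a)  # [99, 2, 3]: no copy was made

# Copy-on-modify: sharing is free until one side writes.
x = CowVector([1, 2, 3])
y = x.share()
print(CowVector.copies_made)  # 0
y[0] = 99
print(x.to_list(), y.to_list(), CowVector.copies_made)  # [1, 2, 3] [99, 2, 3] 1
```

Under this model, code that keeps modifying freshly shared values pays for a full O(n) copy on each first write, which is one way copy-heavy code ends up slow.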
(5) "as a functional programming language": Some folks keep claiming that R is a functional language, but they never define the term well. R is not pure by default. R code is riddled with mutations to the symbol table; library(foo) has to emit warnings for exactly that reason.
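To make the symbol-table point concrete: attaching a package changes what an unqualified name like `filter` refers to for all code that runs afterwards. The following is a hypothetical Python model of a search path (the `attach_package` and `lookup` functions, and the string stand-ins for dplyr and stats functions, are illustrative assumptions, not R's implementation), showing the mutation and the masking warning that goes with it.

```python
import warnings


def attach_package(search_path, pkg_name, exports):
    """Hypothetical model of `library(pkg)`: mutate a shared search path.

    `search_path` is a list of (name, namespace_dict) pairs searched front to
    back. Attaching inserts the new package first, so its names shadow
    existing ones; we warn about every name that gets masked.
    """
    for other_name, namespace in search_path:
        clashes = sorted(set(exports) & set(namespace))
        if clashes:
            warnings.warn(
                f"objects masked from '{other_name}': {', '.join(clashes)}"
            )
    search_path.insert(0, (pkg_name, dict(exports)))


def lookup(search_path, symbol):
    """Return the first binding for `symbol`, walking the path front to back."""
    for _, namespace in search_path:
        if symbol in namespace:
            return namespace[symbol]
    raise NameError(symbol)


# Strings stand in for function objects to keep the sketch short.
search_path = [("package:stats", {"filter": "stats::filter", "lag": "stats::lag"})]
print(lookup(search_path, "filter"))  # stats::filter

attach_package(
    search_path,
    "package:dplyr",
    {"filter": "dplyr::filter", "mutate": "dplyr::mutate"},
)
print(lookup(search_path, "filter"))  # dplyr::filter: same name, new meaning
```

The same call, `lookup(search_path, "filter")`, returns a different object before and after the attach; that global side effect is exactly what a pure-by-default language would not allow.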
(6) "Eventually, such functional designs save human time — the more significant bottleneck in the long run." This belief is extremely common among R users and it really holds them back in situations in which performance does matter. Large projects often demand high performance.
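A rough way to see why copy-heavy designs stop being free at scale: if every update produces a fresh copy, building an n-element result writes about n²/2 elements instead of n. The sketch below just counts element writes in Python for both styles; the function names are hypothetical, and it illustrates asymptotic cost rather than timing anything in R.

```python
def build_by_concatenation(n):
    """Copy-everything style: each step builds a brand-new list (acc + [i])."""
    acc, elements_written = [], 0
    for i in range(n):
        acc = acc + [i]               # allocates a new list of len(acc) + 1
        elements_written += len(acc)  # every element of the new list is written
    return acc, elements_written


def build_in_place(n):
    """Mutating style: append writes one new element per step.

    CPython lists over-allocate as they grow, so the occasional resize is
    amortised O(1) per append and is left out of this count.
    """
    acc, elements_written = [], 0
    for i in range(n):
        acc.append(i)
        elements_written += 1
    return acc, elements_written


for n in (10, 1_000):
    _, copied = build_by_concatenation(n)
    _, written = build_in_place(n)
    print(n, copied, written)
# 10 55 10
# 1000 500500 1000
```

At n = 1,000 that is 500,500 writes versus 1,000; the ratio is (n + 1)/2, so it keeps growing with the size of the data.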
(7) "In fact, the abstraction of vector, matrix, data frame, and list is brilliant." This belief really holds R users back when talking with engineers about implementations. At some point, everyone needs to learn what a hash table is, but its absence from base R confuses folks.
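For anyone who has not met one yet: a hash table maps keys to values by hashing each key to a bucket, so a lookup takes expected constant time instead of a scan through a vector. Python's built-in `dict` is a hash table; the class below is a hypothetical from-scratch sketch using separate chaining, meant only to show the mechanism, not to replace `dict`.

```python
class ChainedHashTable:
    """A minimal separate-chaining hash table (illustrative, not production)."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]
        self._size = 0

    def _bucket(self, key):
        # hash() maps the key to an int; the modulo picks one bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for idx, (k, _) in enumerate(bucket):
            if k == key:
                bucket[idx] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        # Keep chains short: grow once the average chain exceeds 2 entries.
        if self._size > 2 * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self, n_buckets):
        old_pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(n_buckets)]
        for k, v in old_pairs:
            self._bucket(k).append((k, v))

    def __len__(self):
        return self._size


table = ChainedHashTable()
for i, word in enumerate(["alpha", "beta", "gamma", "alpha"]):
    table.put(word, i)
print(table.get("alpha"), table.get("delta", "missing"), len(table))  # 3 missing 3
```

The lookup cost depends on the length of one chain, not on how many keys are stored, which is the property engineers usually mean when they ask "is this a hash table?"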
(8) "Beyond that, I also love the vector-oriented design and thinking in R. Everything is a vector:" This belief also seems common in the R community, even though the creator of R has said it's the biggest mistake they made. Scalars are always good and sometimes essential.
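One concrete consequence of "everything is a vector" is R's recycling rule: in elementwise arithmetic, the shorter operand is reused from the start until the longer one is exhausted, with a warning if the lengths do not divide evenly. The function below is a hypothetical Python approximation of that rule (`recycled_add` is an invented name), written to show that a length-1 "scalar" is just a special case of recycling.

```python
import warnings


def recycled_add(a, b):
    """Sketch of R-style elementwise addition with length recycling.

    The shorter vector is reused from the start until the longer one is
    exhausted; like R, it warns when the lengths are not multiples.
    """
    if not a or not b:
        return []  # a zero-length operand gives a zero-length result
    n = max(len(a), len(b))
    if n % len(a) or n % len(b):
        warnings.warn("longer object length is not a multiple of shorter object length")
    return [a[i % len(a)] + b[i % len(b)] for i in range(n)]


print(recycled_add([1, 2, 3, 4], [10]))      # [11, 12, 13, 14]: the "scalar" case
print(recycled_add([1, 2, 3, 4], [10, 20]))  # [11, 22, 13, 24]
print(recycled_add([1, 2, 3], [10, 20]))     # [11, 22, 13], plus a warning
```

Because a "scalar" is only a length-1 vector, a value that accidentally has length 2 is still accepted whenever the lengths divide evenly, and the result looks plausible; a true scalar type would let the language reject it up front.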
(9) If the most important feature of an IDE is an object inspector, maybe "No decent IDEs, ever" is true, but I think this is another case where the author has just never interacted with software engineers or understood their needs.
Putting it all together, there's a very troubling (and self-defeating) tendency in the data science world to embrace insularity and refuse to learn about the things software engineers know. Both communities have important forms of expertise; more sharing is the way forward.
