Since various SW benchmarks are going around today... A short thread on why I use #rstats.
Put simply, it offers by far the fastest & most efficient tools for the work I do (i.e. mostly data wrangling & applied econometrics).
(Disclaimer: This thread is *not* trying to get you to change from your preferred SW. You should use whatever you feel comfortable with. But I will try to highlight some objective facts that matter to me.)
(The tidyverse obviously provides another extremely rich data wrangling framework in R & comes w/ its own set of awesome features: SQL, Spark, Arrow etc. integration.)
Bottom line: even if I grant you gtools (which you should install), an MP license ($$), and constrain the no. of cores that R uses, R is consistently faster.
For fixed-effect regressions, {fixest} is insanely quick... as much as 100x faster than lfe and reghdfe (both great packages in their own right). github.com/lrberge/fixest/
And... there’s more! It also supports non-linear models (logit, etc.)
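For anyone who hasn't seen the {fixest} syntax, here is a minimal sketch of a linear and a logit fixed-effects model. The data are simulated and the variable names (`id`, `year`, `x`, `y`, `d`) are hypothetical, made up purely for illustration; it assumes {fixest} is installed.

```r
library(fixest)  # assumes install.packages("fixest") has been run

# Hypothetical simulated panel: 100 units observed over 10 periods
set.seed(123)
n_id <- 100
n_t  <- 10
dat <- data.frame(
  id   = rep(seq_len(n_id), each = n_t),
  year = rep(seq_len(n_t), times = n_id),
  x    = rnorm(n_id * n_t)
)
# Outcome: true slope of 2 on x, plus a unit effect and noise
dat$y <- 2 * dat$x + rnorm(n_id)[dat$id] + rnorm(n_id * n_t)
# Binary outcome for the non-linear example
dat$d <- as.integer(dat$y > median(dat$y))

# Linear model with unit and year fixed effects, SEs clustered by unit
est_ols <- feols(y ~ x | id + year, data = dat, cluster = ~id)

# Logit with the same fixed effects (same formula syntax)
est_logit <- feglm(d ~ x | id + year, data = dat, family = binomial())

# Side-by-side regression table
etable(est_ols, est_logit)
```

The fixed effects go after the `|` in the formula, so switching from the linear to the non-linear model is mostly a matter of changing the function name.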
Or, maybe you’re interested in LASSO. To the best of my knowledge, the {biglasso} package is easily the fastest and most memory efficient implementation. github.com/YaohuiZeng/big…
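A rough sketch of what a {biglasso} workflow can look like, assuming both {biglasso} and {bigmemory} are installed. The data are simulated and tiny (hypothetical, just to show the calls); the package is really meant for much larger, file-backed matrices.

```r
library(bigmemory)
library(biglasso)

# Hypothetical simulated data: 200 observations, 50 predictors,
# only the first three of which actually matter
set.seed(42)
n <- 200
p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X[, 1:3] %*% c(3, -2, 1) + rnorm(n))

# biglasso works on big.matrix objects rather than ordinary matrices
X_bm <- as.big.matrix(X)

# Fit the full lasso path
fit <- biglasso(X_bm, y, family = "gaussian")

# Pick lambda by 5-fold cross-validation
cvfit <- cv.biglasso(X_bm, y, nfolds = 5, seed = 42)
cvfit$lambda.min
```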
A quasi-related issue is code concision/syntax. This is veering off the “objective” path (I don’t have detailed stats) but I can only smile at claims that R requires more lines of code than, say, Stata. The opposite is almost always true IME.
Fwiw, compare the following bits of code. This is literally the most recent bit of Stata code that I rewrote in R.
Again, though: concision isn’t necessarily a goal unto itself. Good code is code that you (and your collaborators) find easy to write and understand. There’s nothing wrong with writing more verbose code that achieves these goals. Code shaming is despicable IMO.
In summary, I use #rstats because it offers the best tools for *my* needs. The awesome community and zero price tag don’t hurt either ;-)
Your needs and tolerance for learning a new SW language may differ. But you should know that performance loss is *not* a reason to avoid R. /fin
Here I'm quickly creating a local repo with a PDF and HTML file. I'm using my lecturenotes Rmd template, but that's unimportant. The main thing is that I have some files that look good locally, but now I want to share on GH.
So, I push them to GitHub and.. ughh.
The PDF version is okay (though I can't easily print or resize like I would if it was rendered in my browser).
But GitHub won't even let me look at the HTML, let alone render it. Minging.
I'm teaching a "data science for economists" course this semester.
If you're interested in learning more about #rstats, Git(Hub), programming, databases, cloud computation, ML, etc., I'll be making all of my course material publicly available here: github.com/uo-ec607
As I say in the syllabus, this course basically covers all of the things I wish I'd been taught in grad school. At the same time, I've benefited immensely from so many people making their teaching materials (and software!) publicly available. This is me trying to pay it forward.