Discover and read the best of Twitter Threads about #rstats

Most recent (24)


I’m feeling festive! I want to spread the cheer this holiday season by giving the #GiftOfCode for making beautiful figures! #EpiTwitter #StatsTwitter

When I was starting out, I remember how it would take HOURS to get my figures looking just right.

Nothing was more valuable than example code for learning different #DataViz options 🎁
First up:

These options in #stata for making this graph presentation-ready!

adjustrcspline, link(logit) scheme(vg_palec) ///
ylabel(, nogrid angle(horizontal)) ///
xtitle("X (Units)", size(medium)) ytitle("Outcome Risk (%)", size(medium)) ///
Read 7 tweets
I wonder how many 1000s of hours of some of the brightest minds in science are being wasted creating cool #rstats packages that only 10 people will ever use to shave 5 minutes off the time taken to solve a problem and which will stop working with dependency issues 2 years later.
I don't follow all that many people and yet every other day there's a new cool package in my timeline that only a few people will ever have time to learn. But someone will use it, and in a couple of years' time someone else will want to run their code, and it will choke.
The standard answer to this seems to be "But it's open source, so you can fix the package when it breaks". Great. So now anyone who wants to use code written by A has to be able to fix B's package, of arbitrary complexity.
Read 4 tweets
For R users wanting to understand how to write their own R functions:

A function is just like a recipe for baking a pie.

Read the thread below for more mouth-watering details. #rstats
Always start writing a function by writing this:

myfun <- function(){

}

This is the function 'skeleton'.

The function name, myfun, is R's name for the recipe.

The recipe ingredients will go inside function().

The recipe steps will go inside {}, along with the end product.
Let's say we'll bake a pie which requires a single ingredient. The recipe is:

myfun <- function(x){

res <- x*2

return(res)

}

When we call the function:

mypie <- myfun(5)

R puts the pie in the oven and returns the end result (x*2 = 5*2 = 10).
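
A minimal, runnable sketch of the recipe analogy. The one-ingredient `myfun` follows the thread; the two-ingredient `myfun2`, its `sugar` argument, and the default value are hypothetical additions for illustration only:

```r
# The recipe from the thread: one ingredient, one step, one end product.
myfun <- function(x) {
  res <- x * 2   # the recipe step
  return(res)    # the end product handed back to the caller
}

mypie <- myfun(5)

# Hypothetical variant (not from the thread): a second ingredient
# with a default value.
myfun2 <- function(x, sugar = 1) {
  x * 2 + sugar  # the value of the last evaluated expression is returned
}

mypie2 <- myfun2(5, sugar = 3)
```

Calling `myfun2(5)` without `sugar` falls back to the default, much like a recipe that says "a pinch of sugar unless you prefer more".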
Read 10 tweets
Soon in all your Shiny Apps:
#RStats #RShiny
Ever wanted an 8-bit dataTable?
Read 4 tweets
Not the most glaring issue with #JurassicPark, but this plot from Michael Crichton's original 1990 novel looks pretty darn weird. /1
In context: This is the height distribution of the Procompsognathid dinosaurs ("compys") in the park. The distribution looks Gaussian (bell curved), and we would expect the height distribution to look approximately Gaussian if the dinosaurs were a natural population. /2
However, the dinosaurs in Jurassic Park weren't supposed to be a "natural population" --- their breeding was controlled. So this bell curve was used as a clue that the dinosaurs were mating. /3
Read 19 tweets
This semester, I've been TAing an Election Data Science class at @UF taught by @ElectProject. Today is the first part of final project presentations!! I'll update this thread with the cool stuff done by @UFPoliSci students
First we have "Voter Registration Rates: Identifying Where People Are Not Registered and Why", a really cool piece on North Carolina data. Nathan models VEP using a random walk, and explores voter reg statistics using #rstats. Check it out at…
Next we have "Youth Voter Disillusionment — Fact or Fiction" which is done in #rstats by @annabaringer. Another interesting analysis of the North Carolina voter file, but this one focused on deadwood in registrations by racial group. Check it out at…
Read 7 tweets
[Notes on terminology]
"A package is not a library."
Library refers to:
- "A directory into which packages are installed"
- "That used by the operating system, as a shared, dynamic or static library or (especially on Windows) a DLL, where the second L stands for ‘library’."
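
The library/package distinction can be seen from an R session. A minimal sketch using only base R; the variable names are illustrative, and the actual paths depend on the local installation:

```r
# A library is a directory into which packages are installed.
libs <- .libPaths()

# A package lives inside a library; `stats` ships with every R installation.
stats_path  <- find.package("stats")
pkg_library <- dirname(stats_path)   # the library directory containing the package

# Despite its name, library() attaches a *package* found in a library.
library(stats)
attached <- "package:stats" %in% search()
```

Printing `libs` and `pkg_library` shows directories (libraries), while `basename(stats_path)` is the name of a package inside one of them.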
"Installation takes a source package and installs it in a library using R CMD INSTALL or install.packages."
"Source packages can be built. This involves taking a source directory and creating a tarball ready for distribution"
Read 24 tweets
✨🧠 The ecosystem that has grown up around @TensorFlow in the last few years blows my mind. There's just so much functionality, compared to some of the other, newer frameworks.

👉Consider this an ever-expanding thread for me to take notes + wrap my brain around products. Ready?
1) @TensorFlow Extended (TFX)

It's no secret that I 💕 #TFX and all of its tooling for deploying machine learning models into production. If you care about keeping your models up-to-date and monitoring them, you should check out the product + its paper.
2) @TensorFlow Hub

If you want to train your model on a small data set, or improve generalization, you'll need to use something called transfer learning. #TFHub modules make it easy—and are available in an #OSS marketplace:

Read 39 tweets
I made a firework with #rstats and gganimate! But the accidents along the way are at least as interesting as the final result. (1/n)
For example, backwards fireworks are holes (2/n)
Randomizing x makes funny clockwork particles
Read 10 tweets
In the spirit of #EpiWritingChallenge, I thought I'd share some elements of my writing workflow in Rstudio and R Markdown. I'm changing this all the time and would love to hear about yours! You can find this example on my GH here:… #rstats 1/8
The best thing about markdown is the way you can intermingle text with code, but I often find I need to focus on the words. The first step for me is to zoom to the source pane using ctrl-shift-1 #rstats 2/8
Then, I collapse everything using the fold all keyboard shortcut, which is opt+cmd+o on Mac and alt+o on Windows. I only unfold the section I want to work on and focus on that #rstats 3/8
Read 9 tweets
On the eve of the #2ndAnniversary of the #DeMonetisation that we know was a #DemonetisationDisaster I present a Tweet thread on my analysis of just one of the many claims made for it - that it would boost #cashless #digital transactions. It did not do that. Here's my evidence:
The analysis and paper (Draft 1) is here:
on issuu:…
on Scribd:…
But here in this thread I present the main results. Previous analysts used a simplistic approach of doing a before-after comparison. But this is a spurious method
It leads to the Post hoc ergo propter hoc fallacy. See,…
My analytical approach was simple. I used the figures from RBI and took the monthly data up to Oct 2016 on each indicator and treated it as regular time series data. (see
Read 18 tweets
Hey all! I'm excited to reveal my new R package. It fills a gap I've been sorely missing since I've been moving from Stata to R: a quick and easy VARIABLE BROWSER! One you can use Find-in-Page on, and variable labels! Introducing the package vtable()! (thread)
vtable() takes your data and spits out a formatted table of information about your variables. It then opens up that table in the Viewer pane (in RStudio) or in a web browser (elsewhere) so you can continue to refer to it while working on your data. Can give value details too.
It works with variable labels. Provide your own (in one of three flexible formats), or if you've embedded some with Hmisc/haven/sjlabelled, it will pick them up. It detects sjlabelled value labels too. All where Ctrl-F can repeatedly and reliably help you sort through vague names
Read 11 tweets
Following our launch event two weeks ago, we are looking forward to holding the second session of Social Data Science Seminars in @CUSocSci. In the launch event, we decided to follow chapters of @hadleywickham's R4DS book until most participants get up to speed. 1/5
This week, we plan to cover Chapter 1(…) and Chapter 4 (…). Reading these chapters before the session is advised. 2/5
If we have time left, we might play with the great data viz package ggplot (…) or dplyr verbs in Chapter 5 ( 3/5
Read 6 tweets
over in Slack-land a colleague asked for resources on learning #rstats, with a particular emphasis on resources aimed at beginners.


here is a thread of my personal favorites:
Modern Dive is 🙌🙌🙌 and probably one of the best "I NEED TO LEARN DATA SCIENCE IN R RIGHT MEOW" resources out there because it gets you up, running, and through the whole DS cycle pretty quickly.
R for Data Science is perfect for leveling up your R skills in an applied context, and gives you plenty of practice exploring the tidyverse 🌌
Read 7 tweets
For #DeclareDesign #launchday, here’s a thread about our five #rstats packages for research design and analysis: DeclareDesign, fabricatr, estimatr, randomizr, and DesignLibrary.

5 📦’s in 5 📣’s!

@graemedblair @jasperjcooper @maqartan
DeclareDesign is “ggplot for research designs.” You add together design elements – data generating processes, sampling and assignment schemes, and estimators to declare a design.…
fabricatr simulates fake data to help imagine your data before you collect it, especially hierarchical data common in the social sciences (students in classes in schools). Aaron Rudkin and Neal Fultz deserve big credit for development.
Read 8 tweets
If you want to make code/data “available”, GitHub isn’t enough.

You must deposit at a DOI-issuing data repository. @figshare & @ZENODO_org are both free & awesome; can be synced w/ a GitHub repo

Why is GitHub not enough? 1/4
#OpenAccess #OpenData
GitHub is a place for things to be worked on, not for them to live forever.

- Links are fragile (username, repo name)
- Users can delete repos
- GitHub could make your code/data unavailable in the future.

DOI-issuing data repositories preserve your stuff for the future 2/4
Depositing on @KaggleDatasets isn’t good enough for #OpenAccess #OpenData either.

- No API for accessing files without an account
- Fragile URLs
- Kaggle Datasets is a commercial thing.

Do all three! GitHub repo, Kaggle Dataset and @figshare or @ZENODO_ORG 3/4
Read 4 tweets
What does one get when mindlessly applying logistic regression to a too small dataset? Well…

Just to be clear, I have zero interest in shaming these authors. Small sample logistic regression analyses are very common! But for those of you interested here is the link:…
“The aim of this study was to evaluate the impact of medical student placement of Foley catheters on rates of postoperative catheter-associated urinary tract infection (CAUTI)”.
Read 24 tweets
Getting ready to teach dplyr joins to new #rstats users tomorrow, so naturally I productively procrastinated by getting to know the new gganimate. It is the coolest!
The readme is a great intro --…. I basically just replaced `+facet_wrap(~ key)` with `+transition_states(key)`.
I really like how easily the gganimate calls fit in with the ggplot pipeline. It's very clear @thomasp85 put a lot of thought and care into the API design.
Read 8 tweets
Thread: Though there is lots of community interest in model-agnostic model explanations (eg. LIME), I’ve been interested in an explainer specific to random forests. Came across a great one here:… #rstats
To begin, train an RF model and run the explain_forest() function. #rstats
Next thing you know, a variety of “views” into the forest appear on your machine. Can explore interactions between metrics and variables. For more details, check out this vignette:… #rstats
Read 5 tweets
#RStats — What's up CRAN?

In the past 7 months:

🆕 1178 new 📦

🆓 18 unarchived 📦

⏹ 493 archived 📦
At the end of June, there were around 12500 📦 on CRAN
Changes on CRAN check ✅

See… for more info
Read 9 tweets
💬 "If you're aiming at humans only, you can build a shiny app. If you want machines to access your data, build an API"

💬 "And remember that any other language can access your R API. You're opening a lot of doors."

Some important things to know when working with APIs:

🕸 About http requests :…

🚀 About REST API :…
Read 14 tweets
[Thread #RStats #Programming ]
Some important points taken from "Object-Oriented Programming, Functional Programming and R", by John M. Chambers…
An article about what Functional Programming (FP) is, and how it differs from Object-Oriented Programming (OOP).
Note that both these paradigms are used in R, and interact with each other.
Here’s the definition of what FP is:
- Programming == creating functions
- Functions return values computed from arguments (and only these)
- A function has no side effects
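
The three points above can be sketched in base R. The function names and numbers are hypothetical and are not taken from the Chambers article:

```r
# Pure function: the value depends only on the arguments, nothing else changes.
add_tax <- function(price, rate) {
  price * (1 + rate)
}

# Impure function: it reads and modifies a variable outside itself.
total <- 0
add_to_total <- function(price) {
  total <<- total + price   # side effect: updates `total` in the enclosing environment
  total
}

pure_1 <- add_tax(100, 0.5)
pure_2 <- add_tax(100, 0.5)     # same arguments, same value

impure_1 <- add_to_total(100)
impure_2 <- add_to_total(100)   # same argument, different value
```

The pure version can be called in any order, any number of times, with the same result; the impure version cannot, which is the practical reason the definition rules out side effects.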
Read 20 tweets
Why do we use arrow `<-` instead of equal `=` for assignment in #RStats:
It’s a historical choice: R comes from S, which used `<-` for assignment. S uses `<-` partly because it was inspired by APL, which had the ← operator for assignment, as it was developed for this keyboard, which has a key for the arrow:…
Note that at that time, with APL, the arrow was chosen because it distinguished assignment from the equal operator (there was no `==` for testing equality).
See :…
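
Beyond the history, the two operators still behave differently in today's R when used inside a function call. A minimal base R sketch; this illustration is an addition and is not taken from the thread:

```r
# Both operators assign at the top level.
a <- 1:10
b = 1:10

# Inside a call, `=` names an argument and creates no variable...
m1 <- mean(x = 1:10)
x_created <- exists("x", inherits = FALSE)

# ...whereas `<-` assigns in the calling environment, then passes the value on.
m2 <- mean(y <- 1:10)
y_created <- exists("y", inherits = FALSE)

# `==` tests equality and assigns nothing.
same <- all(a == b)
```

Run in a fresh session, `x_created` is `FALSE` while `y_created` is `TRUE`, even though both calls compute the same mean.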
Read 16 tweets
⬇️ Thread of thoughts on this topic ⬇️

I've taught #D3js to nearly 300 faculty, staff, and students at primarily the grad/PhD level across various workshops and bootcamps.

Some key reasons behind why D3 is hard, I'll outline below (in case anyone is learning or curious)
While you DO need to understand empathy, design, color perception, visual cognition, story-telling, geometric thinking, graphical conventions, basic statistics, data structures, linear/spatial algebra, and trigonometry to do VISUALIZATION well, D3 makes all of this *EVEN HARDER*
Why? Because the CORE of D3 is in its name: 'Data Driven Documents.' Data-driven is one thing, but the 'Documents' part refers to the DOM, which is a broad term referring to the *stuff* that makes up the web:
& more - all at once
Read 11 tweets
