Discover and read the best of Twitter Threads about #Rstats

Most recent (24)

Over the course of our @USATODAY investigation into broken adoptions, we traced 60,000 children adopted out of foster care from 2008 to 2010.

Nearly 2,000 returned. We calculated how each of a child's characteristics affected their risk of returning.…
Most states make it difficult to detect when a child adopted from foster care reenters the system.

They issue new identification numbers to these children in reports submitted to the federal government’s foster care database.
Sixteen states, however, provide data that makes it possible to track children from foster care to adoption to reentry in the foster system, according to a 2020 report submitted to the federal Administration for Children and Families.… [Image: map of the 16 states included in the analysis: Arizona, Florida, …]
Read 11 tweets
#SQL from #R is insane. Here’s why. 1/6

Image Credit: @yutannihilat_en

2/6 Thanks to Hiroaki Yutani for putting this amazing example together.

This small example demonstrates how you can write R code and have it automatically translated into SQL.
3/6 Under the hood, the dbplyr package translates R’s tidyverse (dplyr) syntax into SQL.

So we don’t need to memorize the SQL equivalents.
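A minimal sketch of that translation, assuming the {dplyr} and {dbplyr} packages are installed. It uses dbplyr's `lazy_frame()` with a simulated connection (`simulate_dbi()`), so no real database is involved; the column names (`carrier`, `dep_delay`) are hypothetical and not taken from the original example.

```r
library(dplyr)
library(dbplyr)

# A lazy table on a *simulated* connection: dbplyr only generates SQL,
# nothing is sent to a database. Column names are hypothetical.
flights <- lazy_frame(
  carrier = character(), dep_delay = numeric(),
  con = simulate_dbi()
)

# Ordinary dplyr verbs...
query <- flights |>
  filter(!is.na(dep_delay)) |>
  group_by(carrier) |>
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE))

# ...are translated into SQL; show_query(query) prints the same text
sql_text <- sql_render(query)
cat(sql_text, "\n")
```

Against a real connection opened with `DBI::dbConnect()`, `tbl(con, "flights")` would take the place of the lazy frame and `collect()` would run the query and return the results to R. The exact SQL text depends on the dbplyr version and the database backend.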
Read 6 tweets
Our paper, “How do archaeologists write about racism? Computational text analysis of 41 years of @SAAorg Annual Meeting Abstracts” is now #openaccess in @AntiquityJ… with @LiyingWang10 & @benmarwick. Our key findings 🧵 … [Image: first page of the PDF of our journal article]
Inspired by #BLM & work by @aflewellen @ArchFieldNotes @AliciaOdewale @HendrattaAli @melissa_cronin, we examine how 1000s of archaeologists have written about #racism using computational text analysis with racism-related keywords: “race”, “racial”, “racism”, and “racist”. 1/ [Image: poster with the text "black lives matter" in black]
Our computational methods come from the innovative work of @ted_underwood @mljockers @dmimno & only possible thx to the excellent #rstats pkgs by @kenbenoit @opencpu @juliasilge & @drob. Our data are 60,000+ openly available conference abstracts from @SAAorg. 2/ [Image: Society for American Archaeology logo]
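The paper's analysis relies on the #rstats packages credited above. The base R sketch below shows only the basic idea of keyword matching on a character vector of abstracts; the four abstracts are invented for illustration, and the regular expression (a whole-word, case-insensitive match on the four keywords) is an assumption, not the authors' exact method.

```r
# Hypothetical toy abstracts standing in for the SAA corpus
abstracts <- c(
  "This paper examines racial categories in museum collections.",
  "We report new radiocarbon dates from a coastal shell midden.",
  "Confronting racism and racist narratives in public archaeology.",
  "The race to document eroding sites before they are lost."
)
keywords <- c("race", "racial", "racism", "racist")

# Whole-word alternation: \b(race|racial|racism|racist)\b
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# Does each abstract mention at least one keyword?
has_keyword <- grepl(pattern, abstracts, ignore.case = TRUE, perl = TRUE)

# How many keyword occurrences does each abstract contain?
matches <- gregexpr(pattern, abstracts, ignore.case = TRUE, perl = TRUE)
hits_per_abstract <- lengths(regmatches(abstracts, matches))

data.frame(has_keyword, hits_per_abstract)
```

The last toy abstract shows why keyword hits still need to be read in context: "race" there means a race against time, not race in the sense the study is about.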
Read 22 tweets
I've been studying #Bayesian methods in #rstats since the beginning of this year.

The more I learn, the more excited I get about Bayesian methods.

Here's why...
One of the key R packages I've been experimenting with is brms (Bayesian Regression Models using Stan).

brms can fit a wide range of statistical models, including:

- linear,
- count data,
- survival,
- mixed-effects (multilevel),
- non-linear (& more!)
The important point is that brms fits these Bayesian models with a special technique called Markov chain Monte Carlo (MCMC), run through Stan.

MCMC is a game changer.
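To make the MCMC idea concrete, here is a bare-bones random-walk Metropolis sampler in base R. This is not what brms does internally (Stan's default sampler is the No-U-Turn Sampler, a form of Hamiltonian Monte Carlo); it is a hedged illustration of the general technique on a toy problem: estimating the mean `mu` of simulated normal data with known standard deviation 1 and a Normal(0, 10) prior.

```r
set.seed(42)
y <- rnorm(50, mean = 3, sd = 1)   # simulated data; sd assumed known (= 1)

# Log posterior (up to a constant): Normal(0, 10) prior + Normal(mu, 1) likelihood
log_post <- function(mu) {
  dnorm(mu, mean = 0, sd = 10, log = TRUE) + sum(dnorm(y, mean = mu, sd = 1, log = TRUE))
}

metropolis <- function(n_iter, init, step) {
  draws <- numeric(n_iter)
  current <- init
  current_lp <- log_post(current)
  for (i in seq_len(n_iter)) {
    proposal <- rnorm(1, mean = current, sd = step)   # symmetric random-walk proposal
    proposal_lp <- log_post(proposal)
    # Accept with probability min(1, posterior ratio), computed on the log scale
    if (log(runif(1)) < proposal_lp - current_lp) {
      current <- proposal
      current_lp <- proposal_lp
    }
    draws[i] <- current   # a rejected proposal repeats the current value
  }
  draws
}

draws <- metropolis(n_iter = 5000, init = 0, step = 0.5)
kept <- draws[-(1:1000)]   # discard burn-in
posterior_mean <- mean(kept)
quantile(kept, c(0.025, 0.975))   # 95% credible interval for mu
```

Because this toy model is conjugate, the exact posterior mean is `sum(y) / (length(y) + 1/100)`, which gives a quick check that the chain samples the right distribution. Real models rarely have such a closed form, which is why samplers like this one (or Stan's) are needed.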
Read 6 tweets
It's easy to include broader impacts in your research with the "coding for broader impact" framework developed by students @TempleUniv in @ESAFrontiers (1/4)…
#DataScience #coding #ClimateCrisis #Sustainability #environmental #RStats
Coding for broader impact addresses key challenges to effective public communication by developing personalized stakeholder reports alongside the tasks needed for publishing a paper (2/4) #publicspeaking #STEMeducation #peerreview #science #publishorperish
The #rstats coding for broader impact workflow is in this repository (3/4)
Read 6 tweets
👋🏽Hello new followers

I wanted to share my open-access resources and introduce myself. I am a professor focused on statistics and data science education. I think/write/talk a lot about learning/teaching (with) R. Here are some of my resources 🧵

First, I compile free R resources at
I co-authored a Bayesian book with Alicia Johnson and @Miles_Ott. The book is freely available at

I owe a big part of my current teacher-scholar identity to writing this book. I have learned a lot in the process.
Read 6 tweets
Here is a selection of the *coolest* 3D maps and map-like worlds I've seen on Twitter in the past months. Wonderful dynamic, interactive, static, creative or artistic visuals. #Dataviz #3dMap #GIS #3D #Cartography #GISchat 1/23🧵⬇️
Thomas Kole | @ThomasKoleTA 's stunningly realistic 3D model of Tenochtitlan, capital of the Aztec Empire (now Mexico City), made in #blender using #geometrynodes. Stunning water, clouds & details. #b3d #CDMX 2/23 🧵
Craig Taylor | @CraigTaylorViz 's mesmerizing ship data animation of the Gulf of Mexico. Made in #Cinema4D, with pleasant colours and satisfyingly well-made animation and light reflections. As always, impressive work. 3/23 #3dmaps
Read 22 tweets
Since the beginning of the war, 🇷🇺diplomats have tweeted disinformation thousands of times. In my analysis for @AP @washingtonpost @ABC et al., I find that 300+ 🇷🇺diplomats have gained millions of engagements, and are inconsistently labeled by @twitter /1.…
Since late January, 🇷🇺 diplomats have posted over 2,000 times a week. These tweets were highly engaged with, gaining over a million likes, replies, RTs/QTs in most weeks. In addition, 🇷🇺diplomats themselves retweeted more than 50k times, ~90% retweeting their own 🇷🇺colleagues. /2
Many of these tweets contain outright disinformation around alleged biolabs, nuclear weapons programs and false flag provocations by Ukraine. In one case, Twitter deleted two false tweets about the 🇷🇺 bombing of a hospital, but... /3
Read 14 tweets
This semester I had the pleasure (+ challenge) of teaching Regression Analysis. Much of the content was originally designed by the inimitable @MT_statistics, I’m handing it back w/ 3 updates:

🧹 #tidymodels
🔮 more predictive modeling
🔵 @quarto_pub!

🔗 [Image: course hex logo. The text on the logo reads "STA 210"…]
The curriculum now features a roughly equal dose of inferential and predictive models, using tidyverse and tidymodels for the #rstats code + all course content (website, slides, documents) created w/ Quarto. Source code for everything can be found at….
And importantly, students learned to use @quarto_pub as well! Roughly 75% (of 90 students) came in with one semester of R Markdown experience and the rest with no R or R Markdown experience, and getting started with Quarto went without a hitch!
Read 6 tweets
Last week, I held a short workshop teaching #RStats to Economics students without prior programming experience.

Here are 6 lessons I learnt that I wish I had known in advance.
1⃣ Do less.

An ambitious curriculum is usually my thing. But I did not anticipate how much time it would take to get students back on track.

Without any programming experience, students will get stuck in unexpected places. Plan to do less and you will feel less pressured.
2⃣ Teach through typing

I thought letting students literally type along was silly.

Well, I thought wrong. Typing a code snippet themselves seemed to give students a better feel for the code. Later on, during exercises, they were better at adapting the code.
Read 9 tweets
The {bookdown} package is a great way to publish an #rstats book online. It's what I'm using to make R Without Statistics.

Here's a thread with step-by-step video instructions to help you get started with {bookdown}.

Prefer a blog post? Check out… [Image: screenshot of R Without Statistics]
First, you'll want to install bookdown, same as any other package.
Next you'll want to create a new RStudio project to hold all of the materials for your bookdown project. When you do so, choose "book project using bookdown." You'll be able to select between the gitbook and bs4 themes (R Without Statistics uses the latter).
Read 24 tweets
Five quick #rstats tips I've picked up in the plots I've made for the first category of the #30DayChartChallenge 🧵👇

1. Need to plot a million points and ggplot() is taking too long but you want to stay within the tidyverse?

👉 Use 📦{scattermore} with geom_scattermore() 🥳
2. Using several subplots together to tell a story, and they each have their own colour scheme?

👉 Create a function to blend a bit of the same colour into all the others to create a more unified look. [Image: code snippet: blend <- func…]
3. Struggling with squished images when using {ggimage}?

👉 Plot the data with a ratio of 1:1 and use coord_equal()
👉 If that's not an option (e.g. x is categorical), adjust the ratio of the x to y increments within coord_fixed(). [Image: code snippet: coord_fixed…]
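Tip 2 survives only as a truncated code screenshot, so the original `blend` function is not recoverable. The following is a hypothetical base R stand-in (`blend_colours()` is a name made up here) that mixes a fixed share of one anchor colour into every colour of a palette by linear interpolation in RGB space.

```r
# Hypothetical helper: pull every palette colour part of the way towards one
# shared anchor colour, so separate subplots look more unified.
blend_colours <- function(colours, anchor, amount = 0.2) {
  stopifnot(amount >= 0, amount <= 1)
  rgb_cols   <- grDevices::col2rgb(colours)        # 3 x n matrix, values 0-255
  rgb_anchor <- grDevices::col2rgb(anchor)[, 1]    # length-3 vector (r, g, b)
  # Linear interpolation; the anchor vector is recycled down each column
  mixed <- (1 - amount) * rgb_cols + amount * rgb_anchor
  # rgb() accepts an n x 3 matrix; round to whole 0-255 channel values first
  grDevices::rgb(t(round(mixed)), maxColorValue = 255)
}

palette_a <- c("#E41A1C", "#377EB8", "#4DAF4A")
blend_colours(palette_a, anchor = "#2F4F4F", amount = 0.25)
```

Interpolating in RGB is the simplest option; a perceptual colour space such as CIELAB would give more even results but needs more code.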
Read 5 tweets
My biggest mistakes were never in my insights. They were in over-confidence.

An #rstats + #bayesian 🧵
2/n In business, I've built great regression models that predicted how much in sales we were going to make.

In fact, this helped me increase revenue from $3M to $15M per year at one of the companies I worked at.

BUT my models were NOT perfect.
3/n In fact, I'd argue that the BIGGEST flops were due to over-confidence.

Believing my model was better than it actually was.

Here's what hurt me...
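The thread's own examples are not reproduced here. As one illustration of how a regression model can look more certain than it is, the base R sketch below contrasts the 95% confidence interval for the *average* outcome with the 95% prediction interval for a *single* new month. The data are simulated, and the variable names (`ad_spend`, `sales`) are hypothetical.

```r
set.seed(1)
n <- 40
ad_spend <- runif(n, 10, 50)                    # simulated predictor
sales <- 5 + 2 * ad_spend + rnorm(n, sd = 10)   # simulated outcome with noise
fit <- lm(sales ~ ad_spend)

new_month <- data.frame(ad_spend = 30)

# Uncertainty about the *mean* sales at ad_spend = 30
conf_int <- predict(fit, new_month, interval = "confidence", level = 0.95)
# Uncertainty about *one* future month's sales at ad_spend = 30
pred_int <- predict(fit, new_month, interval = "prediction", level = 0.95)

conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]
print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```

Quoting the confidence interval as if it covered next month's sales is a classic way to be overconfident: the prediction interval also includes the residual noise, so it is much wider.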
Read 6 tweets
CRAN Task Views relaunched with many improvements! #rstats #cran @_r_foundation

CRAN Task Views provide a means to find and install pkgs that are relevant for a certain task (e.g., "MachineLearning", "Optimization", "Econometrics", etc.).

Thread (1/7): [Image: screenshot of the task views overview web page on CRAN]
Each task view page provides:

- Overview of how to use it.

- Information on #rstats pkgs for the topic.

- List of CRAN pkgs that can be installed.

- Links to other resources and pkgs on other repositories etc.

Example: Econometrics task view.… [Image: screenshots of the Econometrics task view web page on CRAN]
Pkgs can be installed with pkg #ctv:

ctv::install.packages("Econometrics", coreOnly = TRUE)

installs all the econometrics core packages or


installs all packages that are not yet installed and up-to-date.

Read 8 tweets
The James Bond of data science.

I saw this post from @v_vashishta, who described the JB of DS as someone whose résumé makes recruiters cry, who can cook 3-star Michelin meals with 1 hand & run an entire #datascience team with the other...

A #rstats #python 🧵

It's funny, because it's true! Job descriptions perpetuate this unicorn.

BUT, here are the cold, hard facts:

1. You don't need to be a unicorn. In fact trying to become one will hurt your progress.

2. You don't need a PhD. In fact, most of my students don't have computer science backgrounds. YET, they are getting jobs at Apple, Microsoft, Google...
Read 8 tweets
There are so many ways to visualize data. How do you select one that suits your dataset and the message you want to convey?
Here are some tools that can help you choose the right chart type – right in time for the #30DayChartChallenge 🧵[1/9]
#DataViz #DataVisualization
The Visual vocabulary from the Financial Times (@FinancialTimes) provides a great overview and some general information about each chart type. Very useful for getting inspiration. [2/9]…
If you already have a specific goal or message that you want to communicate, then tools that allow you to filter charts based on their function come in handy: [3/9]
Read 9 tweets
There are a lot of changes & new features coming to #rstats #tidycensus in April; you can try out the new features now with `remotes::install_github("walkerke/tidycensus")`. Read on in this thread to get up to speed on the updates:…
`get_acs()` and `get_pums()` now both default to the brand-new 5-year American Community Survey estimates. If you need other years, be sure to use the `year` argument to get data for that year.
However, the 1-year ACS experimental estimates _are not_ available in tidycensus, and requesting them will throw an error. If you need 1-year ACS data, be sure to explicitly request data for a different year using the `year` argument.
Read 7 tweets
@WifeyAlpha recently posted a thread with 16 buy-and-hold asset allocation schemes, i.e. fixed-weight portfolios that can be implemented with ETFs. I decided to write some code to test these in #R #RStats. The code is available on #RPubs:…
The code is a quick & dirty calculation using monthly returns with monthly rebalancing. It does not take into account transaction costs. The code can be easily adapted to test other asset allocation schemes.
I use the following #R packages: quantmod to get prices from Yahoo; PerformanceAnalytics for calculation of performance metrics. All backtests start when data for all necessary tickers becomes available.
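The RPubs code is not reproduced here. A minimal base R sketch of the same kind of calculation (fixed weights, monthly returns, rebalanced every month, no transaction costs) follows; the returns are toy numbers, not market data, and `backtest_fixed_weights()` is a hypothetical helper. PerformanceAnalytics' `Return.portfolio()` with `rebalance_on = "months"` should cover the same case without hand-written code.

```r
# Toy monthly returns (illustrative numbers, not market data); one column per asset
returns <- matrix(c(0.10, -0.05,    # asset A, months 1-2
                    0.00,  0.05),   # asset B, months 1-2
                  ncol = 2, dimnames = list(NULL, c("A", "B")))
weights <- c(A = 0.6, B = 0.4)

backtest_fixed_weights <- function(returns, weights) {
  stopifnot(abs(sum(weights) - 1) < 1e-8, ncol(returns) == length(weights))
  # Rebalancing back to the target weights every month means each month's
  # portfolio return is simply the weighted sum of that month's asset returns.
  monthly <- as.vector(returns %*% weights)
  # Growth of 1 unit invested at the start (no transaction costs)
  list(monthly = monthly, growth = cumprod(1 + monthly))
}

res <- backtest_fixed_weights(returns, weights)
res$growth   # 1.06 after month 1, then 1.06 * 0.99 = 1.0494
```

Without monthly rebalancing the weights would drift with relative performance, and the weighted-sum shortcut would no longer hold.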
Read 18 tweets
From standard output to powerful storytelling.

Here's how you can recreate the storytelling with data look in #ggplot2.

A detailed thread 🧵⬇️
#rstats #dataviz
This thread is based on my newest blog post. Find it at…

There, you will find more details and resources than I can offer in this thread.

The blog post was inspired by a makeover video by storytelling with data. You can find it at
This is our original dataviz. We will improve it to tell a compelling story.
Read 17 tweets
A quick tweetorial on using #RStats to hit an API!

I used the {httr} & {jsonlite} pkgs to migrate 789 journal entries from my old platform to Notion. It's a great case study. I'm going to show you my building blocks. 🧱

You can use what I learned for any API project!


First, I went to the "Getting Started" page of the Notion API docs. Makes sense, since I was getting started.

It told me crucial details like how to create an integration, give it permissions on a Notion DB, and get the DB ID.…

But before long I was awash in code examples that were either in Node.js or curl, and to be frank I wasn't sure how to work with them.

I know, I should, but hey, this is a tutorial for people like me who… didn't really know how to do it easily. A basic example in the docs...
Read 16 tweets
People are shocked when they learn that I picked #R over Python. It’s a true story.

I began learning data science with Python. This decision actually cost me 2 months and caused me to quit data science for a little bit.

Why? (a 🧵)
#rstats
2/n Python was too much like coding. And I came from a business background. My main tool was Excel. Python was unnatural.

A friend of mine recommended that I try #R.

I was instantly surprised at how much more intuitive it was for me given my Excel background. Here's what R had:
👉#R has functions just like #Excel. I could quickly summarize my data using mean(), sd(), sum(), and friends. These functions were very similar to AVERAGE(), STDEV(), and SUM() from Excel.
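A quick base R illustration of those Excel equivalents, on made-up numbers. Like Excel's STDEV(), R's sd() is the sample standard deviation (denominator n - 1). One difference worth knowing: Excel's AVERAGE() skips empty cells, whereas R's mean() returns NA if any value is missing unless you set na.rm = TRUE.

```r
x <- c(2, 4, 6)   # made-up values, like a small column of cells A1:A3

mean(x)   # 4,  like =AVERAGE(A1:A3)
sd(x)     # 2,  like =STDEV(A1:A3); both divide by n - 1
sum(x)    # 12, like =SUM(A1:A3)

# Missing values: Excel skips blank cells, R has to be told to
with_gap <- c(2, NA, 6)
mean(with_gap)                 # NA
mean(with_gap, na.rm = TRUE)   # 4

summary(x)   # min, quartiles, mean and max in one call
```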
Read 8 tweets
When you have data like those shown in this figure (continuous x and y variables) and a clear change in slope at a specific point, what is the best way to capture both elements, the slope before 1950 and the slope after 1950? Two regressions? #rstats
Shown in the figure is the loess curve using geom_smooth(), if it wasn’t obvious
Thanks, everyone! Seems I have enough to go on.
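The replies to the question are not reproduced here. One common answer, sketched under the assumption that the breakpoint (1950) is known in advance, is a single piecewise-linear ("broken-stick") regression instead of two separate fits: add a hinge term that is zero up to 1950 and grows by one per year afterwards. The data below are simulated for illustration.

```r
set.seed(123)
year <- 1900:2000
# Simulated series: slope 0.5 before 1950, 0.5 + 1.5 = 2 after 1950
y <- 10 + 0.5 * (year - 1900) + 1.5 * pmax(year - 1950, 0) +
  rnorm(length(year), sd = 2)

after_1950 <- pmax(year - 1950, 0)   # hinge term: 0 up to 1950, then 1, 2, 3, ...
fit <- lm(y ~ year + after_1950)

slope_before <- unname(coef(fit)["year"])
slope_after  <- unname(coef(fit)["year"] + coef(fit)["after_1950"])
c(before = slope_before, after = slope_after)

# The after_1950 coefficient is the *change* in slope; confint() gives its interval
confint(fit)["after_1950", ]
```

Unlike two separate regressions, this forces the two line segments to meet at 1950 and gives a direct estimate (and test) of the change in slope. If the breakpoint itself has to be estimated, the {segmented} package is one option.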
Read 3 tweets

Here are 5 super ✨unique✨ statistics resources you probably haven't seen before:
#1 - Seeing Theory: this interactive visual introduction to statistics helps you understand concepts like frequentist inference, Bayesian inference, and regression analysis in such a clever way…
#2 - StatQuest: free YouTube playlist, with some of the toughest concepts explained simply and visually.

TBH I needed the Maximum Likelihood video as a refresher!…
Read 8 tweets
