Discover and read the best of Twitter Threads about #RStats

Most recent (24)

Ever wondered how to join big data in R?

A thread on using Spark in R👇🧵

#rstats #spark #datascience
This is thread # 3 in a series exploring using Spark in R with the sparklyr package. You can find the others here:

# 1:

# 2:

Here's what you'll learn reading this thread:

1. How a regular left join works.

2. The sort-merge-join algorithm (the default way to join big data with Spark and sparklyr).

3. Broadcast joins.

4. Salted joins.

All directly in R. Let's go.
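As a rough illustration of the sort-merge idea named in point 2, here is a toy left join in base R: sort both tables by the key, then walk them with two pointers. This is only a single-machine sketch, not the thread's code and not how Spark implements it internally. The function name `sort_merge_left_join`, the example tables, and the assumption that keys in the right-hand table are unique are all made up for illustration.

```r
# Toy sort-merge left join (illustration only; assumes unique keys in `right`).
sort_merge_left_join <- function(left, right, key) {
  # Step 1: sort both tables by the join key.
  left  <- left[order(left[[key]]), , drop = FALSE]
  right <- right[order(right[[key]]), , drop = FALSE]

  # Step 2: walk both sorted tables with two pointers.
  match_idx <- rep(NA_integer_, nrow(left))
  j <- 1L
  for (i in seq_len(nrow(left))) {
    # Advance the right pointer past keys smaller than the current left key.
    while (j <= nrow(right) && right[[key]][j] < left[[key]][i]) {
      j <- j + 1L
    }
    # Record a match if the keys line up; otherwise the row keeps NA.
    if (j <= nrow(right) && right[[key]][j] == left[[key]][i]) {
      match_idx[i] <- j
    }
  }

  # Step 3: attach the matched right-hand columns (NA where there is no match).
  other_cols <- setdiff(names(right), key)
  out <- cbind(left, right[match_idx, other_cols, drop = FALSE])
  rownames(out) <- NULL
  out
}

# Hypothetical example data.
orders <- data.frame(id = c(3, 1, 2, 4), item = c("c", "a", "b", "d"))
prices <- data.frame(id = c(2, 3, 1), price = c(20, 30, 10))
joined <- sort_merge_left_join(orders, prices, "id")
print(joined)
```

Because both inputs are sorted first, the right-hand pointer only ever moves forward, which is what makes the merge step a single pass.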

Read 18 tweets
Following the @_useRconf 2022 conference and right before the RStudio conference, here are, in my opinion, the main trends in the #R language 🌈:
➡️ MLOps 🚀
➡️ Data 🏎
➡️ Documentation ❤️

#rstats #opensource #DataScience
MLOps - I was not aware of any #MLOps framework in R until the release of the vetiver package. The vetiver package by @juliasilge from RStudio provides an MLOps framework for both R and #Python.
The vetiver package supports MLOps applications for tidymodels, #XGBoost, mlr, and caret in R, and for #PyTorch and scikit-learn in Python.
Read 9 tweets
No matter how long I’ve been doing data science, I still get smacked in the face every once in a while.

A true story. 🧵

#rstats
Listen, I’ve been coding in R since 2013.

I’ve published 10 R packages now.

My top 3 combine for over 2,500,000 downloads.

I’ve taught over 3,000 data scientists, many of whom work for Apple, Microsoft, Google, and Netflix.

AND, I still get smacked in the face every once in a while.
Case in point!

I was building a Shiny App for my 2018 R/Finance presentation.

(I’d been coding in R for over 5 years & by this point I was pretty solid).
Read 8 tweets
Hey Data Scientists,

I'm hosting a free geospatial analysis training tomorrow, and I'd love to help you learn spatial analysis with #Rstats.

Here's what's coming. 🧵
Geospatial analysis is a high-demand skill, and companies pay fat salaries to people who have it.

I want to help you learn geospatial for free (so you can get one of these lucrative jobs).

Here's what I'm teaching live tomorrow...
1. Learn the key function for fast interactive data visualization
Read 9 tweets
. 🦠COVID is not over ❗️
⬆️Big recent increases here in my country 🇮🇱 ; a fresh new wave.
1⃣new cases:… 2⃣hospitalizations:… #COVID19 #openscience #rstats #serverless #snrtg #Israel
I was not expecting a new COVID wave until at least Autumn. I was wrong.
🦠SARS-CoV-2 is unpredictable. It does not abide by seasons.
The figure of new cases could be a vast underestimation:
"Israel's Covid czar: Likely up to 20,000 cases
every day in current outbreak"… #scicomm #COVID19 #Israel #openscience
Read 5 tweets
⚠️Potentially dangerous
new 🦠#COVID19 variants🧬
▪️The pandemic won't end until we prevent this phenomenon
❗️Get💉 vaxxed to shield & protect the immunocompromised… #bioinformatics #WomenInSTEM #openscience #snrtg #immunology #genomics #raredisease
. "NOBODY is safe from the virus until
we are ALL safe from the virus"
~Dr. Tedros Ghebreyesus (WHO Director General)
Viruses🦠do strange things when unchallenged for long periods of time in hosts.
Unfortunately mutating🧬is one of those things.
Read 8 tweets
Ever heard of logistic regression? Or Poisson regression? Both are generalized linear models (GLMs).

They're versatile statistical models. And by now, they've probably been reframed as super hot #MachineLearning. You can brush up on their math with this 🧵. #rstats #Statistics
Let's start with logistic regression. Assume you want to classify a penguin as male or female based on its

* weight,
* species and
* bill length

Better yet, let's make this specific. Here's a data viz for this exact scenario. It is based on the {palmerpenguins} data set.
As you can see, the male and female penguins form clusters that do not overlap too much.

However, regular linear regression (LR) won't help us to distinguish them. Think about it. Its output is something numerical. Here, we want to find classes.
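For reference, the textbook way logistic regression gets from a numerical output to classes is to pass the linear predictor through the logistic function, which yields a probability between 0 and 1. The notation below is the standard form and an assumption here, not necessarily the notation the thread uses; categorical predictors such as species would enter as dummy variables among the $x_j$.

```latex
\[
  \eta = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
  \qquad
  P(\text{male} \mid x) = \frac{1}{1 + e^{-\eta}}
\]
```

A common convention is to predict "male" when this probability exceeds 0.5, which is the same as $\eta > 0$.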
Read 25 tweets
Ever wondered how to manipulate big data with R?

A thread on using Spark in R👇🧵

#rstats #spark #datascience
Big data is any data which is too large to be loaded into memory all in one go. Ever tried to read a large dataset into R and everything fell over or slowed waaaaaay down? This is where Spark comes in.

Spark is an open source tool for processing big data. It splits data into partitions for processing to overcome RAM limitations and writes to disk when needed. It is complicated but Spark handles most of the difficult parts. And you can work with Spark directly from R.

Read 19 tweets
Last month at the VI SER I presented four packages that fetch data from @TesouroNacional and generate dataviz. The data cover RTN, GFS, COFOG and SICONFI. Some of these packages emulate dashboards that we have built. Thread follows. (1/some)

#RStats #EconTwitter #dataviz
{rgfs} is the package that lets you consume GFS data and use it to make international comparisons of public finance statistics.
The scripts in the first figure let you not only download the data but also generate charts. The other three figures are examples of the charts generated
(2/n)
{rcofog} provides data and generates charts that let you compare time series of expenditure by function of government. (3/n)
Read 7 tweets
Signs of a new
COVID-19 wave🦠
Here in 🇮🇱 Israel
⬆️Quadrupling ➕of new cases in past week
😷Indoor masking mandate could return (source: Haaretz)… #CovidIsntOver #snrtg #serverless #openscience #COVID19 #rstats #Israel
🦠COVID-19 🇮🇱:
⬆️New daily cases had been in the 1,100 to 1,600
range for the past 3 months. Now a very dramatic,
very sudden upturn.
⬆️29.77% positive test rate today.
⬆️ROI at 1.44
Not particularly good signs.
Our 🇮🇱 Health Ministry is already recommending that people begin to mask again indoors.
😷A debate to require indoor masking again will commence in the coming days against the dramatic rise in new cases.
Read 3 tweets
New paper!

We introduce #GAMs as a tool for analysing medical monitoring data. We also present the idea that GAMs can be used to decompose a medical waveform into individual, physiologically relevant, components.

w/ @ucfagls and @SimonTVistisen…
Example 1: Pulse pressure ("the strength of a heart beat") varies with ventilation (mechanical). We can use a cyclic spline to represent the repeating effect of ventilation on pulse pressure. The smooth effect of respir...
This is especially useful when we only have a few beats per ventilation (high respiratory rate, low heart rate), where the pulse pressure variation in any individual ventilation does not represent the "true" pulse pressure variation. The smooth effect of respir...
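To make the cyclic-spline idea concrete, here is a minimal sketch using the mgcv package. It is not the model from the paper: the data are simulated, and the variable names `vent_pos` (position within the ventilation cycle, from 0 to 1) and `pulse_pressure` are assumptions for illustration.

```r
library(mgcv)

set.seed(1)
# Simulated data (not from the paper): one row per heart beat.
n <- 200
vent_pos <- runif(n)  # position in the ventilation cycle, 0 to 1
pulse_pressure <- 40 + 3 * sin(2 * pi * vent_pos) + rnorm(n, sd = 0.5)
beats <- data.frame(vent_pos, pulse_pressure)

# bs = "cc" requests a cyclic cubic regression spline, so the smooth takes
# the same value at the start and the end of the cycle. The `knots` argument
# pins the cycle boundaries at 0 and 1.
fit <- gam(pulse_pressure ~ s(vent_pos, bs = "cc"),
           knots = list(vent_pos = c(0, 1)),
           data = beats, method = "REML")

# Predictions at the two ends of the cycle.
cycle_ends <- predict(fit, newdata = data.frame(vent_pos = c(0, 1)))
print(cycle_ends)
```

Because the smooth pools information across all beats, it can estimate the repeating ventilation effect even when each individual ventilation contains only a few beats.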
Read 6 tweets
Fantastic Bioinformatics Courses and Where to Find Them!

Diving into Bioinformatics is absolutely worth it, and it can be much easier with these FREE online, open resources

A thread ... 🧵 (1/n)
#Bioinformatics #biology #Biologia #programming #freecourses #learning #RStats
(2/n) Introduction to programming for Bioinformatics with Python

Python is fundamental in Bioinformatics analyses. This course will let you learn Python programming for biological data manipulation and scientific research

Level: All levels…
(3/n) Genomics Workshop by The Carpentries

Open source lessons from The Carpentries @thecarpentries on Cloud computing and analysis of sequencing samples using the cloud
Level: Intermediate/Advanced…
Read 9 tweets
I am rebuilding my #rstats blog from the bottom up with #quarto. This will let me use quarto's cool new tricks like tabs and easy columns.

I've already spent hours using quarto's great docs to build a custom blog. If you want to do the same, let me show you what I did.
Today, I will show you the first of many steps to your own quarto blog. First, create a new quarto blog project via RStudio.

Make sure to create a git repo as well. This lets you revert changes when you break your blog. You can follow along with my repo at…
You can render your blog with `Render Website` from RStudio's `Build` tab.

The first easy changes happen in the `_quarto.yml` file.

1⃣ Set `theme: default`
2⃣ Name your blog via `title`
3⃣ Link your GitHub profile etc.

This will change the navbar at the top of your blog.
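As a rough sketch, a `_quarto.yml` with those three changes might look like the fragment below. The blog title and the GitHub URL are placeholders, not values from the thread, and a real project file will usually contain more entries.

```yaml
project:
  type: website

website:
  title: "My Blog"                             # placeholder title
  navbar:
    right:
      - icon: github
        href: https://github.com/your-username # placeholder profile link

format:
  html:
    theme: default
```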
Read 10 tweets
Congratulations to Sally Reynolds and René Bobe, for this amazing new book! I’ll take this opportunity to talk a bit about the chapter I contributed to, led by @bobe_rene and co-authored by me, @carvalhoprimate and Meave Leakey.
The Koobi Fora Formation in the Lake Turkana Basin is one of the most important rock units in the world, and continues to provide one of the richest records of our evolution in Africa. #paleoanthropology

There are nearly 250 hominins documented from East Turkana. It includes periods of astonishing hominin diversity with Homo habilis, H. rudolfensis, H. erectus, and Paranthropus boisei occupying the region between 2 and 1.4 million years ago.
Read 15 tweets
Cases of Monkeypox by Location (Casos de Viruela del Mono por Lugar) #MonkeyPox #RStats #IDtwitter #ViruelaDelMono #VirueladelSimio #VarioleSinge #VarioleDuSinge #DataScientist #elcarteldeSINADEF #Analytics $BAVA $BAVA.CO $SIGA #AI #100DaysofCode #AWS #TensorFlow #Python🧵(1/2)
Cumulative Cases of Monkeypox per Day (Acumulado de Casos de Viruela del Mono por Día) & Statistical Trend in the Count of Cases (Tendencia Estadística en Casos) #MonkeyPox #ViruelaDelMono #VirueladelSimio #VarioleSinge #VarioleDuSinge #IDtwitter $BAVA $BAVA.CO #RStats 🧵(2/2) [Image: Cumulative Confirmed Cases ...]
Read 3 tweets
Over the course of our @USATODAY investigation into broken adoptions, we traced 60,000 children adopted out of foster care from 2008 to 2010.

Nearly 2,000 returned. We calculated how each of a child's characteristics affected their risk of returning.…
Most states make it difficult to detect when a child adopted from foster care reenters the system.

They issue new identification numbers to these children in reports submitted to the federal government’s foster care database.
Sixteen states, however, provide data that makes it possible to track children from foster care to adoption to reentry in the foster system, according to a 2020 report submitted to the federal Administration for Children and Families.… [Image: map of the 16 states included in the analysis, including Arizona and Florida]
Read 11 tweets
#SQL from #R is insane. Here’s why. 1/6

Image Credit: @yutannihilat_en

2/6 Thanks to Hiroaki Yutani for putting this amazing example together.

This small example demonstrates how you can learn R and automatically use SQL.
3/6 Under the hood, the dbplyr package converts R’s tidyverse (dplyr) syntax to SQL

So we don’t need to memorize the SQL translation.
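A minimal sketch of that translation, using dbplyr's simulated connection so no real database is needed. The table and its columns (`carrier`, `dep_delay`) are invented for illustration and are not the example from the thread's image.

```r
library(dplyr)
library(dbplyr)

# A lazy table backed by a simulated PostgreSQL connection.
# The column names and values are made up for illustration.
flights_tbl <- lazy_frame(carrier = "AA", dep_delay = 5,
                          con = simulate_postgres())

# Ordinary dplyr code; nothing is executed against a database here.
query <- flights_tbl %>%
  filter(dep_delay > 0) %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE))

# Print the SQL that dbplyr generates for the pipeline above.
show_query(query)
```

Against a real database, the same pipeline would run on a `tbl()` created from a DBI connection, and `collect()` would bring the result back into R.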
Read 6 tweets
Our paper, “How do archaeologists write about racism? Computational text analysis of 41 years of @SAAorg Annual Meeting Abstracts” is now #openaccess in @AntiquityJ… with @LiyingWang10 & @benmarwick. Our key findings 🧵 … [Image: first page of the PDF of our journal article]
Inspired by #BLM & work by @aflewellen @ArchFieldNotes @AliciaOdewale @HendrattaAli @melissa_cronin, we examine how 1000s of archaeologists have written about #racism using computational text analysis with racism-related keywords: "race", "racial", "racism", and "racist". 1/ [Image: poster with the text "black lives matter" in black]
Our computational methods come from the innovative work of @ted_underwood @mljockers @dmimno & only possible thx to the excellent #rstats pkgs by @kenbenoit @opencpu @juliasilge & @drob. Our data are 60,000+ openly available conference abstracts from @SAAorg. 2/ [Image: Society for American Archaeology logo]
Read 22 tweets
I've been studying #Bayesian methods in #rstats since the beginning of this year.

The more I learn, the more excited I get about Bayesian.

Here's why...
One of the key R packages I've been experimenting with is brms (Bayesian Regression Models using Stan).

brms allows us to fit a wide range of statistical models including:

- linear,
- count data,
- survival,
- multilevel (mixed-effects),
- non-linear (& more!)
The important point is that brms fits these models with a special technique called Markov Chain Monte Carlo (MCMC), run through Stan.

MCMC is a game changer.
Read 6 tweets
It's easy to include broader impacts in your research with the "coding for broader impact" framework developed by students at @TempleUniv and published in @ESAFrontiers (1/4)…
#DataScience #coding #ClimateCrisis #Sustainability #environmental #RStats
Coding for broader impact addresses key challenges to effective public communication by developing personalized stakeholder reports alongside tasks needed for publishing a paper (2/4) #publicspeaking #STEMeducation #peerreview #science #publishorperish
The #rstats coding for broader impact workflow is in this repository (3/4)
Read 6 tweets
👋🏽Hello new followers

I wanted to share my open-access resources and introduce myself. I am a professor focused on statistics and data science education. I think/write/talk a lot about learning/teaching (with) R. Here are some of my resources 🧵

First, I compile free R resources at
I co-authored a Bayesian book with Alicia Johnson and @Miles_Ott. The book is freely available at

I owe a big part of my current teacher-scholar identity to writing this book. I have learned a lot in the process.
Read 6 tweets
Here is a selection of the *coolest* 3D maps and map-like worlds I've seen on Twitter in the past months. Wonderful dynamic, interactive, static, creative or artistic visuals. #Dataviz #3dMap #GIS #3D #Cartography #GISchat 1/23🧵⬇️
Thomas Kole | @ThomasKoleTA's stunningly realistic 3D model of Tenochtitlan, capital of the Aztec Empire (now Mexico City), made in #blender using #geometrynodes. Stunning water, clouds & details. #b3d #CDMX 2/23 🧵
Craig Taylor | @CraigTaylorViz 's mesmerizing ship data animation of the Gulf of Mexico. Made in #Cinema4D, with pleasant colours and satisfyingly well-made animation and light reflections. As always impressive work. 3/23 #3dmaps
Read 22 tweets
Since the beginning of the war, 🇷🇺diplomats have tweeted disinformation thousands of times. In my analysis for @AP @washingtonpost @ABC et al., I find that 300+ 🇷🇺diplomats have gained millions of engagements, and are inconsistently labeled by @twitter /1.…
Since late January, 🇷🇺 diplomats have posted over 2,000 times a week. These tweets were highly engaged with, gaining over a million likes, replies, RTs/QTs in most weeks. In addition, 🇷🇺diplomats themselves retweeted more than 50k times, ~90% retweeting their own 🇷🇺colleagues. /2
Many of these tweets contain outright disinformation around alleged biolabs, nuclear weapons programs and false flag provocations by Ukraine. In one case, Twitter deleted two false tweets about the 🇷🇺 bombing of a hospital, but... /3
Read 14 tweets
This semester I had the pleasure (+ challenge) of teaching Regression Analysis. Much of the content was originally designed by the inimitable @MT_statistics, I’m handing it back w/ 3 updates:

🧹 #tidymodels
🔮 more predictive modeling
🔵 @quarto_pub!

🔗 [Image: course hex logo reading "STA 210"]
The curriculum now features a roughly equal dose of inferential and predictive models, using tidyverse and tidymodels for the #rstats code + all course content (website, slides, documents) created w/ Quarto. Source code for everything can be found at….
And importantly, students learned to use @quarto_pub as well! Roughly 75% (of 90 students) came in with one semester of R Markdown experience and the rest with no R/R Markdown experience, and getting started with Quarto went without a hitch!
Read 6 tweets
