Discover and read the best of Twitter Threads about #RStats

Most recent (24)

Has the #NBA changed that much in the last 20 years? Why is a game so different from what you watched as a kid? That is what I will try to answer in the following thread, but let's get to the point. Here you can see every shot in the league in 2000 versus 2019.
We analyzed 4 million shots from the 2000 season through 2019 using the #NBA API and then used some #Rstats magic to play with the numbers. The first thing that stands out is that teams today take almost 25,000 more shots per season than in 2000. The game is played much faster.
As a fun fact contributed by our friend @GustavinhoCARP, the slowest team in the #NBA today (Charlotte Hornets) plays even faster than the fastest team of the 2005 season, the Phoenix Suns, known for their innovative philosophy of playing in 7 seconds or less.
Read 14 tweets
A bit of #rstats #ggplot mapping today:

How Sinn Féin surged to win the popular vote in the Irish general election ft.com/content/26a7a7…
Here’s the whole thing in four blocks of #rstats code:

1) Scrape the results by looping through constituency pages, finding the results tables for 2020 and extracting the first preference vote numbers
2) Unzip and read in a constituency boundaries shapefile, join that to the results data
Read 5 tweets
Alright folks, here's a thread of some of the tools that I use to get stuff done.

All these tools save me time AND they're a delight to use
For citations I use @zotero. The web browser plugins make it super easy to extract citation data from journal websites. Plus, it warns you if any papers in your library have been retracted! I don't know why I stuck with EndNote for so long...
For version control of my Word documents, both for myself and with my collaborators, I use @simuldocs
Read 18 tweets
Starting a hands on class on data visualisation at an arts school tomorrow. I'll just give the students an assignment, and a list of tools. Here are the #dataviz tools
1/16
Category "Desktop GUI": @msexcel . Very good for starters, data manipulation integrated. Limited charts and options, but can be brute forced to produce basically anything
@msexcel Adobe Illustrator: the Graph tool is old and clunky, so many manual edits needed to get where you really want. Upgrade with @datylon Graph tool datylon.com/graph for more chart options and #ai2html for web ready + responsive output ai2html.org
3/16
Read 16 tweets
My #rstats origin story: just about 20 years ago, I was working for a company that made turnkey Unix computer systems. I had written a collection of Perl scripts that ran on them and collected performance indicators, logging them as compressed CSV files.
When a customer reported performance issues, we'd download the CSV files and do statistical analysis, in Excel or Minitab. And we used the data to calibrate our capacity planning benchmarks.
Minitab has this visualization called a "six-pack" (support.minitab.com/en-us/minitab/…). Management loved these, so we had macros to make them and all that.
Read 14 tweets
Pleased to announce that #tidybayes v2.0 (SLABS FOR DAYS edition) hit CRAN today. #rstats

Lots of new stuff in this version: A THREAD
The biggest thing is the new slab+interval meta-geom, generalizing old #tidybayes geoms and enabling a bunch of new ones. This is a flexible FAMILY of #ggplot geoms for visualizing probability distributions and uncertainty using slabs (densities, cdfs, etc), points, and intervals
The slab+interval meta-geom now drives old standards like eyes and half-eyes...
Read 18 tweets
What has changed in economic research over the last 20 years?

CAPES provides a database of all its theses since 1987. From that data, I was able to extract some relevant information..

Here goes.. 🤟

#rstats #TidyVerse #TextMining 1/n
In 20 years, economic development has without doubt become one of the most frequently addressed topics in master's and doctoral theses in economics.

Inflation is always a recurring theme, but 20 years ago the approach was different: the question of targets was still studied more + 2/n
or even the Plano Real itself.

In fact, since 2008, "desenvolvimento econômico" (economic development) has been the research topic that appears most often, followed by "finanças" (finance). If we also count theses written in English, "finance" still has a considerable presence among the most + 3/n
Read 6 tweets
A thread of classifiers learning a decision rule. Dashed line is optimal boundary. Animations with #gganimate by @thomasp85 and @drob. #rstats

Logistic regression {stats::glm} with each class having normally distributed features. (1/n)
Quadratic discriminant analysis {MASS::qda} with normal features. The QDA model is the same as the data model in this case, and so it fits the optimal boundary very closely. (2/n)
MARS {earth::earth} on normal features. (3/n)
Read 12 tweets
@KTorresStats Hi Katy!

The resume looks good to me!

I'm starting a thread with some ideas.

Goal 1: Connect with #rstats data science community
- Tons of academic and industry people here on Twitter
- do informational interviews with acquaintances
- join @RLadiesGlobal Slack @R4DScommunity
@KTorresStats @RLadiesGlobal @R4DScommunity Goal here is to meet someone who can get your name to a hiring manager
- referrals == stronger chance of interview/hire
- R4DS and #rladies Slack are places outside Twitter to find job posts, ask questions, network
- Info interviews help you understand what pro DS do/expect
@KTorresStats @RLadiesGlobal @R4DScommunity Goal #2: read livebook.manning.com/book/build-you…

By @robinson_es and @skyetetra

Tons of good ideas and biggest additional step IMO == get a public portfolio

Essentially show the industry that you can do the job already!
Read 9 tweets
Last year, the Wash. Post published a chart showing that in the 🇺🇸 school system the % of Black students exceeded (by a lot) the % of Black teachers.

I found that striking and decided to replicate it for public higher education institutions in 🇧🇷. Here goes: 👇 #rstats #TidyTuesday #ggplot2
In the chart above, each point is a public higher education institution (Instituição de Ensino Superior Pública, IESP). The main information it conveys is that, as in the Wash. Post chart (below), there is a disproportion. There are far more Black students than Black professors, which has a range of impacts.
Institutions with the same % of Black students and Black professors should sit on the dotted line that cuts across the chart. However, only 12 of the 212 IESP in the chart are on or above the line. The vast majority are below it: more Black students than Black professors.
Read 12 tweets
Digging through the ISP-RJ data, I decided to put together a roundup of information on public security in RJ (the state), with an emphasis on the RJ microregion and especially on Niterói, São Gonçalo and RJ..

here goes 🤘🤘
#rstats #ggplot2 #tidymonday #TidyTuesday #RiodeJaneiro
I split the data two ways: absolute numbers and per 100,000 inhabitants (IBGE), and highlighted three types of crime: rape, robbery, and (intentional) homicide. Looking at the charts, RJ, SG and Nit follow the same pattern for the crimes analyzed. 1/n
Total robberies rise at the end of the Witzel government (for all 3 cities), and the number of reported rapes (overall) shows an upward trend, as do robberies in general 2/n
Read 7 tweets
As 2019 comes to a close, I want to thank all of the lovely people in the #rstats world who have made my year a professional success. For each person in this thread, I'm going to tweet one thing they've done that I particularly appreciate.
I struggled with learning R for a long time, until I made the switch to the tidyverse. I'm incredibly grateful for all of the work that @hadleywickham, the driving force behind the tidyverse, has done.

tidyverse.org
I like to think about workflow. No one has been more influential in my thinking in this area than @JennyBryan. The What They Forgot to Teach You About R materials she has put together (along with @jimhester_) have helped me tremendously.

rstats.wtf
Read 67 tweets
So this great study was published and, as always, everyone is confused about interpreting a non-inferiority design.
Can a simple Bayesian approach help?
@ADAlthousePhD,
@Michael_Harhay, @otavio_ranzani, @reverendofdoubt, others pls
correct me if I am wrong!

jamanetwork.com/journals/jama/…
As with any Bayesian analysis, we should have a prior for the success rate of first-attempt intubation. I am no expert on this, but in Brazil it is probably around 70%, though values of 60-80% are possible.
The prior can then be assumed to be a beta distribution. A B(65,20) seems to do the trick (picture). This is the probability distribution we will use as the prior while playing with this data.
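A quick way to sanity-check a prior like this is to look at its mean, spread, and central interval. The sketch below is not from the thread: it is a minimal Python illustration using only the standard library, and the helper names (`beta_summary`, `central_interval`), the 95% level, the number of draws, and the seed are all assumptions made for the example.

```python
import math
import random


def beta_summary(a, b):
    """Closed-form mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def central_interval(a, b, level=0.95, draws=100_000, seed=1):
    """Monte Carlo estimate of the central interval of Beta(a, b).

    The level, draw count, and seed are illustrative choices, not values
    taken from the thread.
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]


# The prior proposed in the thread: B(65, 20)
prior_mean, prior_sd = beta_summary(65, 20)
prior_low, prior_high = central_interval(65, 20)
print(f"mean={prior_mean:.3f} sd={prior_sd:.3f} "
      f"95% interval=({prior_low:.3f}, {prior_high:.3f})")
```

Because a beta prior is conjugate to binomial data, observing s successes and f failures would update Beta(a, b) to Beta(a + s, b + f), so the same two helpers can be reused to summarize a posterior.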
Read 16 tweets
Cut an #rstats script's runtime from 2+ hours to <5 minutes and feel extremely powerful (even though arguably the first version was just bad code)

Don’t know who needs this but a few random tips below. Easy once you’ve heard them but often outside of intro content 👇🏻
Run iterations in parallel! If you’re using {purrr} this is *ridiculously* easy with @dvaughan32's {furrr}

You truly just add ‘future_’ prefixes to map functions
Remove anything from the iteration that can be done outside including data preprocessing (eg type conversion) or post processing (eg normalizing everything by the same constant)
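The tip about moving work out of the iteration can be sketched in a language-neutral way. The example below is an illustration in Python, not the author's R code: the function names and the toy data are hypothetical. Both versions return the same shares, but the second parses the input once and applies the shared divisor once, after the loop.

```python
def share_above_slow(raw_values, thresholds):
    """Share of values above each threshold, with avoidable repeated work."""
    shares = []
    for t in thresholds:
        # Type conversion is redone on every iteration although the data never changes.
        values = [float(v) for v in raw_values]
        count = sum(1 for v in values if v > t)
        # The same divisor is applied inside the loop every time.
        shares.append(count / len(values))
    return shares


def share_above_fast(raw_values, thresholds):
    """Same result, with preprocessing and normalization moved outside the loop."""
    values = [float(v) for v in raw_values]  # preprocessing: done once
    counts = [sum(1 for v in values if v > t) for t in thresholds]
    n = len(values)
    return [c / n for c in counts]  # postprocessing: one pass at the end


# Hypothetical toy input: numbers that arrived as strings
raw_values = ["1.5", "2.0", "3.5", "4.0"]
thresholds = [1.0, 2.0, 3.9]
print(share_above_slow(raw_values, thresholds))
print(share_above_fast(raw_values, thresholds))
```

The saving grows with the number of iterations, since the slow version repeats the full conversion once per threshold.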
Read 7 tweets
The great thing about #rstats is that there are so many open source learning resources, it is tough to know where to start. Sometimes resources designed by beginners (or people who have just learned the thing you are trying to learn) are best. 1/n
Why? People who have only just learned something don’t yet suffer from the curse of knowledge. They still know what it feels like to be totally overwhelmed and make fewer assumptions about what you might already know. 2/n

Last year @djnavarro @williamslisaphd and I put together a set of online modules for #rstats beginners called #RYouWithMe. It starts with Basic Basics. This module gives you a tour of RStudio, shows you how to find your way around and get data into R. tinyurl.com/ruwm-basics 3/n
Read 8 tweets
So you want to learn #rstats? I’ll tell you about my favourite learning resources in a minute, but first… some tips and tricks. 1/n
Tip 1: Find a reason. It is tough to get motivated to learn something new when your old workflow, while inefficient and not reproducible, gets the job done. If you try to learn #rstats in service of a real project, you are more likely to persevere. 2/n
Tip 2: Make it fun with side projects. Try sentiment analysis on Jane Austen novels juliasilge.com/blog/you-must-… Or learn regular expressions by analyzing your bank transaction data r-bloggers.com/analyse-your-b…. Check out #tidytuesday for vis practice datasets 3/n
Read 7 tweets
Taking notes on methods for uncovering mechanism in #impsci #DIScience19. Yet another packed room.
Edward Miech w/ VA: Using CFIR constructs and coincidence analysis to look at implementation success. Interested in role of champions in successful implementation. #DIScience19
Acute stroke care as focus. Used CFIR framework.
Read 27 tweets
If you are interested in analyzing #SingleCell #RNAseq data in #Bioconductor using #rstats, please check out our paper Orchestrating single-cell analysis (OSCA) with @Bioconductor that was published in @naturemethods this week! #genomics #scRNAseq #dataviz #methodsmatter
@Bioconductor @naturemethods OSCA is a rich, reproducible, accessible (from beginners to experts!) resource with many #scRNAseq workflows & datasets. The resource is an online #bookdown book that compiles every night to track development by the open-source and open-development @Bioconductor #rstats community
@Bioconductor @naturemethods Now, OSCA is not the only set of packages / workflows for the analysis of #scRNAseq data. #Scanpy (scanpy.readthedocs.io/en/stable/) and #Seurat (satijalab.org/seurat/) are two incredibly popular packages in #python and #rstats, respectively.
Read 9 tweets
**#rstats BLACK FRIDAY "DEALS"** (thread)

100% OFF on these awesome, always free ebooks I've read and/or recommended this year

BOGO: in true R fashion, each thoughtfully covers both code and theory

Thankful to all these authors for openly sharing such great content🙏

(1/n)
I'm sure there is a ton that I am forgetting in the below, so please feel free to add on your own favorites!
@robjhyndman's Forecasting: Principles and Practice

otexts.com/fpp2/

Fantastic intro to forecasting building from basic principles to complex models. Also gives context to appreciate a lot of exciting work happening in {tidyverts} tidyverts.org
Read 10 tweets
How has the world changed? This 🧵 compiles many of my plots on, e.g., child mortality, fertility, GDP, women's education, and life expectancy. Thanks to @Gapminder for the data! #rstats

First, child mortality has dropped precipitously all over the world. #GoodNewsGraphs (1/N)
The decline in child mortality, plotted against per capita (log) GDP (2/N)
Read 48 tweets
THREAD/RANT: Just watched a great #rstats webinar. Presenter is an author of many📦s. At the end, the host suggested that the (100s of) attendees go file issues on the 📦s Github repos & comment on the presenter's blog posts to ask Qs they didn't get answered in the webinar. 1/?
This is absolutely NOT how you get answers to #rstats questions. Not only will you waste the package author's time, you will waste your own time. So, here are some tips on how to get help in #rstats. TL;DR: filing Github issues, tweeting/emailing pkg author are LAST RESORTS. 2/?
(These tips are ordered based on my experience to get you an answer as fast as possible.)

#1: Read the relevant documentation. #rstats Many packages have websites & vignettes. Consult those, too.
Read 18 tweets
#neurotwitter🎶welcome to the #natverse - we’ve got tools for brains!🎶Open software to analyse neurons, connections, brains #bioRxiv biorxiv.org/content/10.110… Others are using it, why should you? natverse.org: @gsxej lab @MRC_LMB @flyconnectome @CamZoology #rstats 1/10
The natverse allows you to easily plot neurons and neuroanatomical volumes, measure features and implement interesting algorithms, e.g. flow centrality @csdashm and NBLAST @martamcosta2 natverse.org/gallery/. We give lots of examples: github.com/natverse/nat.e… 2/10
Read 11 tweets
👋 @LucyStats here! Today we’re going to do a little stats primer on testing for non-linear terms when fitting a model.
What do you do when trying to decide whether to include a non-linear term in a model?

1️⃣ test the nonlinear term, if significant leave it in
2️⃣ if you have enough dfs, include the nonlinear term regardless of significance
3️⃣ never include nonlinear terms
4️⃣ comment
It turns out if you make a decision to include the nonlinear term based on a significance test, you are at risk of inflating your Type 1 error 😱

📃 source: onlinelibrary.wiley.com/doi/abs/10.100…
Read 12 tweets
A basic guide to troubleshooting problems with #rstats
(A thread for my students and others new to R)

Something in R not working? Weird error message?
Go through this list of steps to try to resolve the problem. [thread; suggestions & other tips highly welcome]
@DimperioJane
1) Did you run a line of code without a ")" at the end?

Look at your code in the console. Is there a little "+" at the far left? R is probably waiting for a complete line of code. Click into the console, press ESC, add the ")", and try again
2) Is R waiting for user input?
Occasionally R will ask for user input via a pop-up window or within the console itself. Look in the console to see if you need to give a numeric response (option 1, option 2) or look for the GUI's pop-up.
Read 21 tweets
