Women in Statistics and Data Science's Threads

Dec 14, 2021 • 31 tweets • 9 min read

You know how excited @daniela_witten gets about SVD? I feel about the same way about kernels. Except that I'm not sure I explain them as well as she does SVD. Still, you're getting a thread on kernels! Maybe one way of putting it is that kernels are dot products on steroids. The dot product is already pretty cool.

1) It's easy to compute and you learn about it in high school math (at least I did, who knows what kids learn in high school now).
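
To make the "dot products on steroids" picture concrete, here is a minimal sketch in R. It is an illustration under assumptions, not code from the thread: the two vectors, the degree-2 polynomial kernel, and the helper names `poly_kernel` and `phi` are all chosen here for the example. It suggests how evaluating a kernel directly can give the same number as an ordinary dot product taken after mapping both vectors into a larger feature space.

```r
# Two small example vectors (arbitrary values, chosen for illustration)
x <- c(1, 2)
z <- c(3, 4)

# The plain dot product: multiply element-wise, then sum
dot_xz <- sum(x * z)   # 1*3 + 2*4 = 11

# A degree-2 polynomial kernel (hypothetical helper): k(x, z) = (x . z)^2
poly_kernel <- function(x, z) sum(x * z)^2

# An explicit feature map for 2-d inputs whose dot product equals that kernel
phi <- function(v) c(v[1]^2, sqrt(2) * v[1] * v[2], v[2]^2)

k_direct  <- poly_kernel(x, z)       # computed in the original 2-d space
k_feature <- sum(phi(x) * phi(z))    # computed in the 3-d feature space

# Both routes agree up to floating-point rounding
isTRUE(all.equal(k_direct, k_feature))
```

The kernel never builds `phi(x)` explicitly, which is what makes the trick attractive when the feature space is very large.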

Dec 14, 2021 • 14 tweets • 5 min read

Good morning everybody! Let's talk a bit about how I came to develop statistical / machine learning tools for genomics, healthcare and drug discovery. I trained as an engineer at @IMTAtlantique, with a specialization in computer science. I didn't really enjoy statistics and graduated in 2005, back when AI belonged to scifi and nobody knew what machine learning was.

Dec 13, 2021 • 5 tweets • 3 min read

OK, so a bit of background about me: I'm French (and tweeting from Paris), and I'm currently an associate professor at an engineering school called @MINES_ParisTech.

The research group I'm in (CBIO) has a partnership with @institut_curie, which is a cancer research institute. CBIO has four PIs, working on various topics related to, you've guessed it, statistics / machine learning & cancer.

Dec 4, 2021 • 6 tweets • 2 min read

I have organized multiple conferences over the years.

Tips to conference organizers to support women at your meeting

1- Actively consider gender and career stage balance in speakers.
2- Women and minorities may take a longer route to success; try to avoid ageist selection.
3- Provide lactation rooms (with equipment & milk storage). Pumps are heavy and a pain to carry around a meeting. The room should be close by, not a long walk away.
4- Small babies are welcome. Check there is a changing table accessible to dads & mums.

Dec 3, 2021 • 7 tweets • 8 min read

@Bioconductor provides genome annotation for thousands of species and its packages are used in almost every biological discipline including
Immunology
Oncology
Evolution and Phylogenetics
cheminformatics
comparative genomics
epigenetics
pharmacogenomics
systems biology
etc.

@Bioconductor is a repository for R packages. There are daily builds & checks of packages.

All packages submitted to Bioconductor are reviewed to check that they are documented, have tutorials (vignettes), and pass R CMD check and BiocCheck.

Reviews are friendly and help developers

Dec 3, 2021 • 5 tweets • 5 min read

Let's chat @Bioconductor! A thread on
- what is @Bioconductor
- Why use @Bioconductor
- How to get started and contribute to @Bioconductor

@Bioconductor is an open source, open development project in the #RStats language for genomics

It has approx 100K distinct downloads every month and is used globally

bioconductor.org/packages/stats…

Dec 1, 2021 • 5 tweets • 6 min read

@RiyueSunnyBao asked me to talk about batch effects & scRNAseq alignment within/between studies. When using a chi-squared statistic such as Pearson or Freeman-Tukey, each value represents the strength of association or co-dependence between a cell and a gene. This has several advantages.

https://twitter.com/WomenInStat/status/1465836337442045957

We tested the impact of using CA compared to PCA or CCA for batch alignment. Using the scMixology data from @mritchieau, we see better alignment with Corralm correspondence analysis of platforms (10x, celseq, dropseq) than with PCA or CCA.
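
The Pearson statistic mentioned in this thread can be sketched in a few lines of base R. This is a toy illustration, not the corral implementation: the count matrix and every variable name below are made up for the example. Each entry of `pearson` measures how far an observed gene-by-cell count sits from what independence would predict, and an SVD of the scaled residuals is the core step of a plain correspondence analysis.

```r
# Toy count matrix: 4 genes (rows) by 3 cells (columns); values are invented
counts <- matrix(c(10, 0, 3,
                    2, 8, 1,
                    0, 5, 9,
                    4, 4, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

total <- sum(counts)

# Counts expected if genes and cells were independent
expected <- outer(rowSums(counts), colSums(counts)) / total

# Pearson residuals: one value per gene-cell pair
pearson <- (counts - expected) / sqrt(expected)

# Their squares add up to the usual chi-squared statistic
chisq_stat <- sum(pearson^2)

# Correspondence analysis: SVD of the residuals scaled by sqrt(total)
sv <- svd(pearson / sqrt(total))

# Total inertia (sum of squared singular values) equals chi-squared / total
total_inertia <- sum(sv$d^2)
```

Large positive residuals flag gene-cell pairs that co-occur more than expected, which is the "strength of association" reading given above.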

Oct 15, 2021 • 16 tweets • 7 min read

One of my favorite things to talk about in causal inference is the target *estimand*

🤓 Let's talk about a few different estimands and how to estimate them!

(image cred: @simongrund89)

A lot of my research is in the observational study space. This basically means that participants in the study were not randomly assigned treatments or exposures, but rather we just observe how a certain exposure affects an outcome.
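
As a rough sketch of why the lack of random assignment matters, the simulation below compares a naive difference in means with an inverse probability weighted (IPW) estimate of one common estimand, the average treatment effect. Everything here is an assumption made for illustration: the simulated data, the single binary confounder `z`, the true effect of 1, and the choice of IPW itself are not taken from the thread.

```r
set.seed(1)
n <- 100000

# A binary confounder that drives both exposure and outcome
z <- rbinom(n, 1, 0.5)

# Exposure is more likely when z = 1, so it is not randomly assigned
x <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))

# Outcome: the true effect of the exposure is 1
y <- 1 * x + 2 * z + rnorm(n)

# Naive comparison of exposed vs unexposed ignores the confounder
naive <- mean(y[x == 1]) - mean(y[x == 0])

# Estimate propensity scores, then weight each person by 1 / P(own exposure)
ps <- fitted(glm(x ~ z, family = binomial))
w  <- x / ps + (1 - x) / (1 - ps)

# Weighted difference in means targets the average treatment effect
ipw <- weighted.mean(y[x == 1], w[x == 1]) -
       weighted.mean(y[x == 0], w[x == 0])

c(naive = naive, ipw = ipw)
```

In this setup the naive contrast mixes the exposure effect with the effect of `z`, while the weighted contrast should land close to the true value of 1.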

Oct 13, 2021 • 6 tweets • 4 min read

Is our sample of #rstats packages tweeted about during the past week or so representative? Let's find out!

@thomas_mock has an amazing dataset of 737,466 tweets with the #rstats hashtag (with the earliest dating back over 10 years!)

Let's run the same code on this data!

@eddelbuettel pointed out that my first lines to get all #rstats packages on CRAN could be simplified by just running

r_pkgs <- rownames(available.packages())

(thanks!)

Oct 13, 2021 • 11 tweets • 5 min read

I'm curious about the most tweeted about #rstats packages in the past ~week, let's explore this using R! 🧵

I'm going to use these packages:
*️⃣ tidyverse
*️⃣ rtweet
*️⃣ rvest
*️⃣ tidytext

you can install them with:
install.packages(c("tidyverse", "rtweet", "rvest", "tidytext"))

First, let's get a vector of all #rstats packages on CRAN!

library(rvest)

# Take the text of the first cell in every table row of the page
r_pkgs <- read_html('cloud.r-project.org/web/packages/a…') %>%
  html_nodes('td:nth-child(1)') %>%
  html_text()

Oct 12, 2021 • 12 tweets • 5 min read

👋 @LucyStats here! It's been a very exciting week for folks in Causal Inference with the Nobel Prize announcements, so I thought it'd be neat to dive back in history to hear about a previous Nobel winner, Ronald Ross

1/

This topic is fun because it spans a whole myriad of my interests!
✔️We've got stats!
✔️We've got poetry!
✔️We've got infectious disease epidemiology!

Much of what I'll chat about today I learned from this article: pubmed.ncbi.nlm.nih.gov/12762435/

2/

Sep 30, 2021 • 6 tweets • 5 min read

Today, I would like to share some resources on causal inference. - a thread ⬇️

I came to this topic while working with clinicians who use IPW and matching on a daily basis (they are not familiar with doubly robust approaches). I don't know about you, but I admire them so much, as they combine their work with patients with research to advance knowledge.

Sep 29, 2021 • 5 tweets • 3 min read

Now, I would like to mention an R package, FactoMineR, that I use on a daily basis to explore and visualize heterogeneous data: quantitative, categorical, with group structures, (multiple) contingency tables.

At its core, SVD! (I am also an SVD fan, @daniela_witten ;-)). FactoMineR is indebted to the “French School of Data Analysis” (see arxiv.org/abs/0805.2879 or juliejosse.com/data-analysis/ for historical background), a field of statistics I was trained in.

Sep 29, 2021 • 12 tweets • 2 min read

Hello!
So today, I will share a few thoughts and pieces of advice I usually give to my PhD students. I hope this might be helpful for a wider audience, even if it is obvious and already stated by others. Anyway, as teachers, we know repetition is important ;) - a thread ⬇️

1) Ask questions
Ask questions
Ask questions
….
Ask questions!
I mean it: don't hesitate to ask questions in seminars (in France in particular, we don't dare enough). Be curious, don’t be shy.

Sep 28, 2021 • 18 tweets • 9 min read

Missing Data, a thread ⬇️
Missing values are everywhere! We have listed more than 150 R packages in cran.r-project.org/web/views/Miss…
So let us give a few pointers:
The method of handling missing data depends on the purpose of the analysis: estimation, completion, prediction, etc.

1) For inference with missing values, that is, estimating a parameter as well as possible and giving a confidence interval, consider likelihood approaches (using EM algorithms) or multiple imputation.

Sep 27, 2021 • 4 tweets • 2 min read

I have been working for >10 years on missing data.
My passion for data science mainly comes from its transversality: as statisticians, we can interact with so many scientific fields!
With missing data the same is true but within statistics, as it can pop up in all its branches. When I first meet a scientist for a new project, I always start the conversation by asking “Show me the data!” to understand the underlying challenges.
So far, I have never been shown a complete dataset... (of course there might be some bias!).

Apr 30, 2021 • 12 tweets • 2 min read

I’m going to begin today with a bold claim: Being an applied statistician is a lot like being an ethnographer. I say this both based upon years of experience working in collaborative projects and consulting and based on my experience studying ethnography. (Recall: before my PhD in statistics, I started and quit a PhD in sociology).

Apr 28, 2021 • 8 tweets • 3 min read

Yesterday I tweeted about nested data, with multi-level models (MLM) versus OLS + cluster-robust variance estimation (CRVE). This made me think about another confusion that arises, between what are called fixed versus random effects.

Let’s begin with a simple relationship between a covariate X and Y in nested data, e.g. students i nested in school j. We are interested in understanding the relationship between X and Y at the student level.
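
One way to see what a school fixed-effects approach does with students nested in schools is the base R sketch below. It is an illustration under assumptions, not the model from the thread: the simulated data, the five schools, the slope of 0.5, and all variable names are invented here. It compares a pooled regression with a regression that includes school indicators, and it shows that the latter matches a regression on school-demeaned X and Y. A random-effects (multi-level) fit would need a mixed-model package and is not shown.

```r
set.seed(42)

# 5 schools with 20 students each (simulated, for illustration only)
school <- rep(1:5, each = 20)
school_effect <- rnorm(5, sd = 2)[school]

# X is correlated with the school a student attends
x <- rnorm(100) + school_effect

# Student-level slope is 0.5; schools also shift Y directly
y <- 0.5 * x + 3 * school_effect + rnorm(100)

# Pooled OLS ignores the nesting entirely
pooled <- coef(lm(y ~ x))["x"]

# School fixed effects: one indicator per school
fe <- coef(lm(y ~ x + factor(school)))["x"]

# Equivalent "within" estimator: subtract school means, then regress
x_w <- x - ave(x, school)
y_w <- y - ave(y, school)
within <- coef(lm(y_w ~ x_w - 1))["x_w"]

c(pooled = unname(pooled), fixed = unname(fe), within = unname(within))
```

The fixed-effects and within estimates coincide because both use only variation among students inside the same school.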

Apr 27, 2021 • 19 tweets • 4 min read

I work primarily with nested data. One example is in experiments, with students nested in schools. Another is meta-analysis, with effect sizes nested in studies. In this thread, I’ll focus on students nested in schools, but this applies more generally. Question 1: Do you need to take nesting into account in your analysis? Our world is naturally nested – students in classrooms in teachers in schools in districts and so on. Does this mean we need to take all of these levels into account? No.

Apr 26, 2021 • 13 tweets • 6 min read

Hello everyone – I’m so excited (and nervous!) to get to tweet with you all this week. I’ll start by telling you some general things about myself. I’m an Associate Professor of Statistics at Northwestern University and a Faculty Fellow at the Institute for Policy Research. I also Co-Direct the Statistics for Evidence-Based Policy and Practice Center. For more info see here: bethtipton.com

Apr 23, 2021 • 12 tweets • 3 min read

The #DataFeminism book also made me look inward and examine my own biases, which I am exceedingly grateful for.

Namely, it forced me to reckon with some of my fundamental operating assumptions as a statistician & data scientist.

Examples threaded below...

In chapter 3, the authors discuss the role of emotion in data visualization, specifically calling out giants in the field like Edward Tufte and Alberto Cairo (no snitch tagging, please) for what is presented as an anti-emotion stance.
