Women in Statistics and Data Science
Weekly RoCur (Rotating Curator) @AmstatNews Committee on Women in Statistics #WSDS Learn & sign up: https://t.co/aV1Fcdxd5c This week:
Dec 14, 2021 31 tweets 9 min read
You know how excited @daniela_witten gets about SVD? I feel about the same way about kernels. Except that I'm not sure I explain them as well as she does SVD. Still, you're getting a thread on kernels! Maybe one way of putting it is that kernels are dot products on steroids. The dot product is already pretty cool.

1) It's easy to compute and you learn about it in high school math (at least I did, who knows what kids learn in high school now).
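
To make the "dot products on steroids" image concrete, here is a minimal sketch in base R. The vectors are made up, and `poly_kernel` is a hypothetical helper written for this illustration (a degree-2 polynomial kernel), not something taken from the thread.

```r
# Two small vectors (made-up values for illustration)
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Plain dot product: sum of element-wise products
dot_xy <- sum(x * y)

# Hypothetical helper: polynomial kernel k(a, b) = (a . b + 1)^degree.
# It equals a dot product in a larger feature space of monomials,
# but is computed here without ever building that space.
poly_kernel <- function(a, b, degree = 2) (sum(a * b) + 1)^degree

k_xy <- poly_kernel(x, y)
```

The kernel costs about as much as the dot product itself, which is a large part of its appeal.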
Dec 14, 2021 14 tweets 5 min read
Good morning everybody! Let's talk a bit about how I came to develop statistical / machine learning tools for genomics, healthcare and drug discovery. I trained as an engineer at @IMTAtlantique, with a specialization in computer science. I didn't really enjoy statistics and graduated in 2005, back when AI belonged to scifi and nobody knew what machine learning was.
Dec 13, 2021 5 tweets 3 min read
OK, so a bit of background about me: I'm French (and tweeting from Paris), and I'm currently an associate professor at an engineering school called @MINES_ParisTech. The research group I'm in (CBIO) has a partnership with @institut_curie, which is a cancer research institute. CBIO has four PIs, working on various topics related to, you've guessed it, statistics / machine learning & cancer.
Dec 4, 2021 6 tweets 2 min read
I have organized multiple conferences over the years.

Tips to conference organizers to support women at your meeting

1- Actively consider gender and career stage balance in speakers.
2- Women and minorities may take a longer route to success, try to avoid ageist selection.
3- Provide lactation rooms (with equipment & milk storage). Pumps are heavy and a pain to carry around a meeting. The room should be close by, not a long walk away.
4- Small babies are welcome. Check there is a changing table accessible to dads & mums.
Dec 3, 2021 7 tweets 8 min read
@Bioconductor provides genome annotation for thousands of species and its packages are used in almost every biological discipline including
Immunology
Oncology
Evolution and Phylogenetics
cheminformatics
comparative genomics
epigenetics
pharmacogenomics
systems biology
etc

@Bioconductor is a repository for R packages. There are daily builds & checks of packages.

All packages submitted to Bioconductor are reviewed to check they are documented, have tutorials (vignettes) and pass R CMD Check and BiocCheck

Reviews are friendly and help developers
Dec 3, 2021 5 tweets 5 min read
Let's chat @Bioconductor A thread on
- what is @Bioconductor
- Why use @Bioconductor
- How to get started and contribute to @Bioconductor

@Bioconductor is an open source, open development project in the #RStats language for genomics

It has approx 100K distinct downloads every month and is used globally

bioconductor.org/packages/stats…
Dec 1, 2021 5 tweets 6 min read
@RiyueSunnyBao asked me to talk about batch effects & scRNAseq alignment within/between studies. When using a chi-squared statistic like Pearson or Freeman-Tukey, each value represents the strength of association or co-dependence between a cell and a gene. This has several advantages.

We tested the impact of using CA compared to PCA or CCA for batch alignment. Using the scMixology data from @mritchieau, we see better alignment with Corralm correspondence analysis of platforms (10x, celseq, dropseq) than with PCA or CCA.
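
As a rough illustration of the chi-squared idea, the sketch below computes Pearson residuals for a toy count matrix and takes their SVD, which is the core computation of correspondence analysis. The counts are made up, and this is base R only; it is not the code or API of the package used in the thread.

```r
# Toy count matrix: rows are genes, columns are cells (made-up values)
counts <- matrix(c(10, 0, 3,
                    2, 8, 1,
                    0, 5, 9,
                    4, 4, 4),
                 nrow = 4, byrow = TRUE)

P <- counts / sum(counts)        # table of proportions
r <- rowSums(P)                  # row (gene) masses
cm <- colSums(P)                 # column (cell) masses
E <- outer(r, cm)                # expected proportions under independence

# Pearson residuals: one value per gene/cell pair, measuring how far
# the observed count departs from independence
pearson_resid <- (P - E) / sqrt(E)

# Correspondence analysis is built on the SVD of this residual matrix
sv <- svd(pearson_resid)

# Total inertia times the grand total is the Pearson chi-squared statistic
chisq_stat <- sum(pearson_resid^2) * sum(counts)
```

Large positive residuals mark genes that are over-represented in a cell relative to what the margins alone would predict.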
Oct 15, 2021 16 tweets 7 min read
One of my favorite things to talk about in causal inference is the target *estimand*

🤓 Let's talk about a few different estimands and how to estimate them!

(image cred: @simongrund89) A lot of my research is in the observational study space. This basically means that participants in the study were not randomly assigned treatments or exposures, but rather we just observe how a certain exposure affects an outcome
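
One way to see that the estimand is a choice is to weight the same observational data in two ways. The sketch below is an assumption-laden illustration, not the thread's own code: the data are simulated, the propensity score model is a plain logistic regression, and `weighted_diff` is a hypothetical helper. It contrasts weights that target the average treatment effect (ATE) with weights that target the average treatment effect among the treated (ATT).

```r
set.seed(1)
n <- 500
x <- rnorm(n)                            # a single confounder
z <- rbinom(n, 1, plogis(0.5 * x))       # exposure depends on x (not randomized)
y <- 1 + 2 * z + x + rnorm(n)            # simulated outcome, true effect = 2

# Propensity score: estimated probability of exposure given x
ps <- fitted(glm(z ~ x, family = binomial))

# Weights targeting two different estimands
w_ate <- z / ps + (1 - z) / (1 - ps)     # whole population
w_att <- z + (1 - z) * ps / (1 - ps)     # exposed group only

# Hypothetical helper: weighted difference in mean outcomes
weighted_diff <- function(w) {
  weighted.mean(y[z == 1], w[z == 1]) - weighted.mean(y[z == 0], w[z == 0])
}

est_ate <- weighted_diff(w_ate)
est_att <- weighted_diff(w_att)
```

Because the simulated effect is constant, both estimates should land near 2 here; with effects that vary across participants the two estimands can differ.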
Oct 13, 2021 6 tweets 4 min read
Is our sample of #rstats packages tweeted about during the past week or so representative? Let's find out!

@thomas_mock has an amazing dataset of 737,466 tweets with the #rstats hashtag (with the earliest dating back over 10 years ago!)

Let's run the same code on this data!

@eddelbuettel pointed out that my first lines to get all #rstats packages on CRAN could be simplified by just running

r_pkgs <- rownames(available.packages())

(thanks!)
Oct 13, 2021 11 tweets 5 min read
I'm curious about the most tweeted about #rstats packages in the past ~week, let's explore this using R! 🧵

I'm going to use these packages:
*️⃣ tidyverse
*️⃣ rtweet
*️⃣ rvest
*️⃣ tidytext

you can install them with:
install.packages(c("tidyverse", "rtweet", "rvest", "tidytext"))

First, let's get a vector of all #rstats packages on CRAN!

library(rvest)

r_pkgs <- read_html('cloud.r-project.org/web/packages/a…') %>%
  html_nodes('td:nth-child(1)') %>%
  html_text()
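
Once a vector of package names is available, counting mentions comes down to splitting tweets into words and keeping the words that are package names. The sketch below is a base R stand-in for that step, under loud assumptions: `r_pkgs` and `tweets` are tiny made-up vectors rather than the CRAN list or real tweets, and it does not use tidytext, which the thread lists among its packages.

```r
# Hypothetical stand-ins for the CRAN package list and the tweet text
r_pkgs <- c("dplyr", "ggplot2", "rvest", "tidytext")
tweets <- c("Loving ggplot2 and dplyr today",
            "rvest makes scraping easy",
            "dplyr joins are great")

# Split each tweet into lower-case words; dots are kept because
# package names may contain them
words <- unlist(strsplit(tolower(tweets), "[^a-z0-9.]+"))

# Keep only the words that are package names, then count and sort
pkg_counts <- sort(table(words[words %in% r_pkgs]), decreasing = TRUE)
```

Lower-casing the tweets is a simplification: real package names are case-sensitive, so a fuller version would need to decide how to handle names such as those with capital letters.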
Oct 12, 2021 12 tweets 5 min read
👋 @LucyStats here! It's been a very exciting week for folks in Causal Inference with the Nobel Prize announcements. I thought it'd be neat to dive back in history to hear about a previous Nobel winner, Ronald Ross

1/ This topic is fun because it spans a whole myriad of my interests!
✔️We've got stats!
✔️We've got poetry!
✔️We've got infectious disease epidemiology!

Much of what I'll chat about today I learned from this article: pubmed.ncbi.nlm.nih.gov/12762435/

2/
Sep 30, 2021 6 tweets 5 min read
Today, I would like to share some resources on causal inference. - a thread ⬇️ I came to this topic while working with clinicians who use IPW and matching on a daily basis (they are not familiar with doubly robust approaches). I don't know about you, but I admire them so much, as they combine their work with patients with research to advance knowledge
Sep 29, 2021 5 tweets 3 min read
Now, I would like to mention an R package, FactoMineR that I use on a daily basis to explore and visualize heterogeneous data: quantitative, categorical, with group structures, (multiple) contingency tables.

At its core, SVD! (I am also an SVD fan, @daniela_witten ;-).

FactoMineR is indebted to the “French School of Data Analysis” (see arxiv.org/abs/0805.2879 or juliejosse.com/data-analysis/ for historical background), a field of statistics I was trained in.
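
To show what "at its core, SVD" can mean in practice, here is a minimal base R sketch of principal component analysis computed directly from an SVD. It uses the built-in `iris` data as an arbitrary example and does not call FactoMineR or reproduce its scaling conventions; `prcomp` appears only as a cross-check.

```r
# Center and scale the four numeric columns of the built-in iris data
X <- scale(as.matrix(iris[, 1:4]))

# SVD of the standardized data matrix: X = U D V'
sv <- svd(X)

# Principal component scores (U D) and share of variance per component
scores <- sv$u %*% diag(sv$d)
var_explained <- sv$d^2 / sum(sv$d^2)

# Cross-check against base R's PCA, which relies on the same decomposition
pc <- prcomp(iris[, 1:4], scale. = TRUE)
```

The singular values divided by sqrt(n - 1) match the standard deviations reported by `prcomp`, and the scores agree up to the sign of each component.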
Sep 29, 2021 12 tweets 2 min read
Hello!
So today, I will share a few thoughts and pieces of advice I usually give to my PhD students. I hope this might be helpful for a wider audience, even if it is obvious and already stated by others. Anyway, as teachers, we know repetition is important ;) - a thread ⬇️

1) Ask questions
Ask questions
Ask questions
….
Ask questions!
I mean that: don't hesitate to ask questions in seminars (in France in particular, we don't dare enough). Be curious, don’t be shy.
Sep 28, 2021 18 tweets 9 min read
Missing Data, a thread ⬇️
Missing values are everywhere! We have listed more than 150 R packages in cran.r-project.org/web/views/Miss…
So let us give a few pointers:
The method of handling missing data depends on the purpose of the analysis: estimation, completion, prediction, etc. 1) For inference with missing values, estimating as well as possible a parameter and giving a confidence interval, consider likelihood approaches (using EM algorithms) or multiple imputation
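
A small sketch of why the purpose matters: filling every gap with the observed mean may be acceptable for completing a table, but it shrinks the spread of the data, which is one reason inference calls for approaches such as multiple imputation instead. The data below are simulated and the missingness is completely at random; this is an illustration in base R, not one of the packages listed in the task view.

```r
set.seed(42)
x <- rnorm(200, mean = 10, sd = 2)       # simulated complete variable
x_miss <- x
x_miss[sample(200, 60)] <- NA            # remove 60 values completely at random

# Single mean imputation: every gap gets the same observed mean
x_mean_imp <- ifelse(is.na(x_miss), mean(x_miss, na.rm = TRUE), x_miss)

sd_observed <- sd(x_miss, na.rm = TRUE)  # spread of the 140 observed values
sd_mean_imp <- sd(x_mean_imp)            # spread after mean imputation
```

The imputed values add nothing to the sum of squares while inflating the sample size, so `sd_mean_imp` is smaller than `sd_observed` and any confidence interval built on it would be too narrow.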
Sep 27, 2021 4 tweets 2 min read
I have been working for >10 years on missing data.
My passion for data science mainly comes from its transversality: as statisticians, we can interact with so many scientific fields!
With missing data the same is true but within statistics, as it can pop up in all its branches. When I first meet a scientist for a new project, I always start the conversation by asking “Show me the data!” to understand the underlying challenges.
So far, I have never been shown a complete dataset... (of course there might be some bias!).
Apr 30, 2021 12 tweets 2 min read
I’m going to begin today with a bold claim: Being an applied statistician is a lot like being an ethnographer. I say this both based upon years of experience working in collaborative projects and consulting and based on my experience studying ethnography. (Recall: before my PhD in statistics, I started and quit a PhD in sociology).
Apr 28, 2021 8 tweets 3 min read
Yesterday I tweeted about nested data, with multi-level models (MLM) versus OLS + cluster-robust variance estimation (CRVE). This made me think about another confusion that arises, between what are called fixed versus random effects. Let’s begin with a simple relationship between a covariate X and Y in nested data, e.g. students i nested in school j. We are interested in understanding the relationship between X and Y at the student level.
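
The setup above can be sketched with simulated data. Everything in the block is an assumption made for illustration (20 schools of 15 students, a school effect correlated with X, a true student-level slope of 0.5). It shows only the fixed effects side, using base R: school dummies give the same slope as regressing school-demeaned Y on school-demeaned X. A random effects fit would need a mixed-model package and is not shown.

```r
set.seed(7)
J <- 20                                   # number of schools (assumed)
n_j <- 15                                 # students per school (assumed)
school <- rep(seq_len(J), each = n_j)

u <- rnorm(J)[school]                     # unobserved school effect
x <- 0.8 * u + rnorm(J * n_j)             # X is correlated with the school effect
y <- 1 + 0.5 * x + u + rnorm(J * n_j)     # true student-level slope is 0.5

# Fixed effects: one dummy per school absorbs all between-school variation
fe_fit <- lm(y ~ x + factor(school))

# Equivalent "within" estimator: subtract school means from X and Y
x_within <- x - ave(x, school)
y_within <- y - ave(y, school)
within_fit <- lm(y_within ~ x_within)

# Pooled OLS ignores schools and mixes within- and between-school variation
pooled_fit <- lm(y ~ x)
```

In a simulation like this the pooled slope typically sits above 0.5, because X picks up part of the school effect, while the two fixed effects versions agree with each other exactly.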
Apr 27, 2021 19 tweets 4 min read
I work primarily with nested data. One example is in experiments, with students nested in schools. Another is meta-analysis, with effect sizes nested in studies. In this thread, I’ll focus on students nested in schools, but this applies more generally. Question 1: Do you need to take nesting into account in your analysis? Our world is naturally nested – students in classrooms in teachers in schools in districts and so on. Does this mean we need to take all of these levels into account? No.
Apr 26, 2021 13 tweets 6 min read
Hello everyone – I’m so excited (and nervous!) to get to tweet with you all this week. I’ll start by telling you some general things about myself. I’m an Associate Professor of Statistics at Northwestern University and a Faculty Fellow at the Institute for Policy Research. I also Co-Direct the Statistics for Evidence-Based Policy and Practice Center. For more info see here: bethtipton.com
Apr 23, 2021 12 tweets 3 min read
The #DataFeminism book also made me look inward and examine my own biases, which I am exceedingly grateful for.

Namely, it forced me to reckon with some of my fundamental operating assumptions as a statistician & data scientist.

Examples threaded below... In chapter 3, the authors discuss the role of emotion in data visualization, specifically calling out giants in the field like Edward Tufte and Alberto Cairo (no snitch tagging, please) for what is presented as an anti-emotion stance.