Will Doyle
Professor of Higher Education, Vanderbilt University

Jan 21, 2020, 12 tweets

I'm excited to share a project I'm working on with @ozanjaquette, @karinaGsalazar, @btskinner, and Patricia Martin. We're looking at algorithmic bias in enrollment management. The project is available here: github.com/eddatasci/unro…
1/n

We're working on this in a really different way: inspired by @drob's talk on the "Unreasonable effectiveness of public work" (tinyurl.com/ugggdkv), we're posting everything we do publicly on GitHub. 2/n

Please feel free to comment and suggest improvements or changes! We're working in #rstats, using the #tidyverse as the basis for much of the work. 3/n

In addition, we're making heavy use of the @topepos #tidymodels approach to modeling. It's kind of amazing what's been done to streamline the approach to preprocessing data and implementing different models. 4/n

I've also learned a lot by reading about the approach to development that @EmilyRiederer details here: emilyriederer.netlify.com/post/rmarkdown… 5/n

Our first step in the project was to get something (anything!) up and running using the tidymodels framework. Starting in R Markdown, we downloaded NCES ELS data and did some basic wrangling. 6/n

Then we got the data structured for cross-validation using mc_cv from rsample, and applied the recipe function from the recipes package (part of tidymodels), using the approach laid out here: brodrigues.co/blog/2018-11-2… from @brodriguesco. Super helpful! 7/n
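For readers who haven't met Monte Carlo cross-validation, here is a minimal sketch of the idea in plain Python. It is not the project's R code: the function names `mc_cv_splits`, `fit_scaler`, and `apply_scaler` are hypothetical, the `gpa` values are made up, and the centering/scaling step is only a stand-in for whatever preprocessing a recipe would define. The point it illustrates is that each resample is an independent random train/test partition, and that preprocessing constants are learned from the training rows only and then applied to both pieces (roughly what recipes calls prep and bake).

```python
import random
from statistics import mean, pstdev


def mc_cv_splits(n_rows, prop=0.75, times=25, seed=1):
    """Monte Carlo cross-validation: draw `times` independent random
    train/test partitions, each with about `prop` of the rows in training."""
    rng = random.Random(seed)
    n_train = int(round(prop * n_rows))
    splits = []
    for _ in range(times):
        idx = list(range(n_rows))
        rng.shuffle(idx)
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits


def fit_scaler(values):
    """Learn centering/scaling constants from the training rows only."""
    mu = mean(values)
    sd = pstdev(values) or 1.0  # guard against a constant column
    return mu, sd


def apply_scaler(values, mu, sd):
    """Apply constants learned on the training rows to any set of rows."""
    return [(v - mu) / sd for v in values]


# Hypothetical toy predictor (made-up numbers, not ELS data).
gpa = [2.1, 3.4, 3.9, 2.8, 3.0, 1.9, 3.7, 2.5]

splits = mc_cv_splits(len(gpa), prop=0.75, times=5, seed=42)
prepared = []
for train_idx, test_idx in splits:
    mu, sd = fit_scaler([gpa[i] for i in train_idx])
    train_x = apply_scaler([gpa[i] for i in train_idx], mu, sd)
    test_x = apply_scaler([gpa[i] for i in test_idx], mu, sd)
    prepared.append((train_x, test_x))
```

Because the test rows never influence `mu` or `sd`, the held-out predictions are not contaminated by information from the data they are scored on.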

The result was a "dataset of datasets": each row holds a training/testing split, along with predictions for its testing dataset. Since we're predicting graduation as 0/1, we used AUC as the measure of predictive performance. We generated a distribution of AUC across the cross-validation resamples (it's not good). 8/n
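To make the AUC step concrete, here is a small Python sketch (again, not the project's R code). It computes AUC directly from its definition, the chance that a randomly chosen graduate gets a higher predicted score than a randomly chosen non-graduate, and then collects one AUC per resample. The `auc` helper and the three toy resamples are hypothetical and made up for illustration; they are not results from the project.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random positive (1) outscores a
    random negative (0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical (labels, predicted scores) for three test sets; made-up values.
resamples = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 1, 0, 0], [0.7, 0.3, 0.5, 0.1]),
    ([0, 1, 0, 1], [0.5, 0.5, 0.8, 0.4]),
]

# One AUC per resample gives a distribution rather than a single number.
aucs = [auc(labels, scores) for labels, scores in resamples]
summary = {"min": min(aucs), "mean": mean(aucs), "max": max(aucs)}
```

An AUC near 0.5 means the scores rank graduates no better than chance, so the spread of `aucs` shows how much that ranking quality depends on which rows land in the test set.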

That's the first iteration. Now we're converting the different pieces into functions so that the workflow is built from modular chunks. We can then implement a bunch of different approaches to prediction. 9/n

The first goal is to come up with some reasonably accurate predictions of graduation based on student characteristics. The second goal is to simulate what would happen if these predictions were used in different ways by decision makers. 10/n
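One very simple way to picture the second goal is sketched below in Python. This is an assumption about what such a simulation could look like, not the project's design: the `selection_rates` function, the groups "A" and "B", the predicted probabilities, and the cutoff rule are all hypothetical. It applies a cutoff to predicted graduation probabilities and reports the share of each group that clears it, so different decision rules can be compared side by side.

```python
from collections import defaultdict


def selection_rates(groups, probs, threshold):
    """Share of each group whose predicted probability meets the threshold."""
    selected = defaultdict(int)
    totals = defaultdict(int)
    for g, p in zip(groups, probs):
        totals[g] += 1
        if p >= threshold:
            selected[g] += 1
    return {g: selected[g] / totals[g] for g in totals}


# Hypothetical students: group membership and predicted probabilities (made up).
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
probs = [0.9, 0.7, 0.6, 0.4, 0.8, 0.5, 0.3, 0.2]

# Compare two hypothetical decision rules.
lenient = selection_rates(groups, probs, threshold=0.5)
strict = selection_rates(groups, probs, threshold=0.7)
print("threshold 0.5:", lenient)
print("threshold 0.7:", strict)
```

In this toy data the stricter cutoff lowers the selection rate for both groups but leaves group B with half the rate of group A, which is the kind of pattern a fuller simulation would need to examine.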

Who would benefit when certain decisions are made based on these predictions? Who might be hurt? In many ways this work is inspired by the work of @Irisonhighered and others on the use of predictive analytics in higher ed: tinyurl.com/qssfhry 11/n

I've benefited a ton from others being willing to post their work and explain what they've learned, and I definitely feel an obligation to pay that forward, even if most of what I can share is the mistakes I make along the way. /End
