If hundreds of scientists created predictive algorithms with high-quality data, how well would the best predict life outcomes? Not very well. Fragile Families Challenge: paper in PNAS w 112 authors doi.org/10.1073/pnas.1… & Special Collection of Socius journals.sagepub.com/topic/collecti…
We started with high-quality data. The Fragile Families and Child Wellbeing Study (@FFCWS) measured numerous domains of life for a cohort of families over many years. It has been used in more than 750 scientific papers. ffpubs.princeton.edu
We used these data in a new way: the common task method. We picked 6 outcome variables (eg GPA). Approved researchers who agreed to our terms received predictors for all families (background) & outcomes for half (training). Goal: predict outcomes they did not receive (holdout).
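A minimal sketch of the data release described above, assuming a simple random split: every family keeps its predictors, but outcomes are released for only about half of them. The function name `common_task_split`, the seed, and the shuffle-based procedure are hypothetical illustrations, not the Challenge's actual code.

```python
import random


def common_task_split(family_ids, outcomes, train_fraction=0.5, seed=0):
    """Return (released, withheld) outcome dicts for a common task setup.

    Assumption: a plain random shuffle decides which families land in
    training; the real Challenge's assignment procedure may differ.
    """
    rng = random.Random(seed)
    ids = sorted(family_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    # Outcomes participants receive (training).
    released = {fid: outcomes[fid] for fid in ids[:n_train]}
    # Outcomes organizers keep back to score submissions (holdout).
    withheld = {fid: outcomes[fid] for fid in ids[n_train:]}
    return released, withheld


# Toy usage with made-up GPA values for ten hypothetical families.
toy_outcomes = {fid: 2.0 + 0.1 * fid for fid in range(10)}
released, withheld = common_task_split(toy_outcomes.keys(), toy_outcomes)
```

Participants would fit models on `released` and submit predictions for the families in `withheld`, which only the organizers can score.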
160 teams tried. No one was very successful. For every outcome, the best algorithm was much closer to simple guessing than it was to perfect prediction. And it was only slightly better than a 4-variable regression model (the dashed line in the figure).
What does an R^2_holdout of 0.2 look like? Here is the most accurate submission predicting GPA.
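For readers who want the metric spelled out, here is a small sketch. It assumes R^2_holdout is 1 minus the ratio of a submission's squared error on the holdout set to the squared error of always predicting the training-set mean, so that "simple guessing" scores 0 and perfect prediction scores 1. The function name and the toy numbers are hypothetical.

```python
def r2_holdout(y_true, y_pred, train_mean):
    """R^2 on holdout data, benchmarked against predicting the training mean.

    Assumed definition: 1 - SSE(submission) / SSE(training-mean baseline).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Squared error of the submitted predictions on holdout families.
    sse_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Squared error of the baseline that guesses the training mean for everyone.
    sse_baseline = sum((y - train_mean) ** 2 for y in y_true)
    return 1.0 - sse_model / sse_baseline


# Toy holdout outcomes (made up), with a training mean of 3.0.
holdout = [2.0, 3.0, 4.0]
baseline_score = r2_holdout(holdout, [3.0, 3.0, 3.0], train_mean=3.0)
perfect_score = r2_holdout(holdout, holdout, train_mean=3.0)
```

Under this definition a score of 0.2 means the submission removed about a fifth of the baseline's squared error, and a submission that does worse than the baseline gets a negative score.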
We thought perhaps some algorithms would predict some observations well, and other algorithms would predict other observations well. Nope. They missed pretty much the same way for all families.
For policymakers deploying predictive algorithms in high-stakes decisions, our result is a reminder of a basic fact: one should not assume that algorithms predict well. That must be demonstrated with transparent, empirical evidence.
For scientists, our result raises an understanding/prediction paradox: understanding has been generated by these data (as demonstrated by more than 750 published journal articles), yet the very same data could not yield accurate predictions.
The paradox is resolvable in at least three ways: (1) our understanding is poor, (2) prediction is a poor measure of understanding, or (3) our understanding is incomplete without a theory that points toward poor prediction. Future research is needed.
Poor predictions by 1 team could be ignored. The collective failure of 160 teams is harder to ignore. This mass collaboration illustrates a broader idea: some social research questions may be better solved collectively than individually. We can do more together than alone.
Paper at doi.org/10.1073/pnas.1…. Replication materials at doi.org/10.7910/DVN/CX….
Filiz Garip (@ProfFilizGarip) wrote a thoughtful commentary on our paper: What failure to predict life outcomes can teach us doi.org/10.1073/pnas.2…
The Socius special collection includes 12 papers by participants describing their approaches to the Challenge, 3 papers by our group that will be helpful to researchers creating other mass collaborations, and 1 comment.
Salganik, Lundberg, Kindel, and McLanahan. "Introduction to the Special Collection on the Fragile Families Challenge." @msalganik @IanLundberg1 @alextkindel doi.org/10.1177/237802…
Ahearn and Brand. "Predicting Layoff among Fragile Families." @JennieBrand1 doi.org/10.1177%2F2378…
Altschul. "Leveraging Multiple Machine Learning Techniques to Predict Major Life Outcomes from a Small Set of Psychological and Socioeconomic Variables: A Combined Bottom-Up/Top-Down Approach." @dremalt doi.org/10.1177%2F2378…
Carnegie and Wu. "Variable Selection and Parameter Tuning for BART Modeling in the Fragile Families Challenge." doi.org/10.1177%2F2378…
Compton. "A Data-Driven Approach to the Fragile Families Challenge: Prediction through Principal Components Analysis and Random Forests." doi.org/10.1177%2F2378…
Davidson. "Black-Box Models and Sociological Explanations: Predicting High School GPA Using Neural Networks." @thomasrdavidson doi.org/10.1177%2F2378…
Filippova, Gilroy, Kashyap, Kirchner, Morgan, Polimis, Usmani, and Wang. "Humans in the Loop: Incorporating Expert and Crowdsourced Knowledge for Predictions Using Social Survey Data." @anna_fil @ccgilroy @ridhikash07 @alliecmorgan @kpolimis doi.org/10.1177%2F2378…
Goode, Datta, and Ramakrishnan. "Imputing Data for the Fragile Families Challenge: Identifying Similar Survey Questions with Semi-automated Methods." @devDdata @profnaren @VT_DAC doi.org/10.1177%2F2378…
McKay. "When 4 ≈ 10,000: The Power of Social Science Knowledge in Predictive Performance." @SocialPolicy doi.org/10.1177%2F2378…
Raes. "Predicting GPA at Age 15 in the Fragile Families and Child Wellbeing Study." @TiUEconomics doi.org/10.1177%2F2378…
Rigobon, Jahani, Suhara, Al-Ghoneim, Alghunaim, Pentland, and Almaatouq. "Winning Models for GPA, Grit, and Layoff in the Fragile Families Challenge." @eamanjahani @suhara @khazgh @azizkag @alex_pentland @amaatouq doi.org/10.1177%2F2378…
Roberts. "Friend Request Pending: A Comparative Assessment of Engineering and Social Science Inspired Approaches to Analyzing Complex Birth Cohort Survey Data." doi.org/10.1177%2F2378…
Stanescu, Wang, and Yamauchi. "Using LASSO to Assist Imputation and Predict Child Wellbeing." @EHWpolisci doi.org/10.1177%2F2378…
Kindel, Bansal, Catena, Hartshorne, Jaeger, Koffman, McLanahan, Phillips, Rouhani, Vinh, and Salganik. "Improving Metadata Infrastructure for Complex Surveys: Insights from the Fragile Families Challenge." doi.org/10.1177%2F2378…
Fisher. "Data-specific Functions: A Comment on Kindel et al." @jacob_c_fisher doi.org/10.1177%2F2378…
Liu and Salganik. "Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge." @dayvidliu @msalganik doi.org/10.1177%2F2378…
Lundberg, Narayanan, Levy, and Salganik. "Privacy, Ethics, and Data Access: A Case Study of the Fragile Families Challenge." @IanLundberg1 @random_walker @karen_ec_levy @msalganik doi.org/10.1177%2F2378…
To promote computational reproducibility, there are Docker images for the Socius papers (see Liu and Salganik 2019) @dayvidliu @msalganik: hub.docker.com/r/2018dliu/fra…
The Fragile Families Challenge was supported by grants from the Russell Sage Foundation, NSF, and NICHD. @RussellSageFdn @NSF @NICHD_NIH
The Fragile Families Challenge builds on more than 20 years of work on the Fragile Families and Child Wellbeing Study, which was supported by grants from NICHD and a consortium of private foundations, including the Robert Wood Johnson Foundation. @ffcws
We are grateful to the Fragile Families Challenge Board of Advisers. fragilefamilieschallenge.org/#about
Thank you to everyone who participated in the Fragile Families Challenge!