Can machine learning outperform baseline logistic regression for predicting complex social phenomena? Many prominent papers have claimed highly accurate civil war prediction. In a systematic review, @sayashk and I find these claims invalid due to errors. reproducible.cs.princeton.edu
We are not political scientists and the main point of our paper is not about civil war. Rather, we want to sound the alarm about an oncoming wave of reproducibility crises and overoptimism across many scientific fields adopting machine learning methods. We have an ongoing list:
Incidentally, we learned about one of the systematic surveys in the above list because it found pitfalls in a paper coauthored by me. Yup, even researchers whose schtick is skepticism of AI/ML are prone to overoptimism when they use ML methods. Such is the allure of AI.
We don’t attribute reproducibility failures to the carelessness of individual researchers. We view applied-ML research as methodologically immature and far more prone to pitfalls than applied-stats research. For now, we must treat applied-ML research findings with caution.
Last year I offered a course on limits to prediction with @msalganik. In general we found that fancy ML techniques offer little benefit over traditional statistics for predicting social outcomes, and offered 9 hypotheses that might together explain why. msalganik.github.io/cos597E-soc555…
My coauthor @sayashk took the course and was intrigued by claims about civil war prediction that seemed to contradict this general trend. He dug deeper and found widespread methodological flaws that seem to have led to a feedback loop of overoptimism about machine learning.
To be clear, our work is a pre-print and the authors have not yet had a chance to publicly respond. We invite scrutiny of our claims; our Supplement and reproduction materials are available on the website. reproducible.cs.princeton.edu
Although ML is unlikely to produce leaps in accuracy for predicting social outcomes, small gains are often possible. If so, how do we weigh these benefits against drawbacks such as explainability? We hope to have more to say about this critical policy question. Stay tuned.
In my dream version of the scientific enterprise, everyone who works on X would be required to spend some percentage of their time learning and contributing to the philosophy of X. There is too much focus on the "how" and too little focus on the "why" and the "what are we even".
Junior scholars entering a field naturally tend to ask critical questions as they aren't yet inculcated into the field's dogmas. But the academic treadmill leaves them little time to voice concerns & their lack of status means that even when they do, they aren't taken seriously.
One possible intervention is for journals and conferences to devote some fraction of their pages / slots to self-critical inquiry, and for dissertation committees to make clear that they will value this type of scholarship just as much as "normal" science.
We shouldn't shrug off dark patterns as simply sleazy sales online, or unethical nudges, or business-as-usual growth hacking. Dark patterns are distinct and powerful because they combine all three in an effort to extract your money, attention, and data. queue.acm.org/detail.cfm?id=…
At first growth hacking was about… growth, which was merely annoying for the rest of us. But once a platform has a few billion users it must "monetize those eyeballs". So growth hackers turned to dark patterns, weaponizing nudge research and A/B testing. queue.acm.org/detail.cfm?id=…
I study the risks of digital tech, especially privacy. So people are surprised to hear that I’m optimistic about tech’s long term societal impact. But without optimism and the belief that you can create change with research & advocacy, you burn out too soon in this line of work.
9 years ago I was on the academic job market. The majority of professors I met asked why I chose to work on privacy since—as we all know—privacy is dead because of the Internet and it's pointless to fight it. (Computer scientists tend to be technological determinists, who knew?!)
At first I didn't expect that "why does your research field exist?" would be a serious, recurring question. Gradually I came up with a pitch that at least got interviewers to briefly suspend privacy skepticism and hear about my research. (That pitch is a story for another day.)
The news headlines *undersold* this paper. Widely-used machine learning tool for sepsis prediction found to have an AUC of 0.63 (!), adds little to existing clinical practice. Misses two thirds of sepsis cases, overwhelms physicians with false alerts. jamanetwork.com/journals/jamai…
This adds to the growing body of evidence that machine learning isn't good at true prediction tasks as opposed to "prediction" tasks like image classification that are actually perception tasks.
Worse, in prediction tasks it's extremely easy to be overoptimistic about accuracy through careless problem framing. The sepsis paper found that the measured AUC is highly sensitive to how early the prediction is made: it can be accurate or clinically useful, but not both.
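One common form of careless problem framing is leakage: a feature that is only recorded *after* the outcome is known (or suspected) sneaks into the model and inflates the measured AUC. Here is a minimal synthetic sketch of that effect. It is purely illustrative and not from the sepsis paper; the "antibiotics_given" feature is a hypothetical stand-in for any post-outcome variable.

```python
# Hypothetical illustration of label leakage inflating AUC.
# "antibiotics" is recorded after clinicians already suspect sepsis,
# so it encodes the very label it is supposed to predict.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
vitals = rng.normal(size=(n, 3))          # legitimate pre-outcome signals
# Noisy outcome: vitals carry real but limited predictive signal.
y = (vitals @ np.array([0.8, 0.5, 0.3]) + rng.normal(scale=2.0, size=n)) > 0
# Leaky feature: almost always given to true cases, rarely otherwise.
antibiotics = (y & (rng.random(n) < 0.9)) | (~y & (rng.random(n) < 0.05))

def auc_with(features):
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
    model = LogisticRegression().fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

honest = auc_with(vitals)
leaky = auc_with(np.column_stack([vitals, antibiotics]))
print(f"honest AUC: {honest:.2f}, leaky AUC: {leaky:.2f}")
```

The leaky model looks far more accurate, but its headline AUC says nothing about how well sepsis can be predicted early enough to matter clinically.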
Academia rewards clever papers over real world impact. That makes it less useful. But it also perpetuates privilege—those with less experience of injustice find it easier to play the game, i.e. work on abstruse problems while ignoring topics that address pressing needs.
I have no beef with fundamental research (which isn't motivated by applications). But most scholarship that *claims* to be motivated by societal needs happens with little awareness of what those needs actually are, and no attempt to step outside academia to actually make change.
Like many of academia's problems, this one is structural. Telling individual scholars to do better is unlikely to work when the incentives are all messed up. Here are some thoughts on what might work. I'd love to hear more.
A student who's starting grad school asked me which topics in my field are under-explored. An important question! But not all researchers in a community will agree on the answers. If they did, those topics wouldn't stay under-explored for long. So how to pick problems? [Thread]
It's helpful for researchers to develop a "taste" for problems based on their specific skills, preferences, and hypotheses about systemic biases in the research community that create blind spots. I shared two of my hypotheses with the student, but we must each develop our own.
Hypothesis 1: interdisciplinary topics are under-explored because they require researchers to leave their comfort zones. But collaboration is a learnable skill, so if one can get better at it and find suitable collaborators, rich and important research directions await.