Hi folks -- time for another #AIhype take down + analysis of how the journalistic coverage relates to the underlying paper. The headline for today's lesson:
At first glance, this headline seems to be claiming that from text messages (whose? accessed how?) an "AI" can detect mental health issues as well as human psychiatrists do (how? based on what data?).
/2
Let's pause to once again note the use of "AI" in this way suggests that "artificial intelligence" is a thing that exists. Always useful to replace that term with "mathy math" or SALAMI for a reality check.
/3
Okay, back to the article. Odd choice here to start off with a call out to the Terminator which is carefully denied but still alluded to. Also odd to refer to what is (as we'll see) a text classification system as a "robot", again strengthening the allusion to the Terminator.
/6
Alright, so what does "potential signs of worsening mental illness" mean, and what were these text messages, and how did they get them? Time to go look at the underlying article.
/7
The study that the article actually reports on involves text messages (collected under informed consent) between patients and their therapists which were annotated by two "clinically trained annotators" for whether or not they reflect cognitive distortion (& of what type).
Then they trained a few different text classification algorithms on that annotated dataset and measured how well they did at replicating the labels on a portion held out as a test set.
/10
You could well be forgiven for reading "identify potential signs of worsening mental illness" as "detect worsening mental illness" and the "on par with human psychiatrists" in the headline & this para meaning on par with what human psychiatrists do when diagnosing patients.
/11
But no: What this study did was have two annotators annotate text messages from patients they were not treating, measured their agreement with each other (a not very impressive κ=0.51) and then measured how well the text classifiers could replicate those annotations.
/12
As an aside: I don't doubt that the expertise of these annotators (one with a master's degree in psychology and one who is a licensed clinical mental health counselor) is relevant. However, it is still misleading to refer to them as "psychiatrists".
/13
It's a really important detail that the annotators weren't working with text messages from patients they treat.
/14
This means that they were working with very little context: saying a machine does as well as them in this task would seem to have very little bearing on whether it would be appropriate to have a machine do this task.
/15
In other words, we could ask: Under what circumstances would we want to have (even clinically trained) humans screening text messages from people they have no relationship to (therapeutic or otherwise) to try to find such signs?
/16
I'm guessing not many. But in that case, what's the value of using an automated system that has as an upper bound the accuracy of the humans in that case?
/17
Another misleading statement in the article: These were not "everyday text messages" (which suggests, say, friends texting each other) but rather texts between patients and providers (with consent) in a study.
/18
Next, let's compare what the peer reviewed article has to say about the purpose of this tech with what's in the popular press coverage. The peer reviewed article says only: could be something to help clinicians take action.
/19
In the popular press article, on the other hand we get instead a suggestion of developing surveillance technology, that would presumably spy not just on the text messages meant for the clinician, but everything a patient writes.
/20
Note that in this case, the source of the hype lies not with the journalist but (alas) with one of the study authors.
/21
Another one of the authors comes in with some weird magical thinking about how communication works. Why in the world would text messages (lacking all those extra context clues) be a *more* reliable signal?
/22
In sum: It seems like here the researchers are way overselling what their study did (to the press, but not in the peer reviewed article) and the press is happily picking it up.
/fin
Coda: @holdspacefree illustrates the importance of reading the funding disclosures. The researchers giving the hype-laden quotes to the media weren't just being naive. They're selling something.
@holdspacefree This news story started off life as a press release from @UWMedicine 's @uwmnewsroom who I think should also have disclosed the financial COI that was in the underlying study.
@mathbabedotorg I do think there's a positive role for shame in this case --- shame here is reinforcing community values against "experimenting" with vulnerable populations without doing due diligence re research ethics.
>>
It seems that part of the #BigData#mathymath#ML paradigm is that people feel entitled to run experiments involving human subjects who haven't had relevant training in research ethics—y'know computer scientists bumbling around thinking they have the solutions to everything. >>
There's a certain kind of techbro who thinks it's a knock-down argument to say "Well, you haven't built anything". As if the only people whose expertise counts are those close to the machine. I'm reminded (again) of @timnitGebru 's wise comments on "the hierarchy of knowledge".>>
I've been pondering some recently about where that hierarchy comes from. It's surely reinforced by the way that $$ (both commercial and, sadly, federal research funds) tends to flow --- and people mistaking VCs, for example, as wise decision makers.
>>
But I also think that some of it has roots in the way different subjects are taught. Math & CS are both (frequently) taught in very gate-keepy ways (think weeder classes) and also students are evaluated with very cut & dried exams.
Trying out You.com because people are excited about their chat bot. First observation: Their disclaimer. Here's this thing we're putting up for everyone to use while also knowing (and saying) that it actually doesn't work.
Second observation: The footnotes, allegedly giving the source of the information provided in chatbot style, are difficult to interpret. How much of that paragraph is actually sourced from the relevant page? Where does the other "info" come from?
A few of the queries I tried returned paragraphs with no footnotes at all.
Chatbots-as-search is an idea based on optimizing for convenience. But convenience is often at odds with what we need to be doing as we access and assess in formation.
We're seeing multiple folks in #NLProc who *should know better* bragging about using #ChatGPT to help them write papers. So, I guess we need a thread of why this a bad idea:
>>
1- The writing is part of the doing of science. Yes, even the related work section. I tell my students: Your job there is show how your work is building on what has gone before. This requires understanding what has gone before and reasoning about the difference.
>>
The result is a short summary for others to read that you the author vouch for as accurate. In general, the practice of writing these sections in #NLProc (and I'm guessing CS generally) is pretty terrible. But off-loading this to text synthesizers is to make it worse.
@willknight The 1st is somewhat subtle. Saying this ability has been "unlocked" paints a picture where there is a pathway to some "AI" and what technologists are doing is figuring out how to follow that path (with LMs, no less!). SciFi movies are not in fact documentaries from the future. >>
@willknight Far more problematic is the closing quote, wherein Knight returns to the interviewee he opened with (CEO of a coding tools company) and platforms her opinions about "AI" therapists.