Hi folks -- time for another #AIhype takedown + analysis of how the journalistic coverage relates to the underlying paper. The headline for today's lesson:


/1 [Screencap: headline of th...]
At first glance, this headline seems to be claiming that from text messages (whose? accessed how?) an "AI" can detect mental health issues as well as human psychiatrists do (how? based on what data?).

Let's pause to once again note that the use of "AI" in this way suggests that "artificial intelligence" is a thing that exists. Always useful to replace that term with "mathy math" or SALAMI for a reality check.

Okay, back to the article. Odd choice here to start off with a reference to the Terminator, which is carefully denied but still alluded to. Also odd to refer to what is (as we'll see) a text classification system as a "robot", again strengthening the Terminator allusion.

/6 [Screencap from article, rea...]
Alright, so what does "potential signs of worsening mental illness" mean, and what were these text messages, and how did they get them? Time to go look at the underlying article.

The study that the article actually reports on involves text messages (collected under informed consent) between patients and their therapists which were annotated by two "clinically trained annotators" for whether or not they reflect cognitive distortion (& of what type).

/8 [Screencaps: "Data were ..." and "We created ..."]
Then they trained a few different text classification algorithms on that annotated dataset and measured how well they did at replicating the labels on a portion held out as a test set.
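For readers unfamiliar with this setup, here's a toy sketch of the evaluation pattern described: train on annotator-labeled messages, then measure agreement with the human labels on a held-out test set. The messages, labels, and the deliberately naive "classifier" below are all made up for illustration; the study's actual features and models are not reproduced here.

```python
# Toy sketch (made-up data): hold out a test set and measure how well
# predictions replicate the annotators' labels.
labeled = [
    ("I always mess everything up", 1),      # 1 = labeled as distortion
    ("See you at 3pm", 0),
    ("Nobody will ever like me", 1),
    ("Thanks, that helped", 0),
    ("I'm a total failure at everything", 1),
    ("Running ten minutes late", 0),
]

# Naive stand-in for a real model: flag absolutist words.
CUES = {"always", "never", "nobody", "everything", "total"}

def predict(text):
    return 1 if CUES & set(text.lower().split()) else 0

train, test = labeled[:4], labeled[2:]   # hold out part of the data
correct = sum(predict(t) == y for t, y in test)
accuracy = correct / len(test)           # agreement with held-out human labels
print(accuracy)
```

The key point for the thread: the score measures how well the system replicates the annotators' labels, nothing more.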

You could well be forgiven for reading "identify potential signs of worsening mental illness" as "detect worsening mental illness", and for reading "on par with human psychiatrists" in the headline & this para as meaning on par with what human psychiatrists do when diagnosing patients.

/11 [Screencap: "According ..."]
But no: what this study did was have two annotators label text messages from patients they were not treating, measure their agreement with each other (a not very impressive κ=0.51), and then measure how well the text classifiers could replicate those annotations.
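For reference, the κ here is Cohen's kappa: agreement between two annotators, corrected for the agreement you'd expect by chance, and κ=0.51 counts as only moderate. A minimal pure-Python sketch, with made-up labels chosen so the result lands near the paper's figure:

```python
# Cohen's kappa for two annotators: kappa = (p_o - p_e) / (1 - p_e),
# where p_o is observed agreement and p_e is chance agreement from
# each annotator's label frequencies.
def cohens_kappa(a, b):
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Made-up binary labels (1 = cognitive distortion) from two annotators.
ann1 = [1, 1, 0, 0, 1, 0, 1, 0]
ann2 = [1, 0, 0, 0, 1, 1, 1, 0]
print(round(cohens_kappa(ann1, ann2), 2))  # -> 0.5
```

Here the annotators agree on 6 of 8 items (75%), but chance alone would give 50%, so κ comes out at 0.5: the classifiers are being scored against a reference that humans themselves only moderately agree on.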

As an aside: I don't doubt that the expertise of these annotators (one with a master's degree in psychology and one who is a licensed clinical mental health counselor) is relevant. However, it is still misleading to refer to them as "psychiatrists".

It's a really important detail that the annotators weren't working with text messages from patients they treat.

This means that they were working with very little context: showing that a machine does as well as they do on this task has very little bearing on whether it would be appropriate to have a machine do this task.

In other words, we could ask: Under what circumstances would we want to have (even clinically trained) humans screening text messages from people they have no relationship to (therapeutic or otherwise) to try to find such signs?

I'm guessing not many. But in that case, what's the value of using an automated system whose accuracy is upper-bounded by that of those humans?

Another misleading statement in the article: These were not "everyday text messages" (which suggests, say, friends texting each other) but rather texts between patients and providers (with consent) in a study.

Next, let's compare what the peer-reviewed article has to say about the purpose of this tech with what's in the popular press coverage. The peer-reviewed article says only that this could be something to help clinicians take action.

/19 [Screencap: "Conclusion..."]
In the popular press article, on the other hand, we get a suggestion of developing surveillance technology that would presumably spy not just on the text messages meant for the clinician, but on everything a patient writes.

/20 [Screencap: "The AI cou..."]
Note that in this case, the source of the hype lies not with the journalist but (alas) with one of the study authors.

Another one of the authors comes in with some weird magical thinking about how communication works. Why in the world would text messages (lacking all those extra context clues) be a *more* reliable signal?

/22 [Screencap: "When we'r..."]
In sum: it seems like the researchers here are way overselling what their study did (to the press, though not in the peer-reviewed article) and the press is happily picking it up.

Coda: @holdspacefree illustrates the importance of reading the funding disclosures. The researchers giving the hype-laden quotes to the media weren't just being naive. They're selling something.

@holdspacefree This news story started off life as a press release from @UWMedicine 's @uwmnewsroom who I think should also have disclosed the financial COI that was in the underlying study.


Thread by @emilymbender@dair-community.social on Mastodon
