Teresa Kubacka
Dec 5, 2022 · 32 tweets · 17 min read
Today I asked ChatGPT about the topic I wrote my PhD thesis on. It produced reasonable-sounding explanations and plausible-looking citations. So far so good – until I fact-checked the citations. And things got spooky when I asked about a physical phenomenon that doesn’t exist.
I wrote my thesis about multiferroics and was curious whether ChatGPT could serve as a tool for scientific writing. So I asked it to provide a shortlist of citations on the topic. ChatGPT refused to give citation suggestions openly, so I had to use a “pretend” trick.
When asked about its selection criteria, it gave a generic, non-DORA-compliant answer. I asked about the criteria a few times and it pretty much always gave some version of “number of citations is the best metric”. Sometimes it would refer to a “prestigious journal”.
The names on the citations sounded plausible: Nagaosa, Dagotto, and Takahashi have all published on the topic. I didn’t recognize any particular publication, though. I then asked ChatGPT to write a few paragraphs for my introductory section.
I wasn’t disappointed with the text itself: it sounded very reasonable, if quite generic. ChatGPT was scratching the surface – but similarly to how we all do in an introduction. Here, though, it produced a citation with a DOI. So I checked it – and it did not resolve.
So I started investigating. ChatGPT was able to produce full bibliographic data for this citation – which I checked again, discovering that no such article exists. Nevertheless, ChatGPT eagerly summarized its contents for me.
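Checking whether a DOI resolves at all is easy to script. Here is a minimal sketch in Python – the function names are mine and the public doi.org resolver is assumed; this illustrates the kind of check involved, not the exact workflow I used:

```python
import urllib.error
import urllib.request

def doi_url(doi: str) -> str:
    """Normalize a DOI (possibly pasted as a full link or with a
    'doi:' prefix) into a resolver URL."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if the resolver redirects to a landing page,
    False on an HTTP error (typical for a hallucinated DOI)."""
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except urllib.error.HTTPError:
        return False

if __name__ == "__main__":
    print(doi_resolves("10.0000/not-a-real-doi"))
```

Note that a resolving DOI only proves the identifier exists – it says nothing about whether the title and authors the model attached to it are correct.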
I prompted ChatGPT to hallucinate more about the Huang paper using the “pretend-to-be-a-researcher” trick, but it worked only partially. After I asked for publications citing this paper, however, ChatGPT eagerly hallucinated more content and citations. None of them exist.
This time some DOIs were real, resolving to the correct journal, year and volume – but to articles on completely different topics.
I asked for more publications about multiferroics in general. I got a second shortlist – and it was all fake, again. Many names matched real researchers (Spaldin, Khomskii), but none of the citations was correct. Some had wrong DOIs, others wrong authors, and so on.
Then I cross-checked the citations from the first shortlist, and none of those existed either. All of the citations could plausibly have existed, as the titles seemed close enough to real ones.
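This kind of cross-checking can also be partly automated via the public Crossref REST API (api.crossref.org), which returns the registered metadata for a DOI. A sketch, using a crude token-overlap heuristic of my own devising to compare titles (the 0.5 threshold is an arbitrary illustrative choice):

```python
import json
import urllib.request

def title_matches(claimed: str, actual: str, threshold: float = 0.5) -> bool:
    """Crude Jaccard overlap between the claimed title and the
    registered one; threshold chosen arbitrarily for illustration."""
    a, b = set(claimed.lower().split()), set(actual.lower().split())
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold

def crossref_title(doi: str, timeout: float = 10.0) -> str:
    """Fetch the registered title for a DOI from the Crossref API
    (empty string if the record carries no title)."""
    url = "https://api.crossref.org/works/" + doi
    with urllib.request.urlopen(url, timeout=timeout) as response:
        record = json.load(response)
    titles = record["message"].get("title", [])
    return titles[0] if titles else ""
```

A DOI that resolves but whose registered title shares almost no words with the claimed one is exactly the “real DOI, wrong paper” pattern above.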
Some authors seemed to be real researchers, but from different fields.
Some hallucinated citations seemed to be assembled, mix-and-match, from a few different but similar real citations.
I ran more prompts to generate text. It always sounded nice and plausible, with no factual errors (probably due to the low level of detail), and was skillfully supported by citations to non-existent articles. ChatGPT remembered the citations and DOIs throughout the conversation.
Then I decided to ask ChatGPT about something I knew didn’t exist: a cycloidal inverted electromagnon. I wrote my thesis about electromagnons, but to be doubly sure, I checked that there was no such thing (it’s been ca. 7 years since my defense). ChatGPT thought differently:
Interestingly, ChatGPT was now much more specific in its writing than when discussing multiferroics in general. So I wanted to know more about these exotic excitations, which “had been the subject of much research in recent years”.
I wanted to drill down into the physics. And here it became very spooky: ChatGPT hallucinated an explanation of a non-existent phenomenon in such sophisticated and plausible language that my first reaction was to actually consider whether it could be true!
I also asked who discovered these excitations. ChatGPT came back to the work of Huang 2010 (which doesn’t exist) and located the group at the University of Maryland. There is indeed a similarly named physicist there, but they have nothing to do with multiferroics whatsoever.
And what if I wanted to submit a research proposal to measure these non-existent excitations myself? No problem, ChatGPT is here to help!
I left the conversation with an intense feeling of uncanniness: I had just experienced a parallel universe of plausible-sounding, non-existent phenomena, confidently supported by citations to non-existent research. The last time I felt this way was when I attended a creationist lecture.
The moral of the story: do not, do NOT, ask ChatGPT for factual, scientific information. It will produce incredibly plausible-sounding hallucinations. And even a qualified expert will have trouble pinpointing what is wrong.
I also worry a lot about what this means for our societies. Scientists may be careful enough not to use such a tool, or at least to correct it on the fly – but even if you are an expert, no expert can know it all. We are all ignorant in most areas but a select few.
People will use ChatGPT to ask about things far beyond their expertise, just because they can. Because they are curious, and they need an answer in an accessible form, one not guarded by paywalls or difficult language.
We will be fed hallucinations indistinguishable from the truth, written without grammatical mistakes, supported by hallucinated evidence, passing all first critical checks. With similar models widely available, how will we be able to distinguish a real pop-sci article from a fake one?
I cannot forget a quote I once read: that democracy relies on society having a shared basis of facts. If we cannot agree on the facts, how are we to make decisions and policies together? It is already difficult enough now. And the worst misinformation is yet to come.
You are amazing! Thanks to your engagement, over 400k people have had the opportunity to see this thread and learn about the limitations of ChatGPT – thank you! And thank you for the comments; I wasn’t able to engage in all the discussions, but many interesting points came up.
Since the thread is still popular (approaching 1 million views – thank you!), let me recommend some material that has appeared since then. If you want to understand the inner workings of ChatGPT, read this thread by @vboykis.
Even if ChatGPT stops showing references, the problem of fact hallucination will persist, because it generates text based only on statistical distributions of words and concepts, without any understanding of whether they correspond to something real.
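The point can be caricatured with a deliberately tiny model of the same flavor: a bigram chain that knows only which word followed which in its training text. The toy corpus below is mine, purely for illustration – yet it already produces fluent-sounding chains with no notion of what is real:

```python
import random
from collections import Counter, defaultdict

# Toy training text: every word the "model" will ever know.
corpus = (
    "the cycloidal electromagnon was observed in the multiferroic phase "
    "the inverted electromagnon was predicted in the cycloidal phase"
).split()

# Count which word follows which - the model's entire "knowledge".
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(start: str, length: int, seed: int = 0) -> list[str]:
    """Sample a chain of statistically plausible continuations."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        options = bigrams.get(out[-1])
        if not options:
            break
        words, counts = zip(*options.items())
        out.append(rng.choices(words, weights=counts)[0])
    return out

print(" ".join(generate("the", 8)))
```

Every transition in the output is statistically justified by the corpus, but the model has no way of knowing whether the phrases it strings together describe anything that exists – which is the mechanism behind the hallucinated electromagnons above, just at a vastly larger scale.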
Fake references are not triggered by the word “pretend” in the prompt (I wish they were, because that would have been an easy fix). By now there have been many other examples:
I have also found instances where ChatGPT fakes things when asked to summarize, explain in simple terms, or expand on a concept. Here it uses the word “osteoblasts”, which doesn’t appear in the original text.
If you are interested in my thoughts on how ChatGPT may influence academia, research assessment and scientific publishing, head over to these two threads:
- paper mills and predatory journals:
- productivity metrics:
I also had the opportunity to comment on this for an excellent @NPR article by @emmabowmans – I recommend reading the whole article! npr.org/2022/12/19/114…
I am also looking forward to trying out other AI tools. Another chatbot premiered today, coupled to a search platform, which supposedly delivers more factual results. I encourage everyone to try it out and challenge it yourself.


