Fighting misleading content will not be the only challenge for academia in the post-ChatGPT era. It has suddenly become easy to run academic paper mills at scale, set up credible-looking scam journals, or even build money-laundering schemes. Can we imagine a systemic way out of it? 🧵
If you’ve never worked in academia, you’ve probably never heard that academic publishing is dominated by huge, highly profitable companies that use the “publish-or-perish” pressure on scientists to earn big money (the 30%-profit-margin kind of money).
How come? Scientists are required to publish articles in academic journals and to refer to other people’s work. Articles are reviewed by experts – colleagues employed at other scientific institutions – in a form of brief fact-checking called peer review.
So far so good. But look at it in detail and the money flow in scientific publishing is peculiar. If you work as a journalist, you are typically paid for what you publish. In science it’s reversed: you pay to get published!
And you pay for more than that: the articles are paywalled, so you additionally have to pay to read about what your colleagues have worked on. And if you want to publish an article as Open Access – meaning that everyone can read it – you must pay extra.
For a publisher it’s perfect: scientists write content for free, review it for free, and pay to publish and to read it. The fees are typically covered by the institution or a grant. So it’s a neat, massive transfer of taxpayers’ money just to keep the system going.
Let’s talk numbers. It can cost up to 20 000 EUR to publish a single article. @open_apc collects a sample of data from academic institutions. They report that in 2020, institutions paid 38 mln EUR to publish 22 000 articles. For comparison, over 6 600 000 articles were published that year.
Think how much revenue that is in total. Given that the costs are slim (scientists pay to be published, pay to read, and their salaries are covered by their institutions), the potential for profit is enormous. Big publishers operate with a profit margin comparable to big tech.
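To get a feel for the scale, here is a back-of-the-envelope calculation based on the Open APC figures quoted above. Extrapolating the sample’s average fee to all articles is a deliberately naive assumption – not every article carries a publication fee – so treat the result as an illustration, not a statistic.

```python
# Rough estimate built on the Open APC sample quoted above.
# The extrapolation to all articles is a deliberately naive assumption.
total_fees_eur = 38_000_000    # fees in the 2020 Open APC sample
sampled_articles = 22_000      # articles covered by that sample
all_articles_2020 = 6_600_000  # total articles published in 2020

avg_fee = total_fees_eur / sampled_articles   # ~1,727 EUR per article
naive_market = avg_fee * all_articles_2020    # ~11.4 billion EUR

print(f"average fee in sample: {avg_fee:,.0f} EUR")
print(f"naive extrapolation:   {naive_market / 1e9:.1f} billion EUR per year")
```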
The scheme is so lucrative because scientists cannot stop publishing or easily set up new journals. Career progress in academia critically depends on how much you publish and how often your colleagues refer to your work. Productivity is too often mistaken for quality.
Given that permanent positions in academia are very scarce (fewer than 10%), an army of temporary workers on short-term contracts must compete for them in an intellectual version of the Hunger Games. Engaging in activities that don’t result in a publication puts you at a disadvantage.
The system is based on trust on one side (it is assumed that you describe a real scientific experiment and used correct methods in your research) and fierce competition on the other, so it creates plenty of perverse incentives for unethical practices.
If you were a bad actor, you could offer a struggling scientist a place on the author list of a fake, machine-generated publication for a fee. This is how paper mills operate. With ChatGPT, those mills have a tool to generate articles that will be much more difficult to detect.
Another way to make a small fortune for yourself in this system is to set up a scam journal. It could be a journal with a new name, proactively approaching scientists and offering to publish literally anything for a fee. We call these predatory journals.
Other scam journals re-use the name of an existing journal, set up a fake website and wait for prey to approach them, so that they can capture the publication fee. We call these hijacked journals. Read the work of @AbalkinaAnna if you want to know more.
With ChatGPT, such journals could generate high-quality fake content at scale – fake submissions, fake peer reviews and more professional-looking websites – to look much more legitimate than they do now, making the task of identifying them much more difficult.
Now if you were a very evil actor with sophisticated taste and had money to launder, you could even couple the two: create a big volume of plausible-looking fake content and sell it to a predatory journal you own!
And then there is this. If you had an agenda that is not in line with the scientific consensus – maybe a belief that the Earth was created 4000 years ago – you could always set up a new journal, fill it with the relevant content and cite yourself in other articles or on Wikipedia.
Once you have your own journal, you can mix citations to real and fake evidence to create an illusion of whatever body of knowledge you want. Or have it indexed in a library. I’m not making this up. Have you heard of the Journal of Creation yet?
You no longer need to sponsor “scientific” studies to prove that smoking tobacco or eating refined sugar is actually really great for your health. Or that burning fossil fuels does great things for the planet. Just fake it all at scale.
You could even DDoS a journal you dislike by overloading the editors with fake but plausible submissions, making it impossible to review manuscripts in a reasonable time and forcing them to scrap quality control.
Machine translation has enabled paper mills to auto-generate fraudulent content at scale, but it is still detectable thanks to weird, tell-tale phrases. With ChatGPT, which has a really friendly UI and high-quality output, we are entering a completely new level of scientific fraud.
Academia will have a lot of rethinking to do: how do we defend against such fraud? Is peer review alone able to spot such articles? Is the current process able to spot mistakes introduced by text-generation tools even when the research is honest?
How effective is it even now? (Let me reiterate: there are over 6.5 million new articles published every year.)
Should we continue relying on published texts as the gold standard of scientific communication? Can we imagine certifying the validity of scientific discoveries in a better way than submitting a PDF for a fee and asking a bunch of overworked experts to read it in their free time?
Or twist it the other way round: imagine a solution where the research result itself is validated, structured information about the setup, measurements, etc., stored in a knowledge graph like @orkg_org, and we use a text-generation tool to narrate a human-friendly version?
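As a thought experiment, here is a minimal sketch of that idea. The record structure, field names, values and narration template are invented for illustration – this is not the actual ORKG data model.

```python
# Toy illustration: the "publication" is a structured record, and the
# readable text is generated from it. Field names, values and the template
# are made up for this sketch – not the real ORKG schema.

result = {
    "research_problem": "thermal stability of a coating",   # placeholder
    "method": "differential scanning calorimetry",          # placeholder
    "sample": "polymer film, batch A",                       # placeholder
    "measurement": {"metric": "decomposition onset", "value": 310, "unit": "°C"},
}

def narrate(record: dict) -> str:
    """Render the structured result as a human-friendly sentence."""
    m = record["measurement"]
    return (
        f"Using {record['method']}, the {record['sample']} was studied for "
        f"{record['research_problem']}; the {m['metric']} was "
        f"{m['value']} {m['unit']}."
    )

print(narrate(result))
```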
How important is it to write scientific text anyway? How much of it could we ethically outsource to an AI assistant? And how much critical thinking happens only _while_ we write? And last but not least, where do we find the time and resources in academia to re-imagine all of this?
• • •
Today I asked ChatGPT about the topic I wrote my PhD about. It produced reasonable-sounding explanations and reasonable-looking citations. So far so good – until I fact-checked the citations. And things got spooky when I asked about a physical phenomenon that doesn’t exist.
I wrote my thesis about multiferroics and I was curious whether ChatGPT could serve as a tool for scientific writing. So I asked it to provide a shortlist of citations related to the topic. ChatGPT refused to openly give me citation suggestions, so I had to use a “pretend” trick.
When asked about the choice criteria, it gave a generic, non-DORA-compliant answer. I asked about the criteria a few times and it pretty much always gave some version of “number-of-citations-is-the-best-metric”. Sometimes it would refer to a “prestigious journal”.
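Fact-checking generated citations by hand is tedious. One rough way to automate a first pass – my own suggestion, not something used in this thread – is to look each reference up in a bibliographic database such as Crossref via its public REST API. A match only means a similar record exists; it says nothing about whether the citation is accurate or used correctly.

```python
# First-pass sanity check: does a generated citation resemble any record in
# Crossref? A hit does not prove the citation is real or correct, and a miss
# does not prove fabrication – it only flags items for manual review.
import requests

def lookup_citation(citation: str, rows: int = 3) -> list[dict]:
    """Query Crossref's public /works endpoint with a free-text citation."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [
        {
            "title": (item.get("title") or ["<no title>"])[0],
            "doi": item.get("DOI"),
            "year": item.get("issued", {}).get("date-parts", [[None]])[0][0],
        }
        for item in items
    ]

# Example with a well-known (real) multiferroics reference:
for match in lookup_citation(
    "Spaldin, Fiebig, The renaissance of magnetoelectric multiferroics, Science 2005"
):
    print(match)
```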