Teresa Kubacka
Aug 1 · 23 tweets · 7 min read
A new beast has entered the scene: Scopus AI. An undisclosed chatbot fed paywalled content, owned by the profit-oriented Elsevier. Marketed as "a brand new window into humanity's accumulated knowledge". Why is it problematic?
First, the owner. Elsevier has long been known for questionable practices, prioritizing profit over the health of the sci-comm ecosystem. For example, the entire editorial board of the journal NeuroImage recently resigned in protest against Elsevier's greed.
If you want to know more about its business model, which allows for profit margins bigger than Big Tech's, I recommend the following article. If you have spent some time in academia, however, you don't need much convincing: https://t.co/kitS776iG8 (theguardian.com/science/2017/j…)
Elsevier's database of scholarly metadata, Scopus, is an important resource in academia, especially for two major use cases: research assessment and literature search. It is, however, a big black box in terms of data curation and algorithms. And it's very expensive. https://t.co/beyabh1dbb (ncbiotech.org/sites/default/…)
Elsevier has probably observed the development of new AI-powered tools like @scite and @scispace_ carefully. Elsevier's advantage is that they sit on a trove of text data not accessible otherwise (that is, the full texts of Elsevier articles and all the abstracts from Scopus).
Some of this data is accessible through open data sources like @OpenAlex_org, @open_abstracts and @SemanticScholar, but getting the texts from publishers is generally a pain, because publishers, as a rule, are not willing to give away their IP for free if they can make a profit!
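As an aside, that open metadata really is usable: OpenAlex, for example, serves abstracts as an inverted index (word → positions) rather than plain text, for copyright reasons. A minimal sketch of reconstructing one (the function name and toy data are mine, not OpenAlex's):

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from OpenAlex's abstract_inverted_index,
    which maps each word to the list of positions where it occurs."""
    words_at = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            words_at[pos] = word
    # Emit words in position order
    return " ".join(words_at[i] for i in sorted(words_at))

# Toy index in OpenAlex's format
idx = {"the": [0, 3], "quick": [1], "fox": [2], "lazy": [4], "dog": [5]}
print(reconstruct_abstract(idx))  # the quick fox the lazy dog
```

So abstracts are genuinely open; it's the full texts that stay behind the paywall.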
So open-source / external providers will not have the same training data to compete with Scopus. And even if you buy a copy of this data from Scopus, you would generally have a very hard time negotiating a license to build an application meant to compete with them.
Now, Scopus' apparent mission is to provide a trustworthy, comprehensible product that improves academic work and research discoverability. But Elsevier's bigger mission is to make money off academia. It is appalling to see how the two intersect in their newest chatbot offering.
"Scopus AI", as it is called, is riding the generative-AI wave, probably hoping to chip off some of the profit that OpenAI and others are making. Of course, hallucinations are a problem in all LLM applications; both Galactica and ChatGPT failed at this.
Scopus needs to appear credible, so they claim that their "advanced engineering limits the risk of hallucinations" and "taps into the trustworthy and verified knowledge". But how do they do it? What is this advanced engineering? We don't know. https://t.co/ZbU5UME3Ds (beta.elsevier.com/products/scopu…)
We actually know virtually nothing about the technology, the model, the curation process, or the feedback loops with the experts they claim to have in place. No model card, no metrics, nothing. They claim the LLM usage is private, so maybe they use an open-source model, but that's all we know.
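For what it's worth, the standard recipe behind claims like "limits the risk of hallucinations" is retrieval-augmented generation: fetch matching abstracts first, then instruct the LLM to answer only from them. Since Elsevier discloses nothing, this is only a guess at the shape of such a pipeline, with naive keyword overlap standing in for a real embedding search (all names and data here are hypothetical):

```python
def retrieve(query, abstracts, k=2):
    """Rank abstracts by word overlap with the query (a crude stand-in
    for the embedding search a real system would use)."""
    q = set(query.lower().split())
    scored = sorted(
        abstracts.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

def build_prompt(query, abstracts):
    """Ground the LLM: tell it to answer only from retrieved snippets."""
    hits = retrieve(query, abstracts)
    context = "\n".join(f"[{i}] {abstracts[i]}" for i in hits)
    return (
        "Answer using ONLY the abstracts below, citing them by [id]:\n"
        f"{context}\nQuestion: {query}"
    )
```

Note that grounding a prompt this way reduces, but does not eliminate, hallucinations: the model can still misquote or over-generalize the retrieved snippets, which is exactly why the lack of published metrics matters.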
Let's just savor this for a moment: "our AI is using latest LLMs and other technologies, in combination with our own technology". How nice and transparent: they use hardware and software to create the product! Sadly, their competitors aren't any better.
But there is also another huge problem with the bold claim they are making: that Scopus AI is a window into humanity's accumulated knowledge. They don't access humanity's knowledge! It's only the titles and abstracts of a selection of articles published in certain peer-reviewed journals.
Has all the knowledge been published in peer-reviewed journals? What about the knowledge contained in all the other forms of writing? Or the knowledge not contained in writing at all? Who and what is overrepresented, and who is underrepresented? @timnitGebru @emilymbender
Which scientific knowledge didn't make it into Scopus? Which negative results are missing? Which results were denied publication for being "not novel enough" by the peer-review process? Which of them were presented as more novel than they are, to pass through this novelty bias?
Then, given that Scopus uses black-box curation for its journal list, does it encompass all the scientific knowledge in the world? No: we know well that their data doesn't represent the world proportionally (slide by @stefhaustein and @RodrigoCostas1): https://t.co/aJE2bPB3tx (zenodo.org/record/7987872)
To be honest, we don't even know if they really use all the available abstracts, or whether they somehow amplify Elsevier content, which would sneakily boost citations to Elsevier-published articles and drive more users to their webpages (because they also sell readership data).
And last but not least: they don't even use full texts, only titles and abstracts. If you have ever read a few scientific publications, how well did the abstract represent the actual findings of the article? All the finesse, details, and critical perspective are missing.
But of course, it is not in Elsevier's interest to disclose those limitations to the user. What they want is a trust-washed chatbot, marketed especially to less experienced researchers, locking future generations of scientists even further into Elsevier's ecosystem.
Scopus AI is now in alpha, but it will be a paid product. I can imagine it may be hard for universities to resist this offering: ChatGPT may be hallucinating, but Elsevier's AI tells The Truth. And AI means gold in the toxic culture of publish-or-perish.
We already see chatbot LLMs destroying online ecosystems like Stack Overflow by redirecting traffic, and injecting bullshit into serious-sounding texts in ways that are pretty much undetectable. How will they impact scientific publishing?
So I really feel it's the worst of both worlds. It's not that I'm worried; I'm just very angry at this point. Knowing everything we now know about LLMs, scientific communication, the creation of knowledge, biases, and open science: let's just destroy it all in the race for profit.
#scicomm #AcademicTwitter @aarontay @scholarlykitchn @PaoloCrosetto @ETHBibliothek @OpenAcademics @RetractionWatch