Code Interpreter really is a parrot. I tried it with the Boston Dataset. It doesn't understand a thing about data, it just regurgitates a statistically plausible tutorial. It performs poorly with variable understanding, not noticing a problem with "Bk" unless asked specifically
Instead, it uncritically goes on and on to suggest more complex models or more elaborate data wrangling techniques, whereas the most glaring problem is right there in the very first answer. Only when forced to do so explicitly, it "remembers" about what the data actually encodes.
The workflow never encourages critical thinking. A single answer covers multiple steps, and the user is discouraged to double-check any of them. The code is hidden and CI nudges to only go forward, contributing to the illusion that the analysis is correct, even when it's not
You can basically type "yes" and "what do you recommend" all the way through, and CI will gladly create a very sophisticated, very discriminatory model, even pretending it's all good and fair (unless you really insist it's not), and it will happily assist you in deploying it.
What I'm angry about is that this is the exact opposite of what the Explainable AI movement has been trying to achieve. Code Interpreter is the ultimate black box AI, which you cannot debug, cannot tweak, cannot rely on, but which very confidently delivers "interesting insights".
Its usefulness is limited when dealing with scrambled files as well. On the first try, the preprocessing seemed pretty impressive, but upon inspection, it turned out CI skipped the first two lines of data silently, and tried to convince me with a useless check that all is ok.
I wasn't able to make it explain how exactly it parsed the file header, although it came up with a pretty sophisticated text. So I created a second file, where I added 1 line to the header commentary, to check if it really is as robust as it initially seemed.
Then CI failed miserably: it locked onto skipping the same # of lines as before (has it been harcoded in the training data?), scrambling the data table, it confused the last line of header with the last line of df, happily "removed missing data" and said the data is now clean.
Additional plot twist is that the line I added contained an additional instruction. Which was actually picked up and evaluated. So prompt injection attacks inside datasets are possible.
So long story short, if you're a person who doesn't know how to analyze data, you're still better off with a transparent solution where you just click and make some groupbys. If you use CI, sooner or later you'll run into a problem which will likely backfire onto your credibility
PS. I'm also quite annoyed that the Code Interpreter training data has been so heavily polluted with the outdated @matplotlib API. The MPL team has put such a lot of effort to make it much better including creating better docs, and now we are back at step 1.
• • •
Missing some Tweet in this thread? You can try to
force a refresh
It has become increasingly clear to me that AI for text generation coupled with the current scientific publishing system puts academia on a path to becoming a cargo cult. So I wrote an essay about it. Below is a short summary: 1/ lookalikes.substack.com/p/publish-or-p…
AI is here to stay and in academia it definitely has some good sides. It can free up some time normally spent on tedious work that doesn't require so much scientific input. It can also act as a great equalizer in the English-dominated publishing world. 2/
But incentives are important and what I'm worried about is that "publish and perish" will capture all the potential gains from the partial automatization within scientific writing and publishing, instead of allowing researchers to do more risky and resource-intensive projects. 3/
Idea: part of why ChatGPT seems so appealing as a substitute of a search engine is that most of us don't know a good method for knowledge management. Nobody taught us how to build a second brain or make smart notes, so we just keep everything in our head and inevitably struggle
ChatGPT creates an illusion of a knowledge base which can be queried using a natural language, so that we wouldn't have to remember anything anymore. But as I read more about #Zettelkasten and other methods, it seems that each of us could've had such a KB even w/o a computer
Imagine that early at school you'd learn how to create your own system of interconnected notes containing the material you learn. You'd then use it as a seed for your second brain which you keep building throughout your life and filling with personal and professional knowledge.
Not only fighting misleading content will be a challenge for academia in the post-ChatGPT era. It has suddenly become easy to run academic paper mills at scale, set up credibly looking scam journals or even money laundering schemes. Can we imagine a systemic way out of it?🧵
If you’ve never worked in academia, you’ve probably never heard that academic publishing is dominated by huge, very profitable companies which use the pressure of “publish-or-perish” put on the scientists to earn big money (the 30%-profit-margin type of money).
How come? Scientists are required to publish articles in an academic journal and to refer to other people’s work. Articles are reviewed by the experts – their colleagues, employed at other scientific institutions – in a form of brief fact-checking which is called peer review.
Today I asked ChatGPT about the topic I wrote my PhD about. It produced reasonably sounding explanations and reasonably looking citations. So far so good – until I fact-checked the citations. And things got spooky when I asked about a physical phenomenon that doesn’t exist.
I wrote my thesis about multiferroics and I was curious if ChatGPT could serve as a tool for scientific writing. So I asked to provide me a shortlist of citations relating to the topic. ChatGPT refused to openly give me citation suggestions, so I had to use a “pretend” trick.
When asked about the choice criteria, it gave a generic non-DORA compliant answer. I asked about the criteria a few times and it pretty much always gave some version of “number-of-citations-is-the-best-metric”. Sometimes it would refer to a “prestigious journal”.