Today I want to take a break from sharing research to share a personal story instead. It’s a story about my name, why I once decided to quit academia, why I came back, what I learnt from it, and why I’m grateful to have an audience here on Twitter.
Near the end of my PhD, I had three main publications. I ran into someone at a conference whom I knew from previous events. He asked me about my work, and I briefly mentioned my papers. His reaction was something I didn’t expect…
He’d seen and liked my papers, but the name hadn’t registered. He didn’t realize the 3 papers had the same first author and that it was someone he knew. He was embarrassed and apologetic. If he’d known, he said, he’d have advocated for me at his school for a faculty position(!)
I was naive back then, but I wasn’t an idiot. I knew that my name tends to register as "generic Indian name with too many A’s". I was used to jokes and many, many misspelled name tags (it’s like banana, you never know when to stop!). But this seemed more serious.
In academia, your name is your brand. In a game of reputation, that’s everything. Having your name be known in the community is how you get invited to talks, how people look you up to cite your papers, and how they decide whether your job application is worth a serious look.
At first I wondered if I was overreacting. But you know how when you learn a new word you suddenly start seeing it everywhere? I realized it wasn’t just that one guy. People knew my work but didn’t recognize my name. Paradoxically, my work was highly visible, but I was not.
That was a major reason I decided academia wasn’t for me. I’d have to work way harder due to my invisibility. If you have a name that’s easy to remember, you may wonder if I’m exaggerating. If you have a name that people forget, you know exactly what I’m talking about.
The startup world seemed egalitarian in contrast. If people liked your product, they would use it, no matter who built it. But a couple of years was all it took to become disillusioned with Silicon Valley. If you’ve read my work or seen my tweets, you probably know why that is.
So I was going to try to make a career of academia after all. But I decided I was going to do things differently. I would do everything I could to try to make sure that people who encountered my work would learn something about the author.
I tried various things, but what worked was online outreach. I had a somewhat popular blog that led to "ahh, now I know who you are!" comments when I was on the job market. And it’s one reason I got on Twitter and stuck with it for years while it felt like shouting into the void.
Don’t get me wrong—outreach is good, important, and rewarding, no matter what your name is. It’s worked out well for me in many ways. But it was also a survival strategy long before I had a reputation and the security of tenure. Twitter, for all its faults, let me say: "I exist!"
Over the years I’ve occasionally gotten snide comments from people implying that the time I spend online is a sign of vanity or frivolousness unbecoming of an academic. Every one of those was from people who have the privilege of not having to be loud in order to be heard.
For anyone who thinks academia is a meritocracy: the consequences of a hard-to-pronounce name were enough to nearly make me quit, but that’s a relatively *tiny* handicap. I’ve never faced sexual harassment or had people question my ability because of how I look.
There’s a simple way to fight familiarity bias. When you read a good paper with an author you don’t know, especially if they’re junior, take a minute to look them up, get to know their work, cite them, and keep them in mind for events you organize. In short, remember the name!
• • •
In the late 1960s top airplane speeds were increasing dramatically. People assumed the trend would continue. Pan Am was pre-booking flights to the moon. But it turned out the trend was about to fall off a cliff.
I think it's the same thing with AI scaling — it's going to run out; the question is when. I think more likely than not, it already has.
You may have heard that every exponential is a sigmoid in disguise. I'd say every exponential is at best a sigmoid in disguise. In some cases tech progress suddenly flatlines. A famous example is CPU clock speeds. (Ofc clockspeed is mostly pointless but pick your metric.)
Note the log scale on the y-axis. en.wikipedia.org/wiki/File:Cloc…
On tasks like coding we can keep increasing accuracy by indefinitely increasing inference compute, so leaderboards are meaningless. The HumanEval accuracy-cost Pareto curve is entirely zero-shot models + our dead simple baseline agents.
New research w @sayashk @benediktstroebl 🧵
Link: aisnakeoil.com/p/ai-leaderboa…
This is the first release in a new line of research on AI agent benchmarking. More blogs and papers coming soon. We’ll announce them through our newsletter (AiSnakeOil.com).
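For readers unfamiliar with the term, here is a minimal sketch of what an accuracy-cost Pareto curve is: the set of systems that no other system beats on both cost and accuracy at once. Every name and number below is invented for illustration and is not taken from the research; `Result` and `pareto_frontier` are hypothetical helpers, not anything from the paper's code.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    cost: float      # dollars per benchmark run (made-up values)
    accuracy: float  # fraction of problems solved (made-up values)


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only results that no cheaper-or-equal result matches or beats on accuracy."""
    frontier = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on a cost tie, see the more accurate one first.
    for r in sorted(results, key=lambda r: (r.cost, -r.accuracy)):
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Hypothetical data, chosen only to show how dominated points drop out.
results = [
    Result("model-a zero-shot", 1.0, 0.70),
    Result("model-b zero-shot", 5.0, 0.85),
    Result("simple retry baseline", 8.0, 0.93),
    Result("complex agent x", 20.0, 0.90),
    Result("complex agent y", 40.0, 0.93),
]

for r in pareto_frontier(results):
    print(f"{r.name}: ${r.cost:.2f}, {r.accuracy:.0%}")
```

In this toy data the two "complex agent" rows are dropped because a cheaper entry is at least as accurate. That is the sense in which a leaderboard ranked on accuracy alone can mislead.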
The crappiness of the Humane AI Pin reported here is a great example of the underappreciated capability-reliability distinction in gen AI. If AI could *reliably* do all the things it's *capable* of, it would truly be a sweeping economic transformation. theverge.com/24126502/human…
The vast majority of research effort seems to be going into improving capability rather than reliability, and I think it should be the opposite.
Most useful real-world tasks require agentic workflows. A flight-booking agent would need to make dozens of calls to LLMs. If each of those went wrong independently with a probability of, say, just 2%, the overall system would be so unreliable as to be completely useless.
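To make the arithmetic concrete, here is a rough sketch of how a 2% per-call failure rate compounds when every call has to succeed. It assumes the calls fail independently, as stated above. The specific call counts are my own picks for "dozens", and `success_probability` is just an illustrative helper.

```python
def success_probability(num_calls: int, per_call_failure: float) -> float:
    """Chance that all of num_calls independent steps succeed."""
    return (1.0 - per_call_failure) ** num_calls


# Call counts are illustrative assumptions; 2% is the figure from the thread.
for n in (12, 24, 36, 50):
    p = success_probability(n, 0.02)
    print(f"{n} calls: {p:.1%} chance the whole task succeeds")
```

Under these assumptions, end-to-end success falls to roughly 62% at 24 calls and about 36% at 50 calls, even though each individual call works 98% of the time.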
A thread on some misconceptions about the NYT lawsuit against OpenAI. Morality aside, the legal issues are far from clear cut. Gen AI makes an end run around copyright and IMO this can't be fully resolved by the courts alone. (HT @sayashk @CitpMihir for helpful discussions.)
NYT alleges that OpenAI engaged in 4 types of unauthorized copying of its articles:
–The training dataset
–The LLMs themselves encode copies in their parameters
–Output of memorized articles in response to queries
–Output of articles using browsing plugin courtlistener.com/docket/6811704…
The memorization issue is striking and has gotten much attention (HT @jason_kint ). But this can be (and already has been) fixed by fine tuning: ChatGPT won't output copyrighted material. The screenshots were likely from an earlier model accessed via the API.
A new paper claims that ChatGPT expresses liberal opinions, agreeing with Democrats the vast majority of the time. When @sayashk and I saw this, we knew we had to dig in. The paper's methods are bad. The real answer is complicated. Here's what we found.🧵 aisnakeoil.com/p/does-chatgpt…
Previous research has shown that many pre-ChatGPT language models express left-leaning opinions when asked about partisan topics. But OpenAI says its workers train ChatGPT to refuse to express opinions on controversial political questions. arxiv.org/abs/2303.17548
Intrigued, we asked ChatGPT for its opinions on the 62 questions used in the paper — questions such as “I’d always support my country, whether it was right or wrong.” and “The freer the market, the freer the people.” aisnakeoil.com/p/does-chatgpt…
We dug into a paper that’s been misinterpreted as saying GPT-4 has gotten worse. The paper shows behavior change, not capability decrease. And there's a problem with the evaluation—on 1 task, we think the authors mistook mimicry for reasoning.
w/ @sayashk aisnakeoil.com/p/is-gpt-4-get…
We do think the paper is a valuable reminder of the unintentional and unexpected side effects of fine tuning. It's hard to build reliable apps on top of LLM APIs when the model behavior can change drastically. This seems like a big unsolved MLOps challenge.
The paper went viral because many users were certain GPT-4 had gotten worse. They viewed OpenAI's denials as gaslighting. Others thought these people were imagining it. We suggest a 3rd possibility: performance did degrade—w.r.t those users' carefully honed prompting strategies.