Kareem Carr, Statistics Person
Stats PhD student @Harvard • I'm on a mission to Make Twitter Nerdy Again • Follow me for a steady stream of nerdy content on your timeline.
19 subscribers
Jun 5 10 tweets 2 min read
You may have heard hallucinations are a big problem in AI, that they make stuff up that sounds very convincing, but isn't real.

Hallucinations aren't the real issue. The real issue is Exact vs Approximate, and it's a much, much bigger problem.

When you fit a curve to data, you have choices.

You can force it to pass through every point, or you can approximate the overall shape of the points without hitting any single point exactly.
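A minimal sketch of those two choices, using made-up points. The data and the function names `interpolate_exact` and `fit_line` are illustrative assumptions. The exact fit is a Lagrange polynomial and the approximate fit is an ordinary least-squares line; other methods would make the same point.

```python
# Hypothetical data: five noisy points that roughly follow a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

def interpolate_exact(x, xs, ys):
    """Lagrange polynomial: passes through every (xs[i], ys[i]) exactly."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Ordinary least-squares line: approximates the overall shape."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)

# Largest miss at the data points for each approach.
exact_error = max(abs(interpolate_exact(x, xs, ys) - y) for x, y in zip(xs, ys))
line_error = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

print(f"exact fit, worst miss at a data point: {exact_error:.6f}")
print(f"line fit,  worst miss at a data point: {line_error:.6f}")
```

The exact fit reproduces every point, noise included. The line misses each point a little but captures the overall trend.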
Jun 2 13 tweets 2 min read
I’m deeply skeptical of the AI hype because I’ve seen this all before. I’ve watched Silicon Valley chase the dream of easy money from data over and over again, and they always hit a wall.

Story time. First it was big data. The claim was that if you just piled up enough data, the answers would be so obvious that even the dumbest algorithm or biggest idiot could see them.

Models were an afterthought. People laughed at you if you said the details mattered.
Jun 1 9 tweets 2 min read
As a statistician, this is extremely alarming. I’ve spent years thinking about the ethical principles that guide data analysis. Here are a few that feel most urgent:

RESPECT AUTONOMY

Collect data only with meaningful consent. People deserve control over how their information is used.

Example: If you're studying mobile app behavior, don’t log GPS location unless users explicitly opt in and understand the implications.
May 8 7 tweets 1 min read
The kids using ChatGPT to cheat are massively fumbling the ball.

I would give almost anything to experience learning something like calculus for the first time with an AI assistant. I have wasted an ungodly amount of time on poorly written math textbooks.

Confusing notation. Poorly worded statements that I puzzled over for hours. Typos that had me questioning my sanity for days.
May 7 6 tweets 1 min read
Hot take: Students using ChatGPT to cheat are just following the system’s logic to its natural conclusion, a system that treats learning as a series of hoops to jump through, not a path to becoming more fully oneself.

The tragedy is that teachers and students actually want the same thing, for the student to grow in capability and agency, but school pits them against each other, turning learning into compliance and grading into surveillance.
Apr 25 4 tweets 1 min read
If you think about how statistics works, it’s extremely obvious why a model built on purely statistical patterns would “hallucinate”. Explanation in next tweet.

Very simply, statistics is about taking two points you know exist and drawing a line between them, basically completing patterns.

Sometimes that middle point is something that exists in the physical world, sometimes it’s something that could potentially exist, but doesn’t.
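A toy illustration of that idea, assuming a small set of made-up observations. The numbers and the helper `midpoint` are hypothetical and only meant to show the pattern-completion step.

```python
def midpoint(p, q):
    """Linear interpolation halfway between two known points."""
    return tuple((a + b) / 2 for a, b in zip(p, q))

# Hypothetical observations, e.g. (height in cm, weight in kg) of real people.
observed = {(160.0, 55.0), (170.0, 65.0), (180.0, 75.0)}

guess_a = midpoint((160.0, 55.0), (180.0, 75.0))  # lands on a real observation
guess_b = midpoint((160.0, 55.0), (170.0, 65.0))  # plausible, but never observed

for guess in (guess_a, guess_b):
    status = "observed" if guess in observed else "plausible but not observed"
    print(guess, "->", status)
```

Both guesses come from the same rule. Nothing in the rule itself says which one corresponds to something real.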
Apr 19 5 tweets 1 min read
"Why are US taxpayers funding Harvard?"

These grants aren't charity. They're highly competitive contracts where the US government determines Harvard is the best institution for conducting specific research, and then pays Harvard for services rendered to US taxpayers. Each grant represents a fair contract that a group at Harvard won after being in competition with hundreds or even thousands of other groups. These are not handouts.
Apr 6 9 tweets 2 min read
As someone who translates ideas into math for a living, I noticed something weird about the tariff formula that I haven't seen anybody else talk about. 🧵

The formula defines the tariff rate as exactly the percent you need to charge on imports to make up for the trade deficit. Basically,

trade deficit = tariff rate x imports

It's constructed as if tariffs are a kind of compensation for trade deficits but this raises a question.
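Rearranging the relationship above gives tariff rate = trade deficit / imports. Here is a small sketch with made-up numbers. The function name `implied_tariff_rate`, the figures, and the definition of the deficit as imports minus exports are assumptions for illustration.

```python
def implied_tariff_rate(trade_deficit, imports):
    """Rate r such that r * imports equals the trade deficit."""
    if imports <= 0:
        raise ValueError("imports must be positive")
    return trade_deficit / imports

# Hypothetical figures in billions of dollars, not real trade data.
imports = 200.0
exports = 150.0
trade_deficit = imports - exports  # assumed definition of the deficit

rate = implied_tariff_rate(trade_deficit, imports)
print(f"implied tariff rate: {rate:.0%}")

# Charging that rate on imports recovers the deficit by construction.
print(f"rate x imports = {rate * imports}")
```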
Apr 1 12 tweets 3 min read
Whenever I see students with good grades but lots of college rejections, my first thought is a bad personal essay. As predicted, this guy's essay was kind of a disaster.

Since I did get into Harvard, I'll give my two cents on the essay. Maybe this will help some people. Here's the essay:
Mar 8 5 tweets 2 min read
In honor of international women's day, let's take a moment to remember the most famous statistician in history.

You've definitely heard of her, but you probably have no idea she was a statistician.

It's Florence Nightingale.

Nightingale was the first female member of the Royal Statistical Society and a pioneer in using statistical analysis to guide medical decisions and public health policy.
Feb 18 6 tweets 1 min read
Took one for the team and made a histogram of the Elon social security data. Not sure why his data scientists are just giving him raw tables like that.

It’s also weird that they keep tweeting out these extremely strong claims without taking a few days to do some basic follow-up work.
Feb 8 5 tweets 1 min read
Here's my solution to teaching this kid probability 🧵

Let's just take his system of assigning probability at face value. What's the probability of getting a six when I roll a die?

Well either it happens or it doesn't happen. So, the chances of getting a 6 are 50%.
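A quick sketch of what that rule implies when applied to every face of the die. The comparison with counting six equally likely faces is added here for contrast and is not taken from the thread.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# The "either it happens or it doesn't" rule gives every outcome 1/2.
naive = {face: Fraction(1, 2) for face in faces}

# Counting equally likely faces gives every outcome 1/6.
counting = {face: Fraction(1, len(faces)) for face in faces}

naive_total = sum(naive.values())
counting_total = sum(counting.values())

print("naive total:   ", naive_total)     # 3
print("counting total:", counting_total)  # 1
```

Exactly one face comes up on each roll, so the probabilities across all six faces should add up to 1. Under the 50% rule they add up to 3.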
Feb 6 6 tweets 1 min read
Nate Silver's latest book reads to me like a roadmap of the current moment. It's about a kind of chaotic, aggressive quantitative thinker who's usually wrong, but in calculated ways that lead to massive wins when things break their way.

These would include venture capitalists, crypto bros, tech evangelists, AI boosters and even a few influencers. They also seem to be among the most powerful members of MAGA.
Jan 23 6 tweets 2 min read
This is a resource thread about the Datasaurus Dozen data and how to get it.

The Datasaurus Dozen is a collection of extremely different datasets with near-identical summary statistics.

It’s a reminder to all of us to ALWAYS plot our data. Here’s what all the datasets look like:
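To see how summary statistics can hide shape, here is a toy sketch with two tiny hand-built datasets. These are NOT the Datasaurus data. They are constructed only so that the mean and variance match while the values themselves look nothing alike.

```python
import math
from statistics import fmean, pvariance

# Two toy datasets built by hand for illustration; NOT the Datasaurus data.
clustered = [-1.0, -1.0, 1.0, 1.0]                 # two tight clumps
spread = [-math.sqrt(2), 0.0, 0.0, math.sqrt(2)]   # a center plus two extremes

for name, data in (("clustered", clustered), ("spread", spread)):
    print(f"{name}: mean={fmean(data):.3f} variance={pvariance(data):.3f} values={data}")
```

The printed mean and variance agree to three decimals. Only looking at the values, or plotting them, shows the difference.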
Jan 20 11 tweets 3 min read
Nassim Taleb has written a devastatingly strong critique of IQ, but since he writes at such a technical level, his most powerful insights are being missed.

Let me explain just one of them. 🧵

Taleb raises an intriguing question: what if IQ isn't measuring intelligence at all, but instead merely detecting the many ways in which things can go wrong with a brain?
Jan 15 4 tweets 1 min read
Here's something counterintuitive that a lot of people don't understand about heritability as it relates to race: if skin color is heritable, and discrimination based on skin color is common, the bad outcomes due to racism are going to be heritable as well.

Whenever you get any race-related heritability numbers, the first thing you absolutely should do is ask the person giving you those numbers what they did to rule these pathways out as a possibility.
Jan 15 7 tweets 2 min read
hey now, this is the guy that said your tweet was racist. go yell at him not me.

Let me break this down. The original tweet is doing the statistical equivalent of this.
Jan 13 10 tweets 2 min read
It feels racist because it’s a white nationalist framing of these data. This is a textbook example of how to lie with statistics.

My main criticism is he didn't even provide a source. So, 100k+ people have seen this and we don't even know if there is any real data here.
Dec 30, 2024 8 tweets 2 min read
According to a recent paper, the vast majority of academics gain their elite status the old-fashioned way: they were born with rich parents.

Academics are more likely to have rich parents than teachers, lawyers and judges, and even physicians and surgeons.
Dec 16, 2024 16 tweets 2 min read
Race and IQ research tends to be really bad. Here's why:

To say IQ is "genetic" is to say it operates at the level of genes.
Dec 15, 2024 19 tweets 3 min read
I have debunked this map of global IQs, and the study it was based on, so many times, but it just won't die. Help me spread the word about how much this study sucks. For every 10 likes, I will tweet a ridiculous fact about how badly this study was conducted.

Real science is about paying close attention to the quality of your sources. Notice that the original poster doesn't bother to say where the data comes from.