#ChatGPT knows everything about p-values and that's a problem. A short 🧵
There's a lot of nonsense written about p-values on the internet. ChatGPT has learned it all, and readily mixes correct statements with nonsense like "p < 0.05 means there is less than a 5% chance that the results are due to chance".
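A rough way to see why that quoted statement is nonsense, as a minimal Python sketch. The setup is an assumption made purely for illustration (a two-sided z-test with known standard deviation, and the made-up helper names `two_sided_p` and `simulate`), not anything ChatGPT or a particular paper uses. It simulates many experiments and reports how often p < 0.05 occurs, and what share of those "significant" results came from experiments where the null hypothesis was actually true.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(sample, sigma=1.0):
    """Two-sided p-value of a z-test for H0: mean == 0, with known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_experiments=10_000, n=20, share_null_true=1.0, effect=0.5, seed=1):
    """Return (share of experiments with p < 0.05,
               share of those significant results where H0 was true)."""
    rng = random.Random(seed)
    significant = 0
    significant_and_null_true = 0
    for _ in range(n_experiments):
        # Decide whether this experiment has no real effect (H0 true).
        null_true = rng.random() < share_null_true
        mu = 0.0 if null_true else effect
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        if two_sided_p(sample) < 0.05:
            significant += 1
            significant_and_null_true += null_true
    return significant / n_experiments, significant_and_null_true / max(significant, 1)


# Every experiment has a true null: p < 0.05 still shows up in roughly 5% of them,
# and all of those significant results are "due to chance".
rate_all_null, chance_share_all_null = simulate(share_null_true=1.0)

# Half of the experiments have a real effect: the share of significant results
# that are "due to chance" now depends on that mix and on the effect size.
rate_mixed, chance_share_mixed = simulate(share_null_true=0.5)

print(rate_all_null, chance_share_all_null)
print(rate_mixed, chance_share_mixed)
```

The p-value is computed assuming the null is true, so on its own it can't tell you the probability that a result is "due to chance". That probability changes with how many of the tested hypotheses are true to begin with, which the p-value knows nothing about.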
Trying out different prompts you can get better answers. I had some luck by adding on "I'm a statistics professor" which makes ChatGPT say mostly correct things about p-values.
But if you're a student you're out of luck and back in the "It's the probability the results occurred by chance"-nonsense camp.
Of course, ChatGPT will readily prescribe NHST and p-values if you're asking how you can *prove* that two groups are different...
Here's the best example that ChatGPT has internalized ALL the nonsense people have written about p-values...
In 2002 Haller and Krauss wrote this sweet little paper where they show that, at their university, confusion around p-values was common among students *as well as* methodology teachers. krigolsonteaching.com/uploads/4/3/8/…
Students, scientists, and methodology teachers were given the following prompt:
And the following questions, all of which *should* be marked as false:
Most students, teachers, and scientists failed on this questionnaire...
The amazing thing is, as ChatGPT is so good, we can give it this questionnaire verbatim!
And it will answer! But as it's trained on people's general misconceptions of what a p-value is, it will fall into the same traps as the students and teachers.
Unfortunately, as long as ChatGPT is trained on what people have been writing about p-values online, it should not be used as an alternative to, say, Wikipedia.
Some more ramblings about this in a short presentation I did at Bayes@Lund 2023:
• • •
The good thing when working with tidy data is that Python, R and SQL code often becomes very similar.
(Makes it easier for my brain, when switching between languages 🧠👍)
Of course, then SQL had to go and get the order of statements completely wrong... well well...
Btw, this query extracts the top 10 artists that have been on Spotify's Top 200 playlist the most in Sweden. Using the "Spotify Charts" dataset: kaggle.com/dhruvildave/sp…
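The query itself isn't reproduced in the text here, so what follows is only a sketch of what such a query might look like, run from Python with the standard library's sqlite3. The table name `charts`, the column names (`date`, `region`, `chart`, `artist`), the value 'top200' and the handful of rows are all assumptions made up for illustration; the real dataset may be laid out differently.

```python
import sqlite3
from collections import Counter

# Hypothetical miniature of the charts data: (date, region, chart, artist).
rows = [
    ("2021-01-01", "Sweden", "top200", "Artist A"),
    ("2021-01-02", "Sweden", "top200", "Artist A"),
    ("2021-01-03", "Sweden", "top200", "Artist A"),
    ("2021-01-01", "Sweden", "top200", "Artist B"),
    ("2021-01-02", "Sweden", "top200", "Artist B"),
    ("2021-01-01", "Sweden", "top200", "Artist C"),
    ("2021-01-01", "Norway", "top200", "Artist C"),
    ("2021-01-01", "Sweden", "viral50", "Artist B"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE charts (date TEXT, region TEXT, chart TEXT, artist TEXT)")
con.executemany("INSERT INTO charts VALUES (?, ?, ?, ?)", rows)

# SQL is written SELECT-first, even though the data flows
# FROM -> WHERE -> GROUP BY -> (count) -> ORDER BY -> LIMIT.
query = """
SELECT artist, COUNT(*) AS n_entries
FROM charts
WHERE region = 'Sweden' AND chart = 'top200'
GROUP BY artist
ORDER BY n_entries DESC, artist
LIMIT 10
"""
top_sql = con.execute(query).fetchall()

# The same steps in plain Python, written in the order the data flows:
# filter the rows, count per artist, then keep the 10 most common.
counts = Counter(
    artist
    for (_date, region, chart, artist) in rows
    if region == "Sweden" and chart == "top200"
)
top_python = counts.most_common(10)

print(top_sql)
print(top_python)
```

Both versions do the same filter, count, sort and cut. Only the SQL one makes you write the select part first.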
By now you might have heard the good news that #rstats is getting a new shorthand function syntax. Soon you'll be able to write the following in R!
add <- \(x, y) x + y
But why does this new syntax use the backslash?
(A thread. 1/n)
The \(x,y) x + y syntax might look odd, but is borrowed (as far as I know) from perhaps the most functional of all programming languages - Haskell - where a similar syntax is used (with the addition of an -> arrow).
But why the backslash?
(2/n)
Modern JavaScript doesn't use a backslash for shorthand function definitions and neither does Python. But Python gives us a clue! Why does Python use the lambda keyword?
(3/n)
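For comparison with the R line add <- \(x, y) x + y, here is the same function written with Python's lambda keyword, next to an ordinary def. The name `add_def` is made up for this sketch.

```python
# Shorthand (anonymous) function bound to a name, like the R example.
add = lambda x, y: x + y


# The equivalent ordinary function definition.
def add_def(x, y):
    return x + y


print(add(1, 2), add_def(1, 2))
```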