Do you want to do a psychology experiment while following best practices in open science? My collaborators and I have created Experimentology, a new open web textbook (to be published by MIT Press but free online forever).
The book is intended for advanced undergrads or grad students, and is designed around the flow of an experimental project, from planning through design, execution, and reporting, with open science concepts like reproducibility, data sharing, and preregistration woven throughout.
We start by thinking through what an experiment is, highlighting the role of randomization in making causal claims and introducing DAGs (causal graphs) as a tool for reasoning about them. We then discuss how experiments relate to psychological theories. experimentology.io/1-experiments
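To make the randomization point concrete, here's a minimal simulation sketch (my own toy example, not code from the book): a confounder drives both treatment and outcome, so the naive observational estimate is biased, while randomly assigning the treatment cuts the confounding arrow in the DAG and recovers the true effect.

```r
# Toy illustration of why randomization licenses causal claims (not from the book).
# A confounder z affects both the treatment x and the outcome y; true effect of x is 0.5.
set.seed(1)
n <- 10000
z <- rnorm(n)

# Observational data: treatment assignment depends on the confounder.
x_obs <- rbinom(n, 1, plogis(z))
y_obs <- 0.5 * x_obs + z + rnorm(n)
coef(lm(y_obs ~ x_obs))["x_obs"]  # biased estimate, well above 0.5

# Randomized experiment: the arrow z -> x is cut, so x is independent of z.
x_rct <- rbinom(n, 1, 0.5)
y_rct <- 0.5 * x_rct + z + rnorm(n)
coef(lm(y_rct ~ x_rct))["x_rct"]  # close to the true effect of 0.5
```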
We introduce issues of reproducibility, replicability, and robustness and review the meta-science literature on each of these. We also give a treatment of ethical frameworks for human subjects research and the ethical imperative for open science. experimentology.io/3-replication
In our chapters on statistics, we introduce estimation and inference from both Bayesian and frequentist perspectives. Our emphasis is on model-building and data description, rather than on dichotomous p<.05 inference. experimentology.io/7-models
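As a flavor of that estimation-first stance (a sketch with invented data, not an excerpt from the chapters), the focus is on how big an effect is and how uncertain we are about it, not just whether it crosses a significance threshold:

```r
# Invented two-condition dataset; the question is "how big is the effect?",
# not just "is p < .05?".
set.seed(2)
d <- data.frame(
  condition = rep(c("control", "treatment"), each = 50),
  score     = c(rnorm(50, mean = 0), rnorm(50, mean = 0.4))
)

fit <- lm(score ~ condition, data = d)
coef(fit)["conditiontreatment"]       # point estimate of the treatment effect
confint(fit)["conditiontreatment", ]  # 95% interval, communicating uncertainty
```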
Next, we move to the meat of the book, with chapters on measurement, design, and sampling. I'm very proud of these chapters because I don't know of any similar treatment of these topics, and they are critical for experimentalists! experimentology.io/8-measurement
How do you organize your files for sharing? Should you include manipulation checks? What are best practices for piloting? The next section of the book has chapters on preregistration, data collection, and project management. experimentology.io/11-prereg
The final section contains chapters on presenting and interpreting research, including writing, visualization, and meta-analysis. experimentology.io/14-writing
Throughout, the book features case studies, "accident reports" (issues in the published literature), code boxes for learning how to reproduce our examples, and boxes highlighting ethical issues that come up during research.
We also have four "tools" appendices, with introductions to RMarkdown, GitHub, the tidyverse, and ggplot.
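For flavor, the appendices build toward pipelines like the toy one below (a made-up example on a built-in dataset, not taken from the appendices themselves):

```r
# Toy tidyverse + ggplot pipeline of the sort the appendices teach.
library(tidyverse)

mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  ggplot(aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean MPG")
```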
Use Experimentology in your methods course! We include a guide for instructors with sample schedules and projects, and we'd love to get your feedback on how the material works in both undergrad and grad courses. experimentology.io/E-instructors
Experimentology is still a work in progress, and we're releasing it in part to gather feedback on errors, omissions, and ways that we can improve the presentation of complex topics. Please don't hesitate to reach out or to log issues on our issue tracker:
Can a large language model be used as a "cognitive model" - meaning, a scientific artifact that helps us reason about the emergence of complex behavior and abstract representations in the human mind? My answer is YES.
Why and under what conditions? 🧵
A scientific model represents part or all of a particular system of interest, allowing researchers to explore, probe, and explain specific behaviors of that system. plato.stanford.edu/entries/models…
Cognitive models are instances of this strategy, in which an artifact (typically a set of equations or a program) is used to represent a hypothesized set of mental operations. In practice, this could be anything from economic decisions to language use in context.
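As one toy instance of what I mean by a program standing in for mental operations (an illustrative sketch, not any specific published model), here's a softmax choice rule of the kind that shows up in many models of economic decisions:

```r
# A minimal cognitive-model component: softmax choice over subjective values.
# "temperature" controls how noisy / value-insensitive the chooser is.
softmax_choice <- function(values, temperature = 1) {
  exp(values / temperature) / sum(exp(values / temperature))
}

softmax_choice(c(option_a = 10, option_b = 8), temperature = 1)   # strongly prefers A
softmax_choice(c(option_a = 10, option_b = 8), temperature = 10)  # nearly indifferent
```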
What does it mean for a large language model (LLM) to "have" a particular ability? Developmental psychologists argue about these questions all the time and have for decades. There are some ground rules. 🧵
This thread builds on my previous thread about general principles for LLM evaluation. Here I want to talk specifically about claims about the presence of a particular ability (or relatedly, an underlying representation or abstraction).
Again, I'm not saying that LLMs do or don't have any particular ability or representation. But I do think it's reasonable to *entertain* these sorts of ideas - just the same way Premack famously asked "does the chimpanzee have a theory of mind?" cambridge.org/core/journals/…
People are testing large language models (LLMs) on their "cognitive" abilities - theory of mind, causality, syllogistic reasoning, etc. Many (most?) of these evaluations are deeply flawed. To evaluate LLMs effectively, we need some principles from experimental psychology.🧵
Just to be clear, in this thread I'm not saying that LLMs do or don't have *any* cognitive capacity. I'm trying to discuss a few basic ground rules for *claims* about whether they do.
Why use ideas from experimental psychology? Well, ChatGPT and other chat LLMs are non-reproducible. Without versioning and random seeds, we have to treat them as "non-human subjects."
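The point about seeds is just the usual reproducibility discipline from simulation work (ordinary R below, not LLM code): pin the stochastic state and a run can be repeated exactly; without it, two "identical" runs diverge. Chat LLMs without pinned versions and seeds give us no such handle.

```r
# Two "identical" stochastic runs without a seed give different answers...
mean(rnorm(10))
mean(rnorm(10))

# ...but pinning the seed makes the run exactly reproducible.
set.seed(123); mean(rnorm(10))
set.seed(123); mean(rnorm(10))
```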
How do we compare the scale of language learning input for large language models vs. humans? I've been trying to come to grips with recent progress in AI. Let me explain these two illustrations I made to help. 🧵
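The illustrations themselves aren't reproduced here, but the underlying arithmetic is simple. The numbers below are rough, order-of-magnitude assumptions of mine (estimates of child input vary widely, and LLM training set sizes depend on the model):

```r
# Back-of-envelope input-scale comparison; all quantities are rough assumptions.
child_words_per_year  <- 1e7                          # ~10M words heard per year (ballpark)
child_words_by_age_10 <- 10 * child_words_per_year    # ~100M words by age 10

llm_training_tokens <- 1e12                           # ~1 trillion tokens (ballpark, modern LLM)

llm_training_tokens / child_words_by_age_10           # roughly 4 orders of magnitude more input
```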
Many caveats still apply. LLMs are far from perfect, and I am still struggling with their immediate and eventual impacts on science (see linked thread). My goal in the current thread is to think about them as cognitive artifacts instead.
My lab held a hackathon yesterday to play with places where large language models could help us with our research in cognitive science. The mandate was: "how can these models help us do what we do, but better and faster?"
Some impressions:🧵
Whatever their flaws, chat-based LLMs are astonishing. My kids and I used ChatGPT to write birthday poems for their grandma. I would have bet money against this being possible even ten years ago.
But can they be used to improve research in cognitive science and psychology?
1. Using chat-based agents to retrieve factual knowledge is not effective. They are not trained for this and they do it poorly (the "hallucination problem"). Ask ChatGPT for a scientist bio, and the result will look plausible but contain random swaps of institutions, dates, and other facts.
For two years, @mbraginsky, @danyurovsky, Virginia Marchman, and I have been working on a book called "Variability and Consistency in Early Language Learning: The Wordbank Project" (@mitpress).
We look at child language using a big dataset of parent reports of children's vocabulary from wordbank.stanford.edu, w/ 75k kids and 25 languages. (Data are from MacArthur-Bates CDI and variants). Surprisingly, parent report is both reliable and valid! langcog.github.io/wordbank-book/…
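If you want to play with the data yourself, the wordbankr R package provides accessors to the Wordbank database. Something roughly like the sketch below should pull American English Words & Sentences administrations and show the spread in productive vocabulary at each age; check the package documentation for the exact interface, since argument and column names may differ from what I've written here.

```r
# Sketch of pulling Wordbank data with wordbankr (verify arguments against current docs).
library(wordbankr)
library(ggplot2)

admins <- get_administration_data(language = "English (American)", form = "WS")

ggplot(admins, aes(x = age, y = production)) +
  geom_jitter(alpha = 0.1) +
  labs(x = "Age (months)", y = "Productive vocabulary (words)")
```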
First finding: It's long been known that children are variable with respect to language. The striking thing is that the level of variability is very consistent across languages. The world around, toddlers are all over the place with respect to language! langcog.github.io/wordbank-book/…