Parents: Taste-testing is a great way to engage kids with science and show how methodology can address potential biases.
Here's an example w/ @LastCrumbCookie we did recently. It came after a few years of many simpler taste tests.
Most of our prior taste tests were amenable to some blinding, such as comparing fast-food chicken and fries or testing generic vs. brand-name products.
Haven wanted to do @LastCrumbCookie and compare their 12 varieties. Blinding not possible & 12 is a lot to test! Challenging for design.
Haven and Joni also wanted to do a pre/post comparison to see if our expectations were good predictors of what we would actually like.
Finally, they wanted to evaluate the cookies holistically and needed a measurement strategy that captured important variation between cookies.
After some debate, we decided:
* Pre-rate early and then remove access to ratings to reduce expectancy effects.
* We can also reduce expectancy by making predictions & ratings different kinds of judgments. Pre: predict the final rank order; Judgments: evaluate 4 specific features & average the results.
* 12 cookies is too many for 1 day, so 3 cookies per day for 4 days.
* Stratified random sampling for each day: 1 cookie from the pre-rated top 1/3, 1 from the middle 1/3, and 1 from the bottom 1/3 (see the sketch after this list).
* Random serving order for the 3 cookies each day.
* Predictions were made as rank-order from 1 to 12 knowing the cookie name and ingredients.
* Judgments were 1 to 10 ratings on four dimensions: texture, taste, originality, presentation.
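For anyone curious how the stratified draw could work, here's a minimal Python sketch. The cookie names and pre-rating order are placeholders, not our real data, so treat it as one way to implement the scheme rather than exactly what we did.

```python
import random

# Placeholder names, assumed sorted by pre-rating (index 0 = rated best).
cookies_by_pre_rating = [f"cookie_{i:02d}" for i in range(1, 13)]

# Stratify into pre-rated top, middle, and bottom thirds (4 cookies each).
strata = [cookies_by_pre_rating[i:i + 4] for i in range(0, 12, 4)]
for stratum in strata:
    random.shuffle(stratum)  # random assignment of each stratum's cookies to days

# Each of the 4 days gets one cookie per stratum, served in random order.
for day in range(4):
    days_cookies = [stratum[day] for stratum in strata]
    random.shuffle(days_cookies)  # random tasting order within the day
    print(f"Day {day + 1}: {days_cookies}")
```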
The planned comparison of predictions versus judgments, combined across raters, was a correlation of -0.06! But individually, mom (0.45), dad (0.37), and Joni (0.32) had positive correlations, while Haven (-0.22) had a negative one.
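Here's a sketch of how a prediction-judgment correlation like this can be computed for one rater. The numbers are made up, and the thread doesn't say exactly which correlation we used, so the rank-based choice here is an assumption that simply fits the rank-order predictions.

```python
from scipy.stats import spearmanr

# Made-up example for one rater: predicted rank (1 = expected favorite)
# and the averaged 1-10 judgment for each of the 12 cookies.
predicted_rank = list(range(1, 13))
judged_rating = [7.5, 9.0, 6.2, 8.1, 5.5, 7.0, 6.8, 4.9, 7.2, 5.0, 6.1, 4.2]

# Negate ranks so a positive correlation means expectations matched outcomes
# (rank 1 = best, but rating 10 = best, so the scales run opposite ways).
rho, _ = spearmanr([-r for r in predicted_rank], judged_rating)
print(f"prediction-judgment correlation: {rho:.2f}")
```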
Outperforming expectations:
Netflix and Crunch (Cinnamon Toast Crunch)
Not Today Mr. Muffin Man (Blueberry Muffin)
Underperforming expectations:
The James Dean (Oreo Milkshake)
Macadamnia (Salted Caramel Macadamia)
With years of these activities, the kids have learned so much about research design, hypothesizing, blinding, bias, visualization, variability, correlation, measurement, and uncertainty.
This example illustrates how much this activity has matured.
If you want to try it with your kids, start really simple like we did:
First: Rate on an agreed scale, write down scores, compare with each other
Second: Add a comparison, e.g., different brands of the same ice cream
Third: Add blinding
Fourth: Add visualization of data
Fifth: Add variation in measurement
Sixth: Discuss other biases and how to reduce them with good design
There are many other approaches. The point is to start super simple and add complexity "naturally" as the kids offer ideas.
Next, we might move out of spreadsheets for analysis and into dedicated analytic and visualization tools. Maybe we will even try preregistration on OSF.
Check out the data and you can imagine the conversation as we tried to make sense of it together.
For the last 2.5 years, my daughters and I have been rating breakfast places in the #charlottesville #cville area. We rated 51 restaurants from 1 (worst) to 10 (best) on taste, presentation, menu, ambiance, & service. We also recorded cost-per-person.
Here's what we learned. 1/
Across 51 restaurants we spent $1,625.36 pre-tip, average cost of $9.91/person (sometimes other family members joined).
Most expensive per person: The Ridley $27.08, Farm Bell Kitchen $17.81, Fig $17.44
Averaging all 5 ratings across all raters is one way to determine an overall rating. The grand average is 7.1 out of 10 w/ a range of 4.8 to 9.1. How strongly related are cost per person and overall rating?
r=0.36
Just 13% of the variation in quality is associated with cost (r² = 0.36² ≈ 0.13).
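If you want to run the same analysis on your own family's data, a minimal Python sketch follows. The ratings array here is randomly generated placeholder data, so the structure is what matters, not the printed output (our real data gave r = 0.36).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data shaped like ours: 51 restaurants x 3 raters x 5 dimensions
# (taste, presentation, menu, ambiance, service), each rated 1-10.
ratings = rng.uniform(1, 10, size=(51, 3, 5))
cost_per_person = rng.uniform(5, 28, size=51)  # placeholder costs in dollars

# Overall rating = grand average across raters and dimensions.
overall = ratings.mean(axis=(1, 2))

# Pearson correlation between cost and overall rating; squaring it gives
# the shared variance (with our real r = 0.36, r^2 ≈ 0.13, i.e., ~13%).
r = np.corrcoef(cost_per_person, overall)[0, 1]
print(f"r = {r:.2f}, r^2 = {r**2:.2f}")
```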
Sharpen your intuitions about plausibility of observed effect sizes.
r > .60?
Is that effect plausibly as large as the relationship between gender and height (.67) or nearness to the equator and temperature (.60)?
r > .50?
Is that effect plausibly as large as the relationship between gender and arm strength (.55) or increasing age and declining speed of information processing in adults (.52)?
r > .40?
Is that effect plausibly as large as the relationship between weight and height (.44), gender and self-reported nurturance (.42), or loss in habitat size and population decline (.40)?
@Edit0r_At_Large The journal did invite a resubmission if we wanted to try to address the reviewers' concerns. However, we ultimately decided not to resubmit because of timing; we had a grant deadline to consider.
We incorporated what reviewer suggestions we could into the final design and proceeded.
We eventually completed the full report, and it was peer reviewed through the normal process.
We published the paper in Nature Human Behaviour.
The RR was originally submitted to Nature Human Behaviour.
I think the RR submission did meaningfully improve our design & odds of success.
New in Nature Human Behaviour: We had 353 peer reviewers evaluate published Registered Reports versus comparison articles on 19 outcome criteria. We found that RRs were consistently rated higher on rigor and quality.
Figure shows performance of RRs versus comparison articles on 19 criteria with 95% credible intervals. Red criteria were evaluated before knowing the results, blue after knowing the results, and green summarize the whole paper.
Congrats to @cksoderberg Tim Errington @SchiavoneSays @julia_gb @FSingletonThorn @siminevazire and Kevin Esterling for the excellent work on this project to provide an additional evidence base for how Registered Reports can alter the credibility of published research.
10 years of replication and reform in psychology. What has been done and learned?
Our latest paper, prepared for the Annual Review, summarizes the advances in conducting and understanding replication and the reform movement that has grown around it.
We open w/ an anecdote about the 2014 special issue of Social Psychology. The event encapsulated themes that played out over the decade. The issue brought attention to replications and Registered Reports, & spawned “repligate”