Parents: Taste-testing is a great way to engage kids with science and to show how methodology can address potential biases.

Here's an example w/ @LastCrumbCookie we did recently, after a few years of lots of simpler taste tests.
Most of our prior taste tests were amenable to some blinding, such as comparing fast-food chicken and fries or testing generic vs. brand-name.

Haven wanted to do @LastCrumbCookie and compare their 12 varieties. Blinding wasn't possible, and 12 is a lot to test! Challenging for design.
Haven and Joni also wanted to do a pre/post comparison to see if our expectations were good predictors of what we would actually like.

Finally, they wanted to evaluate the cookies holistically and needed a measurement strategy that captured important variation between cookies.
After some debate, we decided:

* Pre-rate early and then remove access to ratings to reduce expectancy effects.

* We can also reduce expectancy effects by making predictions & ratings different kinds of judgments. Pre: predict the final rating. Judgments: evaluate 4 specific features & average the results.
* 12 cookies is too many for 1 day, so 3 cookies per day for 4 days.

* Stratified random sampling for each day: 1 cookie from the pre-rated top 1/3, 1 from the pre-rated middle 1/3, and 1 from the pre-rated bottom 1/3.

* Random serving order for the 3 cookies each day.
* Predictions were made as rank-order from 1 to 12 knowing the cookie name and ingredients.

* Judgments were 1 to 10 ratings on four dimensions: texture, taste, originality, presentation.

* Overall judgment = taste*0.67 + texture*0.11 + originality*0.11 + presentation*0.11
* Overall rating = average of four raters (Dad, Mom, Haven, Joni)

* Convert overall rating to a ranking to compare with predictions
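The design above can be sketched in a few lines of Python. This is a hypothetical sketch, not our actual spreadsheet logic: the cookie names are placeholders and the pre-ratings are randomly generated, but the stratified daily sampling and the weighted overall judgment follow the rules listed above.

```python
import random

# Placeholder names standing in for the 12 Last Crumb varieties.
cookies = [f"cookie_{i:02d}" for i in range(1, 13)]

# Hypothetical pre-ratings: each cookie gets a predicted rank 1 (best) to 12.
random.seed(7)
predicted_rank = dict(zip(cookies, random.sample(range(1, 13), 12)))

# Stratified sampling: split cookies into predicted top, middle, and bottom
# thirds, then give each of the 4 days one cookie from each third.
by_rank = sorted(cookies, key=lambda c: predicted_rank[c])
terciles = [by_rank[0:4], by_rank[4:8], by_rank[8:12]]
for t in terciles:
    random.shuffle(t)  # randomize which day each cookie lands on

days = []
for day in range(4):
    trio = [t[day] for t in terciles]
    random.shuffle(trio)  # random serving order within the day
    days.append(trio)

# Overall judgment: taste weighted 0.67, the other dimensions 0.11 each.
WEIGHTS = {"taste": 0.67, "texture": 0.11, "originality": 0.11, "presentation": 0.11}

def overall(scores):
    """scores: dict of dimension -> mean 1-10 rating across the four raters."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
```

Converting the overall ratings back to a 1–12 ranking is then just sorting the 12 cookies by `overall` and comparing with `predicted_rank`.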

Data were recorded in a Google spreadsheet, w/ the pre/post ranking comparison as the only planned analysis & everything else treated as exploratory.

docs.google.com/spreadsheets/d…
The planned comparison of predictions versus judgments combined across raters was a correlation of -0.06! But Mom (0.45), Dad (0.37), and Joni (0.32) had positive correlations, while Haven (-0.22) had a negative correlation.
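That planned comparison boils down to correlating predicted ranks with the ranks implied by the averaged judgments (Pearson on ranks, i.e., Spearman's rho when there are no ties). A stdlib-only sketch with made-up ranks, not our real data:

```python
import statistics

def rank_corr(xs, ys):
    """Pearson correlation; applied to rankings with no ties, this is Spearman's rho."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical predicted vs. judged ranks for 12 cookies (illustration only).
predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
judged = [3, 1, 5, 2, 8, 4, 12, 6, 11, 7, 9, 10]
rho = rank_corr(predicted, judged)
```

A rho near 0, like our combined -0.06, means the pre-tasting expectations carried essentially no information about what we actually liked.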
Outperforming expectations:

Netflix and Crunch (Cinnamon Toast Crunch)
Not Today Mr. Muffin Man (Blueberry Muffin)

Underperforming expectations:

The James Dean (Oreo Milkshake)
Macadamnia (Salted Caramel Macadamia)

Other pictures show us examining means, variation, and correlations.
With years of these activities, the kids have learned so much about research design, hypothesizing, blinding, bias, visualization, variability, correlation, measurement, and uncertainty.
This example illustrates how much this activity has matured.

If you want to try it with your kids, start really simple like we did:

First: Rate on an agreed scale, write down scores, compare with each other

Second: Add a comparison, e.g., different brands of the same ice cream
Third: Add blinding

Fourth: Add visualization of data

Fifth: Add variation in measurement

Sixth: Discuss other biases and how to reduce them with good design

There are many other approaches. The point is to start super simple and just add complexity "naturally" as the kids offer ideas.
Next, we might move out of the spreadsheet and into analytic and visualization tools for analysis. Maybe we will even try preregistration on OSF.

Check out the data and you can imagine the conversation as we tried to make sense of it together.

Try it! docs.google.com/spreadsheets/d…
For our kids, the interest has become intrinsic, so much so that they set up tests on their own for themselves, their friends, and the pets.

Science is fun and is for everyone.

