Let's talk "misNAEPery": Common misuses of #NAEP results. Here are 3 types of misNAEPery to look out for on Monday's "NAEP Day": 1) correlation-is-causation (@EduGlaze's original definition) 2) psychometric misNAEPery 3) one-true-outcome misNAEPery.
🧵 1/
For each of these misNAEPeries, I try to distinguish between "high crimes" and "misdemeanors."
I used to get a little too gleeful in pointing out misNAEPery.
I now try to ask, "does it really matter?" or "what's the end goal?" before calling someone out for something "wrong." 2/
Type 1 misNAEPery: My leadership or policy caused these NAEP results. @EduGlaze coined misNAEPery in 2013 to refer to this common, predictable tendency among leaders, reporters, and commentators. ggwash.org/view/31061/bad… 3/
So is NAEP useless because we can't conclude anything about remote learning or charter schools or political leadership? No. My stance is closer to @MichaelPetrilli's here. NAEP is essential. fordhaminstitute.org/national/comme… 4/
For Type 1 misNAEPery, what's a "high crime"? Taking credit for high NAEP scores, which correlate strongly with wealth in our @seda_data.
But speculating on the basis of positive *trends* and declining inequality reduces this to a "misdemeanor," to me. 5/ edopportunity.org/explorer/#/cha…
Type 2 misNAEPery is psychometric misNAEPery: misuse of different NAEP scores. This includes comparing proficiency %s, collapsing results across subjects and grades, neglecting statistical significance, and translating effect sizes to other metrics like "months of learning." 7/
A rule of thumb for psychometric misNAEPery is, "beware the differences in differences." Any one score metric is probably fine. A difference in metrics is a "misdemeanor." And a difference in differences becomes a "high crime" if you don't cross-check with means first. 9/
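Why cross-check with means? Proficiency-rate changes depend on where a group's score distribution sits relative to the cut score, so two groups with the *same* mean gain can show very different proficiency gains. A minimal sketch (illustrative normal distributions and a hypothetical cut score and SD, not real NAEP parameters):

```python
from statistics import NormalDist

# Hypothetical "proficient" cut score and score SD (assumptions for illustration).
CUT = 250
SD = 35

def pct_proficient(mean, sd=SD, cut=CUT):
    """Share of a normal score distribution at or above the cut."""
    return 1 - NormalDist(mean, sd).cdf(cut)

# Two groups receive the SAME mean gain (+5 scale points), but one starts
# near the cut and the other starts far below it.
for label, before in [("Group A (near cut)", 248), ("Group B (far below)", 215)]:
    after = before + 5  # identical mean gain for both groups
    gain_pp = (pct_proficient(after) - pct_proficient(before)) * 100
    print(f"{label}: +5 mean points -> +{gain_pp:.1f} pct-pts proficient")
```

Identical mean gains, unequal proficiency gains: comparing proficiency-rate changes alone would make Group A look like it improved more, which is exactly the bias that checking means catches.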
Unfortunately, differences-in-differences are often important.
Comparing pandemic trends across states? Diff-in-diff.
Comparing subgroup score inequality over time? Diff-in-diff.
These require cross-checking with means or reporting means outright. 10/ gse.harvard.edu/news/uk/15/12/…
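The cross-check above can be made concrete. A diff-in-diff on hypothetical state means (the state names, years, and scores below are invented for illustration, not actual NAEP results), reporting the underlying means alongside the contrast:

```python
# Hypothetical mean scores (illustrative numbers, not real NAEP results).
scores = {
    ("State A", 2019): 240.0, ("State A", 2022): 233.0,
    ("State B", 2019): 232.0, ("State B", 2022): 229.0,
}

def trend(state):
    """Change in mean score, 2019 -> 2022."""
    return scores[(state, 2022)] - scores[(state, 2019)]

# The difference-in-differences: did State A's pandemic decline exceed State B's?
did = trend("State A") - trend("State B")

# Cross-check: report the underlying means, not just the contrast.
for state in ("State A", "State B"):
    print(state, scores[(state, 2019)], "->", scores[(state, 2022)],
          f"(trend {trend(state):+.1f})")
print(f"Diff-in-diff (A minus B): {did:+.1f}")
```

Reporting the four means outright lets a reader see, e.g., that a larger decline may start from a higher level, context that the single diff-in-diff number hides.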
Problems with proficiency are why I consider "months of learning" metrics to be "misdemeanors," and the lesser of two evils.
e.g., my "Rule of 27" is a transparent calculation that doesn't suffer from the bias that proficiency rate comparisons have. 11/
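For readers unfamiliar with the genre: a generic "months of learning" translation divides a score difference by an assumed annual growth rate. This sketch is NOT the Rule of 27 itself, just the general shape of such conversions; the growth rate and months-per-year constants are stated assumptions:

```python
# Assumed typical growth, in scale points per school year (illustration only).
ANNUAL_GROWTH = 10.0
# Assumed instructional months in a school year (illustration only).
MONTHS_PER_YEAR = 9

def months_of_learning(score_diff):
    """Convert a scale-score difference into 'months of typical learning'."""
    return score_diff / ANNUAL_GROWTH * MONTHS_PER_YEAR

print(f"A 5-point gap ~= {months_of_learning(5):.1f} months of learning")
```

The transparency is the point: every assumption sits in a named constant a reader can interrogate, unlike proficiency-rate comparisons whose bias is hidden in the cut score.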
Similarly, I consider aggregating sensibly across subjects and grades to be a "misdemeanor," as we do in our "pooling model" for @seda_data, a simple index of academic educational opportunity. I hope you'll let us off with a warning. 12/ edopportunity.org/explorer/#/map…
Type 3 misNAEPery is "one-true-outcome" misNAEPery, forgetting that NAEP scores and the Reading and Mathematics skills they measure are among many outcomes we desire for our children.
There is lots of middle ground between "NAEP is everything" and "NAEP is nothing." 13/
To me, it's a "high crime" to decontextualize NAEP entirely from other outcomes like physical health, social/emotional health, and other academic subjects. But it's also a "high crime" to dismiss NAEP entirely. It is important for kids to read and reason quantitatively. 14/
But I think it's fine, more of a "misdemeanor," to generalize from NAEP scores to "academic outcomes" after establishing this context. I've used a "tip of the iceberg" metaphor previously to remember what we're not measuring. 15/ news.harvard.edu/gazette/story/…
My favorite metaphor for NAEP is that NAEP is like "the North Star." It's just one star in the sky, but its consistency and dependability help us to navigate. I credit my fellow @GovBoard alumnus (and kamaʻāina) Frank Fernandes with the metaphor. 16/
So, on Monday, beware misNAEPery of all 3 types. Distinguish between "high crimes" and "misdemeanors." (Save your outrage for the "high crimes.") And then, as I've said before, let's keep increasing our support of education. 17/ news.harvard.edu/gazette/story/…
And, in case you missed my thread yesterday on "Why is NAEP Monday important?", see here:
On NAEP Eve, my 3rd thread, on "learning loss." At 12AM, people expect NAEP will find "learning loss."
Are results about "learning loss" essential to inform us as we move forward?
Or is the concept of "learning loss," itself, damaging and hurtful?
To me, the answer is: Both. 🧵1/
When I say "learning loss," I try to create a "firewall" between what I say about systems and what I say about kids.
Evidence of "learning loss" shows the debts our society owes to kids. For kids and their parents & teachers, we must build from their strengths, their assets. 2/
I hope my "four quadrants" framework is useful in this debate. NAEP (upper left) monitors aggregate progress. It's not about kids (lower left). It's not even about schools (upper right). It's about our whole system of educational opportunity. 3/
1-pager: scholar.harvard.edu/files/andrewho…
Why is Monday’s “NAEP Day” so important? Don’t we already know about “learning loss” after our @CRPE_edu report and the September 1 @NAEP_NCES release? Here are three reasons why NAEP Day matters. 🧵1/
CRPE Report: crpe.org/wp-content/upl…
Sept 1 NAEP LTT:
Reason #1: This is NAEP’s ONE JOB: Assessing Educational Progress. Below is my “four quadrants” framework for test purposes. NAEP sits in the upper left: monitoring progress. scholar.harvard.edu/files/andrewho… 2/
Tests should follow “the Golden Rule of Testing”: DO NOT CROSS QUADRANTS.
Why can't other tests monitor with authority? State tests can be inflated. Classroom tests can be incomparable. Selection tests can be incomplete. But NAEP? NAEP has one job... It does it with authority. 3/
My essay acknowledges the historical harms and overreach of accountability testing and all the dangers of overemphasizing and inflating the role of tests.
State testing programs are heading for an iceberg. We can still turn the ship. I wrote a short essay at @FutureEdGU about how. There is even an 8-step plan. 1/8
The anti-accountability movement has earned a well-deserved victory. I am glad! But its momentum has it poised to strike state tests at exactly the time when tests can be most useful--for allocating unprecedented federal support. 2/8
Federal guidance invites state waivers for accountability and attendance. States should take these invitations! But state tests should happen this spring: They have irreplaceable comparability, alignment, and authority for directing federal support. 3/8
Without metrics like these, valid interpretations of school and district test scores will be impossible. States trying to "target resources and supports" per @usedgov intentions will fail.
States should prepare to define these metrics and answer these questions now. This is a time for an "educational census," not business-as-usual test score reporting.