Regardless of what happens in the election, the bottom line is that modern statistical methods can provide better analyses of polling data than the stuff available ten years ago, especially (but not only) because of recent problems in the polling industry. Time to do better.
You are not going to get people to ignore polls. Besides, they are the best tools we have for gauging opinion, and they are normatively valuable in non-electoral settings, too.
But you CAN improve the aggregate so you don't magnify errors for millions of readers!
Aaand there it is, folks! Early vote-by-mail results in California's recall election are way ahead of tied-race benchmarks and signal an imminent victory for Governor Gavin Newsom, possibly by high double digits. I'm going to bed early tonight livevoterturnout.com/sandiegoca/Liv…
At this point, given the CA recall polling and VBM data, all we're looking for in the early vote tonight is confirmation of projected partisan distributions. Returned ballots are sufficiently Dem that we just need to assess loyalty + turnout. LA County VBM at Newsom +25 would do it.
The very fuzzy math here is that early mail ballots statewide (+40) were about 10 points more pro-Biden than the final result (+30) in California in 2020, and LA County's final result (+45) was about 15 points more pro-Biden than the whole state's. Assuming LA's early mail ran the same ~10 points ahead of its final result, +55 in early LA = +30 CA-wide in the end. So +25 = tied.
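Written out, the fuzzy math is just two additive offsets chained together. This is a sketch under the same assumptions as the tweet: margins are Dem/Keep minus Rep/Recall in points, LA's early mail runs the same ~10 points ahead of its final result as the statewide mail did, and the 2020 relationships carry over to the recall. The symbol $m$ is notation introduced here.

$$
\begin{aligned}
m^{\text{final}}_{\text{LA}} &\approx m^{\text{final}}_{\text{CA}} + 15,\\
m^{\text{VBM}}_{\text{LA}} &\approx m^{\text{final}}_{\text{LA}} + 10 \approx m^{\text{final}}_{\text{CA}} + 25,\\
\Rightarrow\quad m^{\text{final}}_{\text{CA}} &\approx m^{\text{VBM}}_{\text{LA}} - 25.
\end{aligned}
$$

Plugging in 2020: $55 - 25 = 30$. A tied statewide race corresponds to $m^{\text{VBM}}_{\text{LA}} \approx 25$.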
You don't call an election based on just one county, of course. But Los Angeles cast 25% of CA's votes in 2020, so it's a good guide.
Other VBM benchmarks could be:
Recall +11 in Orange County (2020 VBM D+19 → D+30 statewide; D−11 VBM → tied)
Keep +3 in San Diego County (2020 VBM D+33 → D+30 statewide; D+3 VBM → tied)
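The same benchmark logic works for every county: take the county's 2020 VBM margin, subtract the final statewide margin to get an offset, and treat that offset as the tied-race line. Below is a minimal Python sketch using the thread's rounded figures. It assumes the offset carries over unchanged from 2020 and that Keep lines up with the Democratic margin. The names `COUNTY_VBM_2020`, `implied_state_margin`, and `tied_benchmark` are hypothetical.

```python
# Rounded 2020 figures from the thread: each county's vote-by-mail (VBM)
# margin and the final statewide margin, in points (Dem/Keep minus Rep/Recall).
STATE_FINAL_2020 = 30.0
COUNTY_VBM_2020 = {
    "Los Angeles": 55.0,  # D+45 final plus an assumed ~10-pt early-mail premium
    "Orange": 19.0,
    "San Diego": 33.0,
}

def tied_benchmark(county):
    """County VBM margin consistent with a tied statewide race,
    assuming the 2020 county-vs-state offset carries over unchanged."""
    return COUNTY_VBM_2020[county] - STATE_FINAL_2020

def implied_state_margin(county, vbm_margin):
    """Translate an observed county VBM margin into an implied final
    statewide margin using the same (assumed constant) offset."""
    return vbm_margin - tied_benchmark(county)

for county in COUNTY_VBM_2020:
    print(f"{county}: tied-race VBM benchmark D{tied_benchmark(county):+.0f}")
```

A negative benchmark (Orange County) means Recall can lead there in early mail while the state is still tied.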
Here's my final update to this model of California recall polls. I'm calculating an aggregate that adjusts polls based on whether they use partisanship in their weighting schemes, and it draws separate trends for adjusted vs. unadjusted data. Newsom +18 ±10 gist.github.com/elliottmorris/…
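The actual model lives in the linked gist and isn't reproduced here. What follows is only a rough Python sketch of the general idea, using a deliberately crude adjustment that is not necessarily the gist's method. It shifts polls that don't weight by party by the average gap between the two groups, then smooths raw and adjusted margins with a Gaussian kernel. `Poll`, `group_offset`, `adjust`, `kernel_trend`, the 7-day bandwidth, and the toy polls are all hypothetical.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Poll:
    day: int              # days since an arbitrary start date
    margin: float         # Keep (Newsom) minus Recall, in points
    party_weighted: bool  # True if the pollster weights by party or past vote

def group_offset(polls):
    """Crude adjustment: mean margin of party-weighted polls minus mean
    margin of polls that don't weight by party (ignores poll timing)."""
    weighted = [p.margin for p in polls if p.party_weighted]
    unweighted = [p.margin for p in polls if not p.party_weighted]
    return sum(weighted) / len(weighted) - sum(unweighted) / len(unweighted)

def adjust(polls):
    """Shift every non-party-weighted poll by the group offset."""
    off = group_offset(polls)
    return [p if p.party_weighted else replace(p, margin=p.margin + off) for p in polls]

def kernel_trend(polls, day, bandwidth=7.0):
    """Gaussian-kernel weighted average of poll margins centered on `day`."""
    weights = [math.exp(-0.5 * ((p.day - day) / bandwidth) ** 2) for p in polls]
    return sum(w * p.margin for w, p in zip(weights, polls)) / sum(weights)

# Hypothetical toy polls, for illustration only (not real survey data).
polls = [
    Poll(0, 10.0, True), Poll(3, 2.0, False), Poll(5, 14.0, True),
    Poll(8, 4.0, False), Poll(10, 12.0, True), Poll(12, 0.0, False),
]
raw_trend = kernel_trend(polls, day=12)
adj_trend = kernel_trend(adjust(polls), day=12)
print(f"raw trend: {raw_trend:+.1f}, adjusted trend: {adj_trend:+.1f}")
```

Because `group_offset` compares group means without accounting for when each poll was fielded, a more careful model would estimate the offset jointly with the trend.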
The point of this project was to illustrate the different methods we can use to aggregate polling data, especially how to improve popular averages that don't look under the hood at how pollsters process their data, an increasingly important aspect of public polling.
So, note two things:
1. Popular averages magnify unlikely trends in public opinion because they get whipsawed by data with higher standard errors than a decade ago (when those models were first built). Weighting by party flattens those trends by correcting for differential partisan nonresponse.
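To see why weighting by party can flatten trends, here is a small hypothetical Python simulation. Every number in it is an assumption chosen for illustration, not an estimate for California. Vote preference within each party is held fixed, while Democrats' willingness to answer polls swings from day to day. The unweighted margin moves with the sample's party mix. Reweighting each party to its assumed population share removes most of that movement, leaving mainly sampling noise.

```python
import random
import statistics

# Hypothetical electorate and vote preferences (illustrative assumptions only).
POP_SHARE = {"D": 0.47, "R": 0.25, "I": 0.28}  # party shares of the electorate
KEEP_PROB = {"D": 0.90, "R": 0.10, "I": 0.50}  # P(vote Keep | party), held fixed

def simulate_poll(rng, dem_response, n=1000):
    """Simulate one poll of n respondents where Democrats respond at
    `dem_response` and everyone else at 0.5. Returns (raw, party-weighted)
    Keep-minus-Recall margins in points."""
    response = {"D": dem_response, "R": 0.5, "I": 0.5}
    parties = list(POP_SHARE)
    # Differential response changes the party mix of the people who answer.
    mix = [POP_SHARE[p] * response[p] for p in parties]
    respondents = rng.choices(parties, weights=mix, k=n)
    count = {p: 0 for p in parties}
    keep = {p: 0 for p in parties}
    for p in respondents:
        count[p] += 1
        if rng.random() < KEEP_PROB[p]:
            keep[p] += 1
    # Raw estimate: share of all respondents choosing Keep.
    raw_keep = sum(keep.values()) / n
    # Party-weighted estimate: each party's Keep rate, weighted to its population share.
    weighted_keep = sum(POP_SHARE[p] * keep[p] / count[p] for p in parties if count[p])
    return 100 * (2 * raw_keep - 1), 100 * (2 * weighted_keep - 1)

rng = random.Random(2021)
raw, weighted = [], []
for _ in range(30):
    # Hypothetical day-to-day swing in Democratic response rates.
    r, w = simulate_poll(rng, dem_response=rng.uniform(0.3, 0.9))
    raw.append(r)
    weighted.append(w)

print(f"sd of raw margins:            {statistics.stdev(raw):.1f} pts")
print(f"sd of party-weighted margins: {statistics.stdev(weighted):.1f} pts")
```

The weighted series stays near the assumed true margin only because within-party preferences are fixed here. If opinion really shifted inside a party, weighting would not hide it.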
The thing about Rasmussen (and to a lesser extent, some other right-leaning pollsters and aggregators) is that the conservative information ecosystem has provided a top-dollar audience for confirmation bias, and there's not much AAPOR or good political journos can do about it.
Two of the enduring patterns in polling over the past 20 years are (a) that pre-election polls tend to underestimate the dominant party in a given state, especially in lopsided states (like California), and (b) that polls underestimate the status quo option on referenda and recalls. So...
And see the graph from @MHeidemanns of presidential polling errors.
Also, given that the crosstabs for Newsom vs. Cox match the 2018 results in the state almost perfectly, I'm inclined to believe the media was hoodwinked by partisan nonresponse and outliers (esp. from lower-quality polls) over the last month.
Maybe Newsom was always ahead (as everyone’s prior prediction should have been all along) and people were just overreacting to outliers and temporary partisan non-response that got magnified by over-aggressive polling averages?
I do think people may have been conditioned to rely too much on basic poll averages (like 538’s) at the expense of studying the underlying data and the empirical dynamics of historical polling on recall elections (a big “yes” bias), both of which call the average into question.
Anyway, maybe I will be wrong next week, but the underlying conditions of this contest have been pretty consistent, despite what a couple of overhyped polls of self-reported likely voters have shown.