For fuck's sake... My aggregate in CA was 60% Keep Newsom if you just allocated undecideds and corrected for whether polls had the right weighting scheme. That's a 2 point error on vote share! Tiny! This is bad for SOME pollsters, not the industry. Jesus gelliottmorris.substack.com/p/polls-of-cal…
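For anyone unfamiliar with the mechanics, "allocating undecideds" just means redistributing the undecided share across the decided options, most simply in proportion to the decided shares. A minimal sketch of that proportional rule (the poll numbers here are made up for illustration, not from any actual survey):

```python
def allocate_undecideds(keep: float, remove: float) -> tuple[float, float]:
    """Redistribute undecideds proportionally to the decided shares,
    so the two options sum to 100."""
    decided = keep + remove
    return 100 * keep / decided, 100 * remove / decided

# Hypothetical poll: Keep 52, Remove 40, Undecided 8
keep, remove = allocate_undecideds(52, 40)
print(round(keep, 1), round(remove, 1))  # 56.5 43.5
```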
Contrary to popular belief, you don't have to dunk on the polls every time an election result is a little bit surprising to you -- or if you looked at the wrong polling averages
Observing error in a 3rd-party polling average (and one that ingested data that had effectively been withdrawn by its polling house, and that didn't allocate undecideds!) and then projecting that error onto "polls" as an industry-wide failure is a huge analytical misstep IMO.
Yes, it’s the obvious answer — but my point is more that the high number of undecideds (which skews the topline) and historical range of error in polls (esp recalls) means the error is not a “failure” for the whole industry, just some pollsters AND a bad aggregator.
Put another way: the result in California is a “failure” for a select group of bad pollsters and those who use outdated methods, and — if analyzed correctly — a pretty good showing for the good pollsters.
It is far past time for the most popular public polling analysis website to take seriously the fact that there are methods and procedures that make polls good or bad, and that these cannot be captured by ad-hoc weighting schemes parametrized on past predictive accuracy.
(There is also the matter that there are many unquantifiable "artful" things a pollster can do, e.g. with question wording, advertising, or overall business decisions, that should disqualify their data from entering the public discourse — but that's another matter)
So, tl;dr: there are two types of errors misleading the conventional wisdom on polls:
1) Analysts are making grave errors by ignoring what goes into the polls, focusing instead almost exclusively on what comes out
2) The public and press expect impossibly precise estimates
Nothing in this Kagan essay is new or shocking, but I do find the cohesive packaging useful. The Constitution has no checks against proto-fascist factions abusing multiple branches of government for the pursuit of power. Something has to change—and soon. washingtonpost.com/opinions/2021/…
We have seen multiple crises of the confluence of factionalism & US electoral +other institutions over the last year. Life-threatening covid-19 policy & 1/6 are only the most relevant examples. I have to wonder how bad people think it needs to get before we hit the tipping point.
I think the latter paragraph here from this excellent @jbouie article puts the pieces together very well. A faction of leaders holding power across levels and institutions of government can effectively circumvent the checks and balances of our government nytimes.com/2021/09/24/opi…
Aaand there it is folks! Early vote-by-mail results in California's recall election are way ahead of tied-race benchmarks and signal an imminent victory for Governor Gavin Newsom, possibly by high double digits. I'm going to bed early tonight livevoterturnout.com/sandiegoca/Liv…
At this point, given the CA recall polling and VBM data, all we're looking for in the early vote tonight is confirmation of projected partisan distributions. Returned ballots are sufficiently Dem that we just need to assess loyalty + turnout. LA county VBM +25 Newsom would do it.
The very fuzzy math here: in 2020, early mail ballots statewide (+40) ran about 10 points more pro-Biden than California's final result (+30), and LA County's result (+45) ran about 15 points more Biden than the whole state's. Stack those two gaps and LA's early VBM should run roughly 25 points ahead of the eventual statewide margin: +55 early in LA = +30 CA-wide in the end. +25 = tied.
You don't call an election just based on one county, of course. But Los Angeles cast 25% of CA's votes in 2020 so it's a good guide.
Other VBM benchmarks for a tied race could be:
Recall +11 in Orange County (OC's 2020 early VBM was +19D while the state finished +30 overall, i.e. 11 points behind the statewide result, so a statewide tie maps to -11D in OC's VBM)
Keep +3 in San Diego (+33D early VBM vs. a +30 statewide finish, so a tie maps to +3D)
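Putting those benchmarks in one place: each county's early-VBM margin can be translated into an implied statewide final margin via an assumed offset. A quick sketch, where the offsets are the thread's rough 2020-based figures, not precise estimates:

```python
# Assumed offsets: how many points each county's early-VBM Dem margin tends
# to run ahead of (+) or behind (-) the eventual statewide margin, per the
# rough 2020-based figures in this thread.
VBM_OFFSETS = {"Los Angeles": 25, "Orange": -11, "San Diego": 3}

def implied_statewide_margin(county: str, early_vbm_margin: float) -> float:
    """Translate a county's early vote-by-mail margin into a rough
    implied statewide final margin."""
    return early_vbm_margin - VBM_OFFSETS[county]

print(implied_statewide_margin("Los Angeles", 55))  # 30: ~the 2020 result
print(implied_statewide_margin("Los Angeles", 25))  # 0: a tied race
print(implied_statewide_margin("Orange", -11))      # 0
print(implied_statewide_margin("San Diego", 3))     # 0
```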
Regardless of what happens in the election, the bottom line is that modern statistical methods can provide better analyses of polling data than the stuff available ten years ago, especially (but not only) because of recent problems in the polling industry. Time to do better.
Here's my final update to this model of California recall polls. I'm calculating an aggregate that adjusts polls based on whether they use partisanship in their weighting schemes, and draws different trends for adjusted vs. unadjusted data. Newsom +18 +/- 10 gist.github.com/elliottmorris/…
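The actual model is in the gist; as a toy illustration of the idea, one crude way to adjust for weighting-scheme differences is to shift the polls that don't weight by party by the mean gap between the two groups before averaging. The numbers and the adjustment rule below are invented for illustration, not taken from the model:

```python
import statistics

# Hypothetical polls: (Keep margin, weights_by_party?). Not real numbers.
polls = [
    (14, True), (16, True), (20, True),
    (8, False), (10, False),
]

adjusted = [m for m, wbp in polls if wbp]
unadjusted = [m for m, wbp in polls if not wbp]

# Shift the polls that don't weight by party toward the party-weighted
# group, using the mean gap between the two groups as a crude correction.
gap = statistics.mean(adjusted) - statistics.mean(unadjusted)
corrected = adjusted + [m + gap for m in unadjusted]

print(round(statistics.mean(corrected), 1))  # 16.7
```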
The point of this project was to illustrate the different methods we can use to aggregate polling data — esp in how to improve existing popular averages that don't peer under the hood of how pollsters are processing their data, an increasingly important aspect of public polling.
So, note two things:
1. Popular averages magnify unlikely trends in public opinion because they get whipsawed by data subject to higher standard errors than a decade ago (when those models were first built). Weighting by party flattens those trends by reducing the effect of differential partisan nonresponse
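A toy example of that flattening effect, assuming opinion within each party is fixed but the partisan mix of who responds shifts between two waves (all numbers invented for illustration):

```python
# Two "waves" of a poll with identical opinion by party but different
# partisan response rates: the raw topline swings, the party-weighted
# topline doesn't. All numbers are illustrative.
target = {"D": 0.46, "R": 0.24, "I": 0.30}   # assumed party weighting targets
support = {"D": 0.90, "R": 0.10, "I": 0.50}  # "Keep" share within each party

waves = [
    {"D": 0.50, "R": 0.30, "I": 0.20},  # wave 1 sample composition
    {"D": 0.40, "R": 0.40, "I": 0.20},  # wave 2: Dems respond less
]
for sample in waves:
    raw = sum(sample[p] * support[p] for p in sample)
    weighted = sum(target[p] * support[p] for p in sample)
    print(round(raw, 2), round(weighted, 2))
```

Here the raw topline drops 8 points between waves purely from nonresponse, while the weighted estimate is unchanged.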
The thing about Rasmussen (and to a lesser extent, some other right-leaning pollsters and aggregators) is that the conservative information ecosystem has provided a top-dollar audience for confirmation bias, and there's not much AAPOR or good political journos can do about it.