Ensemble approaches, also known as combination methods, are used in statistics and data science to make sense of predictions from different models by combining them through some scheme.
In some instances, combining models by averaging predictions can be accurate!
Ensemble predictions tend to perform well as a way to reconcile highly varying estimates.
When one model prediction is too high,
And another model prediction is too low,
Sometimes the average of the two may be just right.
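To make the "too high, too low, just right" idea concrete, here is a tiny Python sketch of an equal-weight ensemble. The numbers are made up purely for illustration: the true value is 50, one hypothetical model says 58, the other says 44.

```python
def average_prediction(predictions):
    """Equal-weight ensemble: the plain mean of the individual predictions."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(predictions) / len(predictions)


# Hypothetical numbers: one model overshoots, the other undershoots.
truth = 50.0
too_high = 58.0
too_low = 44.0
ensemble = average_prediction([too_high, too_low])

print(ensemble)                # 51.0
print(abs(too_high - truth))   # 8.0
print(abs(too_low - truth))    # 6.0
print(abs(ensemble - truth))   # 1.0
```

The errors of the two models partly cancel, so the average lands closer to the truth than either model alone. That only works when the errors point in opposite directions; if both models lean the same way, averaging cannot fix it.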
Could we use them for election polls?
There are existing methods of combining poll results from different firms to get some insight.
Here is Wikipedia's approach, where they collated results & found a smoothed middle among the varied results through LOESS.
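For the curious, LOESS fits a small weighted straight line around each point, giving nearby polls more say than distant ones. Below is a minimal from-scratch sketch in Python with NumPy (local linear fits, tricube weights, no robustness iterations). The span `frac=0.5` is an assumption; I don't know the settings behind Wikipedia's chart. In practice, `lowess` from `statsmodels.nonparametric.smoothers_lowess` does the same job and adds robustness iterations on top.

```python
import numpy as np


def loess(x, y, frac=0.5):
    """Local linear LOESS with tricube weights (no robustness iterations).

    x: poll dates as numbers (e.g. days since some start date)
    y: one candidate's share in each poll
    frac: share of all polls used in each local fit (assumed value)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours per local fit
    smoothed = np.empty(n)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]           # the k polls closest in time
        h = dist[idx].max() or 1.0           # local bandwidth (guard against 0)
        w = (1 - (dist[idx] / h) ** 3) ** 3  # tricube: nearer polls weigh more
        sw = np.sqrt(w)
        # Weighted least squares for y = a + b * x, via row scaling by sqrt(w).
        X = np.column_stack([np.ones(k), x[idx]])
        (a, b), *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        smoothed[i] = a + b * x0
    return smoothed
```

Feed it poll dates and one candidate's shares, and the output is the smoothed "middle" line for that candidate. A larger `frac` gives a smoother, slower-moving line.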
Personally, I think different opinion research firms should get different weights as a function of their quality, but again, sometimes an average or middle may be sufficient for accuracy and insight.
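If firms were weighted by quality, the combination becomes a weighted average. A real quality rating (track record, house effects, methodology) is beyond this toy, so the sketch below swaps in one common stand-in, inverse-variance weighting: a poll share p from a sample of n respondents has a sampling variance of roughly p(1 - p)/n, so bigger samples get more weight. The shares and sample sizes in the example are hypothetical.

```python
def weighted_average(values, weights):
    """Generic weighted average: sum(w * v) / sum(w)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for w, v in zip(weights, values)) / total


def inverse_variance_weights(shares, sample_sizes):
    """Weight each poll by 1 / Var(p_hat), with Var(p_hat) ~= p * (1 - p) / n."""
    weights = []
    for p, n in zip(shares, sample_sizes):
        if not 0 < p < 1:
            raise ValueError("shares must be strictly between 0 and 1")
        weights.append(n / (p * (1 - p)))
    return weights


# Hypothetical: two firms polling the same candidate.
shares = [0.40, 0.60]
sample_sizes = [2400, 1200]
combined = weighted_average(shares, inverse_variance_weights(shares, sample_sizes))
print(round(combined, 4))  # 0.4667
```

A plain average would give 0.50; here the larger poll pulls the combined figure toward its 40%. Swapping in any other quality score only means passing different `weights` to `weighted_average`.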
Now, I will do a toy analysis that tries to combine Pulse Asia's results and Google Trends, especially to reconcile some representation issues.
There are many heavy assumptions here that can be taken as points of attack, and generally my reply to valid non-troll critique would be:
"I know, understand, and acknowledge your critique, and with that, let's continue playing with this toy."
Let's start with one assumption:
That Google Trends results are strongly correlated with vote percentages.
A pretty heavy assumption, I know. I hurled inside my mouth trying to continue with this, but let's keep playing with this radioactive toy.
Second assumption:
That Pulse Asia survey results are strongly correlated with vote percentages in May.
Still pretty heavy, but this has been supported from previous elections, so not that noxious.
Third assumption, but this one is malleable as we continue:
Both Pulse Asia and Google Trends results are drawn from the same population, but their methods lead to different results and can produce representation issues.
We will try to fix the age issue later.
So here are the Pulse Asia & Google Trends presidential results, captured as of May 3, 2022, 11:58pm.
Reconciling these two sources of data, it may seem that the race is practically tied, though it favors Marcos.
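Mechanically, "reconciling" the two sources can be as simple as blending each candidate's share from both and renormalizing so the shares add up to 100%. The sketch below assumes equal weights for the two sources and, per the first assumption, treats Google Trends interest as if it were a vote share. The candidate names and figures are placeholders, not the actual May 3 numbers.

```python
def combine_sources(survey, trends, survey_weight=0.5):
    """Blend two {candidate: share} dicts and renormalise so shares sum to 1.

    survey_weight is an assumed tuning knob; 0.5 means a simple average.
    """
    candidates = sorted(set(survey) | set(trends))
    blended = {
        c: survey_weight * survey.get(c, 0.0) + (1 - survey_weight) * trends.get(c, 0.0)
        for c in candidates
    }
    total = sum(blended.values())
    return {c: share / total for c, share in blended.items()}


# Placeholder inputs, not real results.
survey = {"Candidate A": 0.56, "Candidate B": 0.23, "Others": 0.21}
trends = {"Candidate A": 0.40, "Candidate B": 0.45, "Others": 0.15}
for name, share in combine_sources(survey, trends).items():
    print(f"{name}: {share:.1%}")
# Candidate A: 48.0%, Candidate B: 34.0%, Others: 18.0%
```

The renormalization step matters when the two sources do not cover exactly the same set of candidates, or when their shares do not sum to 1 on their own.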
Again, note that these rely on VERY HEAVY ASSUMPTIONS and should be only seen as toys.
Now, let's continue.
Let's adjust the third assumption, though my fix only takes into account the age representation between Pulse Asia and Google.
Third Assumption, changed:
The Pulse Asia and Google Trends results are adjusted to reflect voting-age population projections by weighted average, increasing weights for underrepresented age groups and decreasing weights for overrepresented groups.
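This is essentially post-stratification-style weighting: each age group g gets weight (population share of g) / (sample share of g), so an underrepresented group counts for more and an overrepresented one for less. A minimal Python sketch, with made-up age groups and shares (not Pulse Asia's or COMELEC's actual figures):

```python
def reweight_by_age(support_by_age, sample_share, population_share):
    """Reweight a candidate's support so age groups match the population mix.

    support_by_age:   candidate's share of the vote within each age group
    sample_share:     each age group's share of the survey sample
    population_share: each age group's share of the voting-age population
    """
    groups = list(support_by_age)
    # Weight > 1 for underrepresented groups, < 1 for overrepresented ones.
    weights = {g: population_share[g] / sample_share[g] for g in groups}
    numerator = sum(sample_share[g] * weights[g] * support_by_age[g] for g in groups)
    denominator = sum(sample_share[g] * weights[g] for g in groups)
    return numerator / denominator


# Hypothetical: young voters are 50% of the electorate but only 30% of the sample.
support = {"young": 0.60, "old": 0.40}
sample = {"young": 0.30, "old": 0.70}
population = {"young": 0.50, "old": 0.50}

unweighted = sum(sample[g] * support[g] for g in support)
weighted = reweight_by_age(support, sample, population)
print(round(unweighted, 2), round(weighted, 2))  # 0.46 0.5
```

Because the young group was undersampled, its higher support was being diluted; reweighting restores it to its population share and moves the estimate from 46% to 50%.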
This necessitates two additional assumptions:
Fourth Assumption:
The Google Trends results are the same for every age group.
There is no age breakdown in Google Trends, so I will have to make do with this.
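One consequence worth spelling out: if Google Trends gives one number copied into every age group, reweighting by age cannot move it, because a weighted average of identical values is that same value. So under this assumption the age fix only shifts the Pulse Asia side. A quick Python check, with a hypothetical share and age mix:

```python
def age_weighted_average(support_by_age, population_share):
    """Average group-level support using population shares as weights."""
    total = sum(population_share[g] for g in support_by_age)
    return sum(population_share[g] * support_by_age[g] for g in support_by_age) / total


google_share = 0.45  # hypothetical all-ages Google Trends share for one candidate
population = {"young": 0.5, "middle": 0.3, "old": 0.2}  # hypothetical age mix

# Fourth assumption: the same share applies to every age group.
google_by_age = {g: google_share for g in population}
print(age_weighted_average(google_by_age, population))  # ≈ 0.45, whatever the age mix
```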
Fifth Assumption:
Voting-age population projections are perfectly reflected in COMELEC's population of registered voters.
That may not be the case, so I accept that critique.
Still, please give me some leeway, as COMELEC has bad age bins:
15-41, 42-55, and 56 and above. Nope.
Some of the population percentages have been discussed in the "Question of Age Representation" thread, shown below.
> Conf = 188, 253 days
>> 1 case purged, reflecting an increase of 187 (+0.0051%) in total cases
> Recov = 838 (+0.0232%), 245 days
> Deaths = 36 (+0.0215%), 261 days
> Net daily change in active = -664
"In a rare admission of the weakness of Chinese coronavirus vaccines, the country’s top disease control official says their effectiveness is low and the government is considering mixing them to give them a boost."
'... Chinese vaccines “don’t have very high protection rates,” said the director of the China Centers for Disease Control, Gao Fu, at a conference Saturday in the southwestern city of Chengdu. ...'
'... Beijing has distributed hundreds of millions of doses in other countries while also trying to promote doubt about the effectiveness of Western vaccines. ... '
The predicted number of new reported cases for April 11, 2021 is 15,705 (90% CI: 9,326 -- 24,060).
The methods are based on epiforecasts.io/covid/methods.…, using the most recent 180 days of data available as of April 4, 2021, and adjusted for DOH reporting delays.