Ensemble approaches, also known as combination methods, exist in statistics and data science to try to make sense of predictions from different models by combining them through some scheme.

In some instances, combining models by averaging predictions can be accurate!
Ensemble predictions tend to perform well as a means to settle highly varying estimates.

When one model prediction is too high,

And another model prediction is too low,

Sometimes the average of the two may be just right.
Could we use them for election polls?

There are existing methods of combining poll results from different firms to gain some insight.

Here is Wikipedia's approach, where they collated results & found a smoothed middle among the varied results through LOESS.

en.m.wikipedia.org/wiki/Opinion_p…
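The thread doesn't say which LOESS settings Wikipedia uses (span, robustness passes), so here is only a minimal local-linear LOESS in plain Python to show the idea of a "smoothed middle". The span `frac`, the function names, and the poll numbers in the usage lines are assumptions for illustration, not Wikipedia's code or data. In practice a library routine such as statsmodels' `lowess` would be the usual choice.

```python
import math


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess(xs, ys, frac=0.5):
    """Local linear LOESS smoother (single pass, no robustness iterations).

    For each point x0, weight every observation by the tricube of its
    distance to x0, scaled by the distance to the k-th nearest point
    (k = ceil(frac * n)), fit a weighted straight line, and keep the
    fitted value at x0.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # bandwidth
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        swx = sum(wi * x for wi, x in zip(w, xs))
        swy = sum(wi * y for wi, y in zip(w, ys))
        swxx = sum(wi * x * x for wi, x in zip(w, xs))
        swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        denom = sw * swxx - swx ** 2
        if denom <= 1e-12 * (sw * swxx + 1.0):
            # All weight sits on a single x value: fall back to a weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


# Hypothetical poll series: field dates (days) and one candidate's share (%)
days = [1, 4, 8, 12, 15, 20, 24, 27, 31, 35]
shares = [52, 55, 50, 57, 54, 56, 53, 58, 55, 57]
print([round(v, 1) for v in loess(days, shares, frac=0.6)])
```

A smaller `frac` follows individual polls more closely; a larger one gives a flatter middle.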
Personally, I think different opinion research firms should get different weights depending on their quality, but again, sometimes a simple average or middle may be sufficient for accuracy and insight.
Now, I will do a toy analysis that tries to combine Pulse Asia's results and Google Trends, especially to reconcile some representation issues.
There are many heavy assumptions here that can be taken as points of attack, and generally my reply to valid, non-troll critique would be:

"I know, understand, and acknowledge your critique, and with that, let's continue playing with this toy."
Let's start with one assumption:

That Google Trends results are strongly correlated with vote percentages.

Pretty heavy assumption, I know. I hurled inside my mouth trying to continue with this, but let's keep playing with this radioactive toy.
Second assumption:

That Pulse Asia survey results are strongly correlated with vote percentages in May.

Still pretty heavy, but this has been supported from previous elections, so not that noxious.
Third assumption, but this one is malleable as we continue:

Both Pulse Asia and Google Trends results are captured from the same population but their methods lead to different results and can produce representation issues.

We will try to fix the age issue later.
So here are the Pulse Asia & Google Trends presidential results, as captured on May 3, 2022, 11:58pm.

By simple average,

Marcos = (56 + 24)/2 = 40%
Robredo = (23 + 55)/2 = 39%
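A minimal Python sketch of this averaging step, using only the percentages quoted above. The simple average is just an equal-weight ensemble; the optional `weights` argument is a hypothetical hook for the quality-based firm weights mentioned earlier, not something this toy analysis actually used.

```python
def weighted_average(estimates, weights=None):
    """Ensemble estimate; equal weights (None) give the simple average."""
    if weights is None:
        weights = [1] * len(estimates)
    return sum(e * w for e, w in zip(estimates, weights)) / sum(weights)


# Presidential shares (%) quoted in the thread: (Pulse Asia, Google Trends)
results = {"Marcos": (56, 24), "Robredo": (23, 55)}

for name, (pulse_asia, google_trends) in results.items():
    print(f"{name} = {weighted_average([pulse_asia, google_trends]):.0f}%")
# Marcos = 40%
# Robredo = 39%
```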

Reconciling these two sources of data, the two candidates look practically tied, with a slight edge for Marcos.
Again, note that these rely on VERY HEAVY ASSUMPTIONS and should only be seen as toys.

Now, let's continue.

Let's adjust the third assumption, though the fix I will make only accounts for age representation in Pulse Asia and Google.
Third Assumption, changed:

The Pulse Asia and Google Trends results are adjusted to reflect voting-age population projections by weighted average, increasing the weights for underrepresented age groups and decreasing the weights for overrepresented groups.
This necessitates two additional assumptions:

Fourth Assumption:

The Google Trends results are the same for every age group.

There is no age breakdown in Google Trends, so I will have to make do with this.
Fifth Assumption:

Voting-age population projections are perfectly reflected in COMELEC's population of registered voters.

This may not be the case, so I will accept that critique.

Though, please give me some leeway, as COMELEC has bad age bins:
15-41, 42-55, and 56 and above. Nope.
Some of the population percentages were discussed in the "Question of Age Representation" thread, shown below.

And here is the toy monster:

Marcos is estimated to have 41.96% of the vote,
Robredo is estimated to have 38.35% of the vote.
How were the reconciled results computed?

The formula, for each age group, is a weighted average of the Pulse Asia and Google Trends results.

For example:

For Marcos's age 18-24 reconciled result,

58.27 = (19.34/10.83 * 72 + 19.34/27.05 * 24) / (19.34/10.83 + 19.34/27.05)
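The formula image didn't survive, so this Python sketch is my reading of the worked example, not necessarily the exact spreadsheet formula: each source gets weight (population share of the age group) / (that age group's share of the source), and the reconciled result is reconciled = (w_PA × PA + w_GT × GT) / (w_PA + w_GT). The labels are assumptions: 19.34 as the estimated voting-population share of 18-24, 10.83 as its share of the Pulse Asia sample, 27.05 as its assumed share of Google users, 72 as Marcos's Pulse Asia share among 18-24, and 24 as his Google Trends share (the same for every age group under the fourth assumption). With these rounded inputs it gives roughly 58.28, i.e. the thread's 58.27 up to rounding. Within one age group the population share cancels out; it only matters again in the overall result.

```python
def reconcile_group(pop_share, sources):
    """Weighted average of candidate shares for one age group.

    sources: list of (candidate_share, source_age_share) pairs, where
    source_age_share is how much of that source the age group makes up.
    Each source gets weight pop_share / source_age_share, so a source that
    under-represents the group (ratio > 1) counts more, and vice versa.
    """
    weights = [pop_share / age_share for _, age_share in sources]
    weighted = sum(w * share for w, (share, _) in zip(weights, sources))
    return weighted / sum(weights)


# Marcos, age 18-24, inputs from the worked example (labels are my reading)
marcos_18_24 = reconcile_group(
    19.34,            # est. voting-population share of 18-24 (%)
    [(72, 10.83),     # Pulse Asia: Marcos share, 18-24 share of sample
     (24, 27.05)],    # Google Trends: Marcos share, assumed 18-24 share
)
print(f"{marcos_18_24:.2f}")
```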
The formula for the overall result?

Overall Result
= sumproduct(reconciled results, est. voting population %) / sum(est. voting population %)
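The same SUMPRODUCT/SUM step as a small Python function. Dividing by the sum of the population shares means the result is still a proper weighted average even if the shares don't add up to exactly 100. Only the 18-24 pair (58.27, 19.34) below comes from the thread; the other age groups are made-up placeholders, so the printed number is not the thread's 41.96%.

```python
def overall_result(reconciled, pop_shares):
    """Spreadsheet equivalent: SUMPRODUCT(reconciled, pop) / SUM(pop)."""
    if len(reconciled) != len(pop_shares):
        raise ValueError("need one population share per age group")
    return sum(r * p for r, p in zip(reconciled, pop_shares)) / sum(pop_shares)


# Hypothetical inputs: only the first pair comes from the thread.
reconciled = [58.27, 45.0, 38.0]
pop_shares = [19.34, 45.0, 35.66]
print(f"{overall_result(reconciled, pop_shares):.2f}%")
```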

If you would like to play around with the toy analysis:

docs.google.com/spreadsheets/d…
Do these results hold any ground?

That depends on the assumptions, some of which I would vomit for even considering.

Again, VERY HEAVY ASSUMPTIONS, and thus these are simply toy analyses.

Whatever helps me sleep, which I need now.
PS:

I also made one for the VP race between Sara, Kiko, and Tito.

It seems Sara may garner the majority vote.

Side comment: We made the youth (age 18-24) more ignorant of our history.

• • •


Keep Current with Peter Cayton, the Stats Guy


More from @PJACaytonPhD

May 2
Daily Change(+/-%) & Crude Days-to-Double

> Conf = 188, 253 days
>> 1 case purged, reflecting an increase of 187 (+0.0051%) in total cases
> Recov = 838 (+0.0232%), 245 days
> Deaths = 36 (+0.0215%), 261 days
> Net daily change in active = -664
Source: doh.gov.ph/covid19tracker

Personally-Archived Data: docs.google.com/spreadsheets/d…

Highest recovery rate ever.

Highest case fatality rate since Jan 16, 2022.

Lowest no. of active cases since April 19, 2020.
Tracing Statistics as of Mar 28, 2022 (ndrrmc.gov.ph)

Case:Close Contacts Ratio = 1:5
Optimal Ratio (analogous to testing): 1:19 to 1:49
Worsening situation.

Close Contacts Listed & Assessed w/in 24 hrs: 97%
Optimal Value: >=80%
Preferred situation.
May 2
Let's discuss this one:

Question of Age Representation

Here below is the estimated voting population with percentage breakdown by age, and Pulse Asia's percentage breakdown.

Pop'n nos. used are 2022 projections.

I estimated the 18-19 age group as 40% of the 15-19 age group.
Yes, Pulse Asia seemed to have underrepresented the 18-34 group and overrepresented the 45 and older. Kinda nailed it in the 35-44 range.

What if we add the estimated social media internet users in the Philippines?
I used the FLEMMS 2019 survey as the basis, with some extrapolation for the 65 and over age group.

I assumed the 18-19 group has the same rate of social media use as the 15-19 group.

Population bases are again the 2022 projections
May 2
The website of the April 2022 survey by Pulse Asia.

pulseasia.ph/april-2022-nat…
And additional breakdowns

pulseasia.ph/pb-april-2022-…
I see this age-disaggregated result as sufficient evidence for our nation's failure to properly educate the youth about the history of our country.
Apr 11, 2021
"In a rare admission of the weakness of Chinese coronavirus vaccines, the country’s top disease control official says their effectiveness is low and the government is considering mixing them to give them a boost."

apnews.com/article/beijin…
'... Chinese vaccines “don’t have very high protection rates,” said the director of the China Centers for Disease Control, Gao Fu, at a conference Saturday in the southwestern city of Chengdu. ...'
'... Beijing has distributed hundreds of millions of doses in other countries while also trying to promote doubt about the effectiveness of Western vaccines. ... '
Apr 10, 2021
Note:
> 19 cases purged
> 56 recovs purged/changed status
> 1 death purged/changed status

Daily % Change & Crude Days-to-Double
Conf= 1.5056%, 134 days
Recov= 0.0829%, 168 days
Deaths= 1.5427%, 157 days

Net increase in active = 11,894

Data: docs.google.com/spreadsheets/d…
Highest no. of active cases ever.

Second highest no. of new cases reported ever.

Lowest recovery rate since Sept 19, 2020.

Recovery is the slowest of the three counts.

The end of the pandemic is far from sight.
The predicted no. of new reported cases for April 11, 2021 is 15,705 (90% CI: 9,326 -- 24,060).

The methods are based on epiforecasts.io/covid/methods.…, using the most recent 180 days of data available as of April 4, 2021, and adjusted for DOH reporting delays.
Apr 10, 2021
I am slightly aggravated that a private firm is in broadsheet media saying "we only need 35 million vaccinated for herd immunity".

The gall!

So, let's tickle this devilish idea with a simple mathematical equation:

H = (1 - 1/R)/E

academic.oup.com/cid/article-pd…
H = herd immunity threshold thru vaccination
R = reproduction number
E = vaccine efficacy

The H suggested by this truth-misrepresenting private firm is 35 million out of 110 million, or 35/110 ≈ 31.8182%.

Currently, the available vaccine has 50.4% efficacy based on FDA's EUA

fda.gov.ph/wp-content/upl…
Plugging in the values:

35/110 = (1 - 1/R)/0.504
=> 0.1603636364 = 1 - 1/R
=> 1/R = 0.8396363636
=> R = 1.19099

To meet their conditions, the virus would have to be transmitting with R = 1.19099.

But COVID-19 has R between 2 and 4, 50% more with some variants, when there are NO INTERVENTIONS.
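The algebra above can be checked with a short Python sketch of the same formula, H = (1 - 1/R)/E, and its inverse R = 1/(1 - H·E). The efficacy 0.504 and the 35-of-110-million figure are the ones quoted in the thread; the function names are just illustrative. Running the formula forward for R = 2, 3, 4 shows what threshold those reproduction numbers would demand at that efficacy.

```python
def herd_immunity_threshold(r, efficacy):
    """H = (1 - 1/R) / E, the vaccination share needed for herd immunity."""
    return (1 - 1 / r) / efficacy


def implied_r(h, efficacy):
    """Solve H = (1 - 1/R) / E for R, giving R = 1 / (1 - H * E)."""
    return 1 / (1 - h * efficacy)


EFFICACY = 0.504   # efficacy figure quoted in the thread (FDA EUA)
h_claim = 35 / 110  # 35 million out of 110 million

print(f"R implied by the claim: {implied_r(h_claim, EFFICACY):.5f}")
# R implied by the claim: 1.19099

for r in (2, 3, 4):
    # A result above 100% means vaccination alone cannot reach the threshold.
    print(f"R = {r}: H = {herd_immunity_threshold(r, EFFICACY):.1%}")
```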
