matti heino (@Heinonmatti)
Apr 14, 2021 · 12 tweets · 6 min read
Say you want to figure out which beliefs to target in a behaviour change campaign and, as part of the evaluation, you look at correlations between two self-reports, such as beliefs and intentions:

A Tale of Non-linearity 🧵👇

1/
In Confidence-Interval Based Estimation of Relevance (CIBER), you aim to find variables that are both a) correlated with something more "downstream" (such as behaviour or behavioural intentions) and b) changeable (not already maxed out).

2/

ncbi.nlm.nih.gov/pmc/articles/P…
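CIBER rests on two confidence intervals per determinant. One is for the determinant's mean, which shows whether there is room left on the scale. The other is for its correlation with the target, which shows whether it is associated. Below is a minimal pure-Python sketch of those two ingredients. It assumes normal-approximation intervals for the means and Fisher's z-transform for the correlations. `ciber_summary` and the example data are hypothetical, and this is not the implementation behind the linked paper's plots.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def mean_ci(xs):
    """Sample mean with a normal-approximation 95% CI."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    half = Z95 * sd / math.sqrt(n)
    return m, m - half, m + half

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_ci(xs, ys):
    """Pearson r with a Fisher z-transform 95% CI (assumes |r| < 1)."""
    r = pearson(xs, ys)
    z, se = math.atanh(r), 1 / math.sqrt(len(xs) - 3)
    return r, math.tanh(z - Z95 * se), math.tanh(z + Z95 * se)

def ciber_summary(determinants, target, scale_max=7):
    """Hypothetical helper: per determinant, the mean CI, the headroom left
    below the scale ceiling, and the CI of its correlation with the target."""
    out = {}
    for name, xs in determinants.items():
        out[name] = {
            "mean": mean_ci(xs),
            "headroom": scale_max - sum(xs) / len(xs),
            "r": r_ci(xs, target),
        }
    return out

# Made-up 1-7 responses from ten people, for illustration only.
beliefs = {
    "belief_a": [2, 3, 4, 3, 5, 4, 3, 2, 4, 5],
    "belief_b": [7, 7, 6, 7, 7, 7, 6, 7, 7, 7],
}
intention = [2, 3, 5, 3, 6, 4, 3, 2, 5, 6]

for name, row in ciber_summary(beliefs, intention).items():
    r, lo, hi = row["r"]
    print(f"{name}: mean {row['mean'][0]:.1f}, headroom {row['headroom']:.1f}, "
          f"r = {r:.2f} [{lo:.2f}, {hi:.2f}]")
```

Here `belief_a` would be the CIBER-style candidate: it is strongly correlated with intention and still has room to move. `belief_b` is nearly maxed out.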
It's not uncommon to end up with highly skewed distributions. Of course, this doesn't always happen, but it does sometimes, even though researchers try to craft their questions so that the middle answer is the most common and the others are symmetrically less so.

Real data:

3/
Now, what happens when you compute the correlation between two variables where a disproportionate number of people answer "7" on a 1-7 scale (the "extremists") and everyone else answers randomly?

Something @nntaleb called "Dead Man Bias".

Simulation:

4/
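The simulation can be sketched as a two-component mixture. The sketch assumes that a share `p` of respondents answer {7, 7} and everyone else picks both answers independently and uniformly from 1-7. The names, sample size and seed below are illustrative assumptions, not the settings behind the thread's figure.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(n, p_extreme, rng, k=7):
    """Hypothetical mixture: with probability p_extreme a respondent answers
    {k, k}; otherwise both answers are independent uniform draws from 1..k."""
    xs, ys = [], []
    for _ in range(n):
        if rng.random() < p_extreme:
            xs.append(k)
            ys.append(k)
        else:
            xs.append(rng.randint(1, k))
            ys.append(rng.randint(1, k))
    return xs, ys

rng = random.Random(2021)
results = {}
for p in (0.0, 0.1, 0.3):
    xs, ys = simulate(10_000, p, rng)
    results[p] = pearson(xs, ys)
    print(f"share of {{7, 7}} extremists {p:.1f}: r = {results[p]:.2f}")
```

Under these assumptions the population correlation has a closed form. A uniform answer on 1-7 has mean 4 and variance 4, and the extremists sit 3 points above that mean. That gives Cov(X, Y) = 9p(1 - p) and Var(X) = (1 - p)(4 + 9p), so r = 9p / (4 + 9p). This is about 0.18 for p = 0.1 and about 0.40 for p = 0.3, even though no non-extremist has any link between their two answers.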
In the real data shown earlier, the authors chose the underlined variable because it was both correlated and changeable. About 30% of people answered 7.

The regression line shows you how well the sample is described by the correlation...

5/
You can see that only the {7, 7} folks are well described by the correlation; the positive correlation shows up as the upward slope of the line. The left panel shows the real data; the right panel shows data where the {7, 7} answers are kept as is and everyone else's answers are shuffled.

6/
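The shuffle behind the right panel can be sketched as a permutation of the non-extremists' second answers. This assumes "extremist" means answering 7 to both items. The data here are made up and, unlike the thread's real data, give the non-extremists a weak genuine link. That way the sketch shows what the shuffle destroys and what it leaves intact. `shuffle_non_extremists` and `count_77` are hypothetical helpers.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def count_77(xs, ys, k=7):
    """Number of respondents answering k to both items."""
    return sum(1 for x, y in zip(xs, ys) if x == k and y == k)

def shuffle_non_extremists(xs, ys, rng, k=7):
    """Keep every {k, k} pair as is; randomly permute y among all other
    respondents, which breaks any x-y link outside the extremist group."""
    others = [i for i, (x, y) in enumerate(zip(xs, ys)) if not (x == k and y == k)]
    pool = [ys[i] for i in others]
    rng.shuffle(pool)
    new_ys = list(ys)
    for i, y in zip(others, pool):
        new_ys[i] = y
    return new_ys

# Hypothetical data: 30% {7, 7} extremists; the rest have a weak genuine
# link (y copies x with probability 0.3, otherwise y is uniform on 1..7).
rng = random.Random(7)
xs, ys = [], []
for _ in range(5_000):
    if rng.random() < 0.3:
        xs.append(7)
        ys.append(7)
    else:
        x = rng.randint(1, 7)
        xs.append(x)
        ys.append(x if rng.random() < 0.3 else rng.randint(1, 7))

ys_shuffled = shuffle_non_extremists(xs, ys, rng)
r_original = pearson(xs, ys)
r_shuffled = pearson(xs, ys_shuffled)
print(f"r original = {r_original:.2f}, r after shuffling others = {r_shuffled:.2f}")
```

Shuffling can create a few new {7, 7} pairs by chance, but it never removes the original ones. In this toy data r drops after shuffling yet stays clearly positive. In the thread's real data it did not drop at all, which suggests the non-extremists contributed little to the 0.31.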
The original correlation of 0.31 remains as it is, even if all non-extremists answer randomly!

You can make a naive demonstration by removing all pairs containing a 7 (right) or only the {7, 7} extremists (left).

7/
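The two naive removals can be sketched on a simulated mixture. As before, the sketch assumes 30% {7, 7} extremists and uniformly random answers on 1-7 for everyone else. `simulate` and `drop_pairs` are hypothetical helpers.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(n, p_extreme, rng, k=7):
    """Hypothetical mixture: {k, k} with probability p_extreme, else two uniform draws."""
    xs, ys = [], []
    for _ in range(n):
        if rng.random() < p_extreme:
            xs.append(k)
            ys.append(k)
        else:
            xs.append(rng.randint(1, k))
            ys.append(rng.randint(1, k))
    return xs, ys

def drop_pairs(xs, ys, predicate):
    """Return copies of xs, ys without the pairs for which predicate(x, y) is true."""
    kept = [(x, y) for x, y in zip(xs, ys) if not predicate(x, y)]
    return [x for x, _ in kept], [y for _, y in kept]

rng = random.Random(14)
xs, ys = simulate(10_000, 0.3, rng)
r_all = pearson(xs, ys)
r_no_77 = pearson(*drop_pairs(xs, ys, lambda x, y: x == 7 and y == 7))
r_no_7 = pearson(*drop_pairs(xs, ys, lambda x, y: x == 7 or y == 7))
print(f"all: {r_all:.2f}, without {{7, 7}}: {r_no_77:.2f}, without any 7: {r_no_7:.2f}")
```

Dropping only the {7, 7} pairs leaves a uniform 7 × 7 grid with the (7, 7) cell missing, which in expectation gives a slightly negative r (about -0.05). Dropping every pair containing a 7 leaves a uniform 6 × 6 grid, where r is zero in expectation.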
Maybe it's still an adequate description of the data-generating process. Even so, correlation doesn't seem like the right tool for the job.

8/
In samples of 1000 (left), the effect is clearer than in samples of 250 (right), but information-based measures still outperform correlations. I was surprised to see Spearman perform even worse, although I should've believed Nassim.

9/
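The thread doesn't say which information-based measure or performance criterion it used, so the following is only an assumed stand-in. It computes a plug-in mutual information estimate for discrete answers, alongside Pearson's r and Spearman's rho (on tie-averaged ranks). The data are a simulated mixture of {7, 7} extremists and uniformly random responders. All helper names are hypothetical.

```python
import math
import random
from collections import Counter

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho as Pearson's r on tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def mutual_information(xs, ys):
    """Plug-in (maximum-likelihood) mutual information in nats for discrete
    responses; biased upward, especially in small samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def simulate(n, p_extreme, rng, k=7):
    """Hypothetical mixture: {k, k} with probability p_extreme, else two uniform draws."""
    xs, ys = [], []
    for _ in range(n):
        if rng.random() < p_extreme:
            xs.append(k)
            ys.append(k)
        else:
            xs.append(rng.randint(1, k))
            ys.append(rng.randint(1, k))
    return xs, ys

rng = random.Random(99)
stats = {}
for p in (0.0, 0.3):
    xs, ys = simulate(10_000, p, rng)
    stats[p] = (pearson(xs, ys), spearman(xs, ys), mutual_information(xs, ys))
    r, rho, mi = stats[p]
    print(f"share of {{7, 7}} = {p:.1f}: r = {r:.2f}, rho = {rho:.2f}, MI = {mi:.3f} nats")
```

Under this toy model, the population values for a 30% extremist share work out to roughly r ≈ 0.40, rho ≈ 0.50 and MI ≈ 0.22 nats. The rank correlation is pulled even further by the tied block of 7s. Mutual information is unsigned and not on the correlation scale, so compare it across variables or against a shuffled baseline rather than against r directly.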
There's a nice blog post on the topic by @DavidSalazarVir, with #rstats code. Look under "Correlation under non linearities".

10/

david-salazar.github.io/2020/05/22/cor…
As a general note, avoiding skewed distributions with subgroups is a good idea if you need to use linear tools made for homogeneous populations.

But maybe you want to do stuff with diverse types of data 🤷‍♂️

Quick demo based on NNT's recommendation:
