Say you want to figure out which beliefs to target in a behaviour change campaign, and as part of the evaluation look at correlations between two self-reports, like beliefs and intentions:

A Tale of Non-linearity 🧵👇

1/
In the process of Confidence-Interval Based Estimation of Relevance (CIBER) you aim to find variables that are both a) correlated with something more "downstream" (such as behaviour or behavioural intentions), and b) changeable (not maxed out already)

2/

ncbi.nlm.nih.gov/pmc/articles/P…
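The two CIBER criteria above can be sketched roughly like this in Python. This is not the CIBER implementation itself; the data, the names `belief` and `intention`, and the helper functions are all made up for illustration. It computes a 95% confidence interval for a mean (how much room is left on the scale) and for a correlation (via the Fisher z-transformation).

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_ci(xs, z=1.96):
    """Mean with a normal-approximation confidence interval."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    half = z * sd / math.sqrt(n)
    return m, m - half, m + half


def correlation_ci(xs, ys, z=1.96):
    """Pearson r with a confidence interval from the Fisher z-transformation."""
    r = pearson(xs, ys)
    centre = math.atanh(r)
    half = z / math.sqrt(len(xs) - 3)
    return r, math.tanh(centre - half), math.tanh(centre + half)


# Hypothetical 1-7 self-reports: intention is belief plus some noise, clipped to the scale.
rng = random.Random(0)
belief = [rng.randint(1, 7) for _ in range(200)]
intention = [min(7, max(1, b + rng.randint(-2, 2))) for b in belief]

m, m_lo, m_hi = mean_ci(belief)
r, r_lo, r_hi = correlation_ci(belief, intention)
print(f"mean belief = {m:.2f} [{m_lo:.2f}, {m_hi:.2f}] on a 1-7 scale")
print(f"r(belief, intention) = {r:.2f} [{r_lo:.2f}, {r_hi:.2f}]")
```

A variable would look interesting when the correlation interval sits clearly away from zero and the mean interval sits clearly below the scale maximum.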
It's not uncommon to end up with highly skewed distributions. Not always, of course, but it happens even when people craft their questions so that the middle answer is the most common and the rest are symmetrically less so.

Real data:

3/
Now, what happens when you compute a correlation between two variables where a disproportionate number of people answer "7" on a scale of 1-7 on both (i.e. "extremists"), and everyone else answers randomly?

Something @nntaleb called "Dead Man Bias".

Simulation:
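A minimal Python sketch of such a simulation, under my own assumptions: 30% of respondents answer {7, 7}, everyone else picks each answer uniformly at random from 1-7, and the function names and seed are arbitrary. It only illustrates the setup described in this tweet and is not the code behind the original figure.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=1000, p_extreme=0.3, seed=1):
    """A share p_extreme answers {7, 7}; the rest answer both items at random."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        if rng.random() < p_extreme:
            xs.append(7)
            ys.append(7)
        else:
            xs.append(rng.randint(1, 7))
            ys.append(rng.randint(1, 7))
    return xs, ys


xs, ys = simulate()
r_all = pearson(xs, ys)
print(f"Pearson r with 30% extremists and pure noise otherwise: {r_all:.2f}")
```

Under these assumptions the population correlation works out to about 0.4, even though 70% of the respondents contribute nothing but noise.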

4/
In the case of the real data presented earlier, the authors ended up choosing the underlined variable, as it was both correlated and changeable. About 30% of people answered 7.

The regression line shows you how well the sample is described by the correlation...

5/
The upward slope of the line reflects the positive correlation, but you can see that only the {7, 7} folks are well described by it. The left panel shows the real data; the right panel shows data where {7, 7} is kept as is and everyone else's answers are shuffled.

6/
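The shuffling described above can be sketched in Python as follows. The function name `shuffle_non_extremists` is mine, and the data here is a made-up stand-in (30% at {7, 7}, the rest uniform noise), not the real data from the thread, so the demo only shows the mechanics. The function is meant to be pointed at your own pairs of answers.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shuffle_non_extremists(xs, ys, top=7, seed=0):
    """Keep every {top, top} pair in place; shuffle x and y separately among the rest."""
    rng = random.Random(seed)
    rest = [i for i, (x, y) in enumerate(zip(xs, ys)) if not (x == top and y == top)]
    rest_x = [xs[i] for i in rest]
    rest_y = [ys[i] for i in rest]
    rng.shuffle(rest_x)
    rng.shuffle(rest_y)
    new_xs, new_ys = list(xs), list(ys)
    for i, x, y in zip(rest, rest_x, rest_y):
        new_xs[i], new_ys[i] = x, y  # chance can create a few new {top, top} pairs
    return new_xs, new_ys


# Stand-in data (an assumption): 30% extremists, everyone else uniform on 1-7.
rng = random.Random(1)
xs, ys = [], []
for _ in range(1000):
    if rng.random() < 0.3:
        xs.append(7)
        ys.append(7)
    else:
        xs.append(rng.randint(1, 7))
        ys.append(rng.randint(1, 7))

new_xs, new_ys = shuffle_non_extremists(xs, ys)
r_before, r_after = pearson(xs, ys), pearson(new_xs, new_ys)
print(f"r before shuffling: {r_before:.2f}, after: {r_after:.2f}")
```

Both marginal distributions are untouched by the shuffle; only the pairing among the non-extremists is destroyed.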
The original correlation of 0.31 remains as it is, even if all non-extremists answer randomly!

You can make a naive demonstration of this by removing all pairs with a 7 (right), or just the {7, 7} extremists (left).

7/
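The naive removal demonstration can be sketched in Python like this, again on made-up stand-in data (30% at {7, 7}, the rest uniform noise) rather than the real data. The helper name `drop_pairs` is hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def drop_pairs(xs, ys, keep):
    """Return only the (x, y) pairs for which keep(x, y) is true."""
    kept = [(x, y) for x, y in zip(xs, ys) if keep(x, y)]
    return [x for x, _ in kept], [y for _, y in kept]


# Stand-in data (an assumption): 30% extremists, everyone else uniform on 1-7.
rng = random.Random(1)
xs, ys = [], []
for _ in range(1000):
    if rng.random() < 0.3:
        xs.append(7)
        ys.append(7)
    else:
        xs.append(rng.randint(1, 7))
        ys.append(rng.randint(1, 7))

r_all = pearson(xs, ys)
r_no_77 = pearson(*drop_pairs(xs, ys, lambda x, y: not (x == 7 and y == 7)))
r_no_7 = pearson(*drop_pairs(xs, ys, lambda x, y: x != 7 and y != 7))

print(f"all pairs:               r = {r_all:.2f}")
print(f"without {{7, 7}} pairs:    r = {r_no_77:.2f}")
print(f"without any 7 in a pair: r = {r_no_7:.2f}")
```

In this stand-in data the whole correlation is carried by the {7, 7} corner, so both filtered versions should land near zero.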
Maybe the correlation is still an adequate description of the data-generating process. Even so, it doesn't seem the right tool for the job.

8/
In samples of 1000 (left), the effect is clearer than in samples of 250 (right). But information-based measures still outperform correlations. Surprised to see Spearman perform even worse, although I should've believed Nassim.

9/
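To play with the comparison of Pearson, Spearman and an information-based measure yourself, here is a rough Python sketch. Everything in it is my own assumption: the stand-in data (30% at {7, 7}, the rest uniform noise), the choice of plug-in mutual information in bits as the information-based measure, and the function names. It is not the analysis behind the original figure.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mutual_information(xs, ys):
    """Plug-in mutual information in bits for discrete answers (biased upward in small samples)."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def simulate(n, p_extreme=0.3, seed=1):
    """Stand-in data: a share p_extreme answers {7, 7}, the rest answer at random."""
    rng = random.Random(seed)
    pairs = [(7, 7) if rng.random() < p_extreme else (rng.randint(1, 7), rng.randint(1, 7))
             for _ in range(n)]
    return [x for x, _ in pairs], [y for _, y in pairs]


results = {}
for n in (1000, 250):
    xs, ys = simulate(n)
    results[n] = {
        "pearson": pearson(xs, ys),
        "spearman": spearman(xs, ys),
        "mi_bits": mutual_information(xs, ys),
    }
    print(n, {name: round(value, 3) for name, value in results[n].items()})
```

Keep in mind that the three numbers live on different scales, so comparing them takes more than reading them side by side.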
There's a nice blog post on the topic by @DavidSalazarVir, with #rstats code. Look under "Correlation under non linearities".

10/

david-salazar.github.io/2020/05/22/cor…
As a general note, avoiding skewed distributions with subgroups is a good idea if you need to use linear tools made for homogeneous populations.

But maybe you want to do stuff with diverse types of data 🤷‍♂️

Quick demo based on NNT's recommendation:
