A further hint here.
Remember that we said Kishore was close to the idea we are after.
Hint. What do those three variables share, in your experience?
Not the answer I was looking for, but getting at exactly the right issue.
Let's remind ourselves about Parametric and Non-parametric STATISTICAL TESTS.
(@alexnowbar please make a note of this as a topic to cover specifically as people do find it a bit puzzling sometimes)
"Go to point (2, -1). Every pixel that is within 3 pixels of that point, is in my region of interest. Everything outside is not."
How many numerical parameters do I need to describe the position of the light blue region?
I can see how Donald Trump became president.
X coordinate of centre of circle, i.e. 2
Y coordinate of centre of circle, i.e. -1
Radius of circle, i.e. 3.
How many numbers is that?
Anyway I am considering 2 and -1 to be separate numbers.
(I can't believe I'm having to point this out)
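To make the count concrete, here is a minimal Python sketch of that circular region of interest. The function name `in_region` and the constant names are my own, purely for illustration; the three numbers are the ones quoted above.

```python
import math

# The three parameters that pin down the circular region of interest.
CENTRE_X = 2.0
CENTRE_Y = -1.0
RADIUS = 3.0

def in_region(x, y, cx=CENTRE_X, cy=CENTRE_Y, r=RADIUS):
    """Return True if pixel (x, y) lies within r pixels of the centre (cx, cy)."""
    return math.hypot(x - cx, y - cy) <= r

print(in_region(2, -1))   # the centre itself
print(in_region(4, 1))    # about 2.83 pixels from the centre
print(in_region(6, -1))   # 4 pixels from the centre
```

Nothing else about the region needs storing: three numbers in, and every pixel can be classified.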
With those 6 parameters, 2 coordinates for each of the 3 corners, the triangle is described unambiguously, and uniquely - that is, only that exact triangle has those particular 6 values.
For example, how about:
X and Y coords of point B, and then
Orientation of one side (0 to 360 degrees)
Orientation of other side (0 to 360 degrees)
Seems to be 4?
So you need
X and Y coords of B, plus
Orientation AND LENGTH of one side
Orientation AND LENGTH of other side
So that's 6 again.
It turns out 6 is the minimum needed.
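A small sketch of the second description, assuming orientations are measured in degrees anticlockwise from the positive X axis (my convention, not stated above). The function name `corners_from_point_and_sides` is hypothetical. It shows why the lengths cannot be dropped: the same point B and the same two orientations give different triangles when the lengths change.

```python
import math

def corners_from_point_and_sides(bx, by, angle1_deg, length1, angle2_deg, length2):
    """Convert (B, orientation + length of side 1, orientation + length of side 2)
    into the three corner coordinates. Six numbers in, six numbers out."""
    a1 = math.radians(angle1_deg)
    a2 = math.radians(angle2_deg)
    corner_a = (bx + length1 * math.cos(a1), by + length1 * math.sin(a1))
    corner_c = (bx + length2 * math.cos(a2), by + length2 * math.sin(a2))
    return corner_a, (bx, by), corner_c

# Same B and same orientations, different lengths: different triangles.
small = corners_from_point_and_sides(0, 0, 0, 1, 90, 1)
large = corners_from_point_and_sides(0, 0, 0, 5, 90, 2)
print(small)
print(large)
```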
"These two groups of numbers, heights of men and heights of women. Could they easily be explained as one big group, with man versus woman being unrelated to height?"
"If in reality gender has no effect on height, how likely is it that when you measure heights of men and women, you would get a result as different as this between groups?"
The most convenient summary is when the data look reasonably "Normal" (Gaussian) in distribution.
How many numerical PARAMETERS have we reduced our detailed graph to?
Starting with the raw data
Computing 4 parameters (mean & sd of one group, mean and sd of other group)
THROWING AWAY THE RAW DATA
and then feeding the 4 parameters into the stats.
Even if you didn't, you NOW know, don't you?
The unpaired t-test does not need the RAW data of all your million people or whatever.
Just the 4 *PARAMETERS*.
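As a rough illustration, here is the unpaired t statistic computed from the summary numbers alone. This is a sketch of the standard pooled-variance formula, not any particular package's implementation; the function name `unpaired_t` and the heights are made up. Note that the formula also needs to be told how many people were in each group, which is a count rather than a description of the distribution.

```python
import math
import statistics

def unpaired_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's unpaired t statistic (pooled variance) from summary numbers only."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff

# Made-up heights in cm, for illustration only.
men = [178, 182, 169, 175, 185, 180, 172, 177]
women = [165, 160, 170, 158, 167, 163, 172, 161]

# Reduce each group to mean and SD, then forget the raw data.
t = unpaired_t(
    statistics.mean(men), statistics.stdev(men), len(men),
    statistics.mean(women), statistics.stdev(women), len(women),
)
print(f"t = {t:.2f} on {len(men) + len(women) - 2} degrees of freedom")
```

Once the means and SDs are in hand, the lists `men` and `women` are never consulted again.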
You know what's coming, don't you?
It tests the PARAMETERS, not the raw data."
D Francis
J Blindingly Obvious, 2020
But what about the left? The program will have a very different impression from reality

So you end up having to feed in ALL THE RAW DATA.
Not just the parameters.
Not the parameters.
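To show what "feeding in all the raw data" looks like, here is a sketch of one such test's core calculation. I have picked the Mann-Whitney U statistic as the example; that choice, the function name `mann_whitney_u`, and the numbers are mine. It compares every value in one group with every value in the other, so there is no way to compute it from a mean and SD.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: the number of (a, b) pairs in which a > b,
    counting ties as half. Needs every raw value, not a summary."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Made-up skewed values, for illustration only.
group_a = [12, 15, 14, 300, 18]
group_b = [9, 11, 10, 13, 8]

u = mann_whitney_u(group_a, group_b)
print(u, "out of", len(group_a) * len(group_b), "pairs")

# Changing the outlier from 300 to 30 leaves U untouched,
# even though the mean and SD of group_a change enormously.
print(mann_whitney_u([12, 15, 14, 30, 18], group_b))
```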
So these statistical tests (not your data!) are called
It's just that the parameters aren't very helpful in describing the situation.
Your wealth distribution does have parameters such as mean (many billions!) and SD (also many billions), but that doesn't really give a useful picture of the situation.
"Doctor, doctor, I must be dying! My BNP has doubled since last time!"
But your BNP can easily double, and halve between visits, just on a whim.
It has a large positive tail.
You can see that almost half of the Normal distribution is dangling into the negative territory.
BNP
CRP
Troponin
Income of people in a whole country
Simply ask yourself "Is it possible to have a value 3 times that of the mean?"
If the answer is "sure, yes!" then the distribution is 100% definitely skewed.
In 100 patients, selected on any non-CRP criterion, I guarantee that you will find the upper quartile much, much further from the median than the lower quartile is.
Go and check, if you don't believe me.
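If you have no dataset to hand, here is a sketch of the check on idealised values. These are NOT real CRP measurements: they are 100 numbers placed at evenly spaced percentiles of a log-normal distribution, which I am assuming as a stand-in for a skewed biomarker. The `mu` and `sigma` values are arbitrary.

```python
import math
import statistics
from statistics import NormalDist

# 100 idealised values at evenly spaced percentiles of a log-normal
# distribution (assumed stand-in for a skewed biomarker, not real data).
log_scale = NormalDist(mu=1.0, sigma=1.0)
values = [math.exp(log_scale.inv_cdf((i + 0.5) / 100)) for i in range(100)]

lower_q, median, upper_q = statistics.quantiles(values, n=4)

print(f"lower quartile {lower_q:.1f}, median {median:.1f}, upper quartile {upper_q:.1f}")
print(f"median - lower quartile = {median - lower_q:.1f}")
print(f"upper quartile - median = {upper_q - median:.1f}")
```

Swap `values` for your own measurements to run the same comparison on real patients.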
Which one of the following is NOT a possibility?
Why is that not instantly winning, due to not even existing?
Those are NOT medians.
They are means.
A mean can be 3.2, with less than half of the cases being 3 or more, because a few high values are lifting up the mean a lot.
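A tiny made-up example of exactly that situation: ten values whose mean is 3.2, where only one of them is 3 or more.

```python
import statistics

# Ten made-up values: nine low ones and a single high one.
values = [1, 1, 2, 2, 2, 2, 2, 2, 2, 16]

mean = statistics.mean(values)
median = statistics.median(values)
at_least_3 = sum(1 for v in values if v >= 3)

print(f"mean = {mean}, median = {median}")
print(f"{at_least_3} of {len(values)} values are 3 or more")
```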
4.1 ...................... 5.6 ........ 6.6
The skew is the wrong way round.
In a 100-patient group, that never happens. So I am sure it is a typo.
OR mean and 95% CI limits of mean
They can't be SD, because they are too narrow.
The two underlined ones are presumably typos.
13 22 32
31 43 56
Can you visualise it?
The "32" is the upper quartile of the first one, and
the "31" is the lower quartile of the second.
If I draw a vertical line at around 31-or-32,
ROUGHLY how much of the first histogram's area is to the LEFT of it?
It will be <0.001.
Parametric or non-parametric, there is no way P is 0.02.
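A back-of-envelope sketch of why, under loud assumptions: I treat each median as if it were a mean, convert each IQR to an SD using the Normal-distribution relation (IQR is about 1.349 SDs), and assume 50 patients per group, which is my guess and not a figure from the abstract. The function name `rough_p_from_quartiles` is hypothetical. It is a crude Normal approximation, not a proper test, but it gives the order of magnitude.

```python
import math
from statistics import NormalDist

def rough_p_from_quartiles(q1_a, med_a, q3_a, n_a, q1_b, med_b, q3_b, n_b):
    """Very rough two-sided P for a difference between two groups,
    using only their quartiles and a Normal approximation."""
    # For a Normal distribution the IQR spans about 1.349 SDs.
    sd_a = (q3_a - q1_a) / 1.349
    sd_b = (q3_b - q1_b) / 1.349
    se_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = abs(med_a - med_b) / se_diff
    return 2 * (1 - NormalDist().cdf(z))

# Quartiles as quoted above; 50 patients per group is my assumption.
p = rough_p_from_quartiles(13, 22, 32, 50, 31, 43, 56, 50)
print(f"P is roughly {p:.1e}")
```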
That is what I wanted to say originally.
The abstract is odd in another regard too.
I doubt that the age distribution is really as narrow as shown. Probably again the same issue, mean and something, not IQR.
I wouldn't know a T1 or 2 time if it bit me on the nose.
Never met one in my normal life, doing humble echo etc.
Does anyone have experience? Graham has his head in his hands and all I can get out of him is "don't go there".
If, as I was extremely confident, those are not IQRs but a confidence interval of the mean, then they should be mean +/- 2 SEs, i.e. for 100 patients
mean +/- 2*SD/sqrt(100)
mean +/- SD/5
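The same arithmetic as a short sketch, in both directions, using the "2 standard errors" approximation above. The function names and the SD of 10 are hypothetical, chosen only to make the numbers easy to follow.

```python
import math

def ci_half_width(sd, n):
    """Approximate half-width of the 95% CI of the mean: 2 standard errors."""
    return 2 * sd / math.sqrt(n)

def sd_from_ci_half_width(half_width, n):
    """Invert the relation to recover the SD from a reported half-width."""
    return half_width * math.sqrt(n) / 2

sd = 10.0  # hypothetical SD, for illustration only
print(ci_half_width(sd, 100))           # SD / 5 when n = 100
print(sd_from_ci_half_width(2.0, 100))  # back to the SD
```

So if the bracketed numbers really are confidence limits for 100 patients, multiplying their half-width by 5 gives back the SD that the data must have had.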
I can't work out how the values in the abstract, the P values, and the graphs, can all coexist.
The problem is that all 3 seem to be mutually incompatible.
I think no single one of the 3 can be fixed, to make a compatible set.
Open to ideas?