The people have spoken! This thread is part 1; part 2 will come soon, once I manage to answer all the questions that will no doubt be coming my way.

The problem of constructing the SEV function in adaptive trials was the point of departure for my investigations, which recently received some attention on Twitter (hi @Lester_Domes!). Here I aim to explain only where I started, not where I finished.
But before we can talk about constructing the SEV function in adaptive trials, I need to introduce the idea of severity and say something about the what, the why, and the how of it.
—Disclaimer—

I don’t claim to be authoritative; the source I’m working off of is Mayo’s own definitive article on the subject.

phil.vt.edu/dmayo/personal…
I feel certain that Mayo would rather you read that than this, and frankly I wouldn’t disagree; but perhaps all you have time for is my humble twitter thread. Just be aware that this is a very abbreviated take on the concept.
—The What—

The notion of severity builds on a rule of inference in classical logic called modus tollens:

major premise: A implies B
minor premise: B is false
conclusion: A is false
For example:
If there is smoke the smoke detector beeps; the smoke detector isn’t beeping; therefore there’s no smoke.
But smoke detectors aren’t necessarily perfect; severity reasoning extends modus tollens to handle this case.

premise: with high probability the smoke detector beeps when there’s smoke
data: no beep
inference: no smoke
The inference doesn’t follow logically, but (we are told) we can say it is well-warranted nevertheless: if there had been smoke we would almost certainly have detected it, so the inference ‘no smoke’ has passed a severe test.
Or has it? What if the detector beeps with high probability even when there’s no smoke? This sort of thing is addressed by requiring that the data agree with the inference we’re making; if the detector is really just a beep machine then this criterion is not satisfied.
This looks pretty similar to likelihood, but the similarity will disappear in the continuous case.
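The smoke-detector reasoning above can be sketched numerically. The detector probabilities here are assumed numbers chosen purely for illustration, not anything from Mayo's paper:

```python
# Hypothetical detector characteristics (numbers are assumptions for illustration)
p_beep_given_smoke = 0.99     # beeps with high probability when there is smoke
p_beep_given_no_smoke = 0.95  # a "beep machine": beeps most of the time regardless

# Data: no beep. Inference: no smoke.
# Severity-style check: had the inference been false (smoke present), how
# probable was a result according less well with it (a beep)? Very probable:
severity = p_beep_given_smoke  # 0.99, so 'no smoke' seems to pass a severe test

# But we also require that the data agree with the inference. For this
# beep machine, 'no beep' is itself improbable even when there is no smoke,
# so the agreement criterion fails and the severity assessment is moot:
p_no_beep_given_no_smoke = 1 - p_beep_given_no_smoke  # 0.05
```

With a well-behaved detector (say, a 2% false-beep rate) the agreement criterion would be satisfied and the 0.99 severity would stand.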
—The Why—
(well, one reason why)

Frequentist methods come in for a fair bit of criticism because there’s a conceptual gap between the way they’re justified (good long-run operating characteristics) and what we use them for (inferences in the specific case at hand).
The problem is particularly stark when considering confidence interval procedures because some of them can be shown to sometimes do an inarguably poor job in specific cases.

learnbayes.org/papers/confide…
But the conceptual gap in the *justification* for frequentist methods exists regardless of performance. Severity reasoning bridges this conceptual gap by connecting the frequency properties of a method of inference to the actual inference in the specific case at hand.
The argument is that *this specific inference being made* has passed a severe test because (and just to the extent that) if it were in error then the test we used would have detected the error with high probability.
—The How (continuous case, fixed sample size)—

Let’s look at the severity criteria. A hypothesis H passes a severe test T with data x if:

(S-1) x agrees with H, and
(S-2) with very high probability, test T would have produced a result that accords less well with H than x does, if H were false.
To construct the SEV function we’ll instantiate criterion (S-2) in math. Let’s consider the most toy-like of all toy models: normal distribution, unknown mean μ, variance 1, n = 1, test statistic X with observed value x. We want to infer, for some particular value m, that μ > m.
We’ll do it in steps.

(S-2) says “A hypothesis H passes a severe test T with data x if… with very high probability…”. Our inference H is ‘μ > m’, and the severity of the test it has passed is the probability of something TBD.
“supposing H is false”

Our inference ‘μ > m’ is false if μ ≤ m. Let’s put that where the parameter goes.
You have probably noticed that ‘μ ≤ m’ doesn’t actually pick out a parameter value. We take it to mean that we need the worst case (lowest SEV) among the possible μ ≤ m, and in the normal model it will turn out that this is just μ = m.
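A quick numerical check of that worst-case claim (the observed value x = 1.5 and boundary m = 0 are assumed numbers for illustration): the probability Pr(X < x; μ) = Φ(x − μ) decreases as μ grows, so its infimum over μ ≤ m sits at the boundary μ = m.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

x = 1.5  # assumed observed value, for illustration
m = 0.0  # inference under assessment: mu > 0

# Pr(X < x; mu) = phi(x - mu) shrinks as mu increases, so among all
# mu <= m the smallest (worst-case) value is attained at mu = m.
mus = [m - 2.0, m - 1.0, m - 0.5, m]
vals = [phi(x - mu) for mu in mus]
assert vals == sorted(vals, reverse=True)  # decreasing in mu
assert min(vals) == phi(x - m)             # worst case at the boundary
```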
“test T would have produced a result that accords less well with H than x does”

(Heads-up: this will be the tricky part when we get to adaptive trials.) This is the event ‘X < x’, so let’s put that into the event slot. In the end we find that the SEV function is:

SEV(μ > m) = Pr(X < x; μ = m) = Φ(x − m),

where Φ is the standard normal CDF.
This expression is to be thought of as a function of m; we get severity assessments for the full set of inferences of the form ‘μ > m’ all in one go. A passable graph of a SEV function can be found in Figure 3 of the Error Statistics paper linked near the start of the thread.
This function is more often known as the “p-value function”. The neat bit is that we arrived at it via an argument that says that it tells us about the warrant for various inferences in the case at hand; it isn’t merely about the long run properties of the test statistic.
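As a minimal sketch of the toy model above (normal, variance 1, n = 1), here is the SEV function in code; the observed value x = 1.5 is an assumed number for illustration:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sev(m, x):
    """Severity for the inference 'mu > m' given observed X = x:
    SEV(mu > m) = Pr(X < x; mu = m) = phi(x - m)."""
    return phi(x - m)

x = 1.5  # assumed observed value
for m in [-1.0, 0.0, 0.5, 1.5, 2.5]:
    print(f"SEV(mu > {m}) = {sev(m, x):.3f}")
```

Note that SEV(μ > x) = Φ(0) = 0.5, and the function falls toward 0 as m moves above x: inferences that reach further beyond the data are less severely tested, which is exactly the shape of the p-value function.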
Part 2 to follow shortly!