OK, #stats story time again. I've promised @RrrichardZach a thread about the #GermanTankProblem. So imagine it's 1943. You, a statistician for the Allies, would really like to have some idea of how many tanks are rolling off German assembly lines each month. 1/10
The intelligence community is making their own estimates, but you're using information from the serial numbers of various tank parts, mainly the gearboxes and the wheels. 2/10
Here's a simplified version of your problem: suppose your troops have captured tanks with gearboxes with the following serial numbers: 12, 71, 47, 112, 34, 45, 88, and 103. How many tanks would you guess have been produced? Clearly, at least 112... 3/10
...but since you don't expect the smallest serial number you see to be the smallest possible serial number, you can't expect the biggest serial number you see to be the biggest possible one, either. But how much higher should you go? 4/10
Answer: 112+[(112-8)/8]=125. Here's why: you have 8 data points, & 112/8=14. If they were evenly distributed, you'd have found serial nos. 14, 28, 42, ..., 98, 112. That would leave a gap of 13 unseen numbers below the lowest one, between the 1st and the 2nd, ..., so you'd expect about 13 more above the highest, too: 112+13=125. 5/10
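If you'd like to check that arithmetic yourself, here is a minimal Python sketch. The function name `estimate_total` is made up for illustration; it just applies the max-plus-average-gap rule from the tweet above to the eight gearbox serial numbers.

```python
def estimate_total(serials):
    """Guess the total produced as M + (M - k)/k.

    M is the largest serial number seen and k is how many were seen.
    (Hypothetical helper name, not from the thread.)
    """
    k = len(serials)
    m = max(serials)
    return m + (m - k) / k


# The captured gearbox serial numbers from the example.
serials = [12, 71, 47, 112, 34, 45, 88, 103]
estimate = estimate_total(serials)
print(estimate)  # 125.0
```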
In reality, it was more complicated than this. They were using serial numbers on multiple tank parts and cross-referencing them, but the situation above is what is called the German tank problem. 6/10
The methods worked. In Ruggles & Brodie, "An Empirical Approach to Economic Intelligence in World War II," 1947, they give
Month: Intelligence est., Serial # est., (Actual number)
6/1940: 1000, 169, (122)
6/1941: 1550, 244, (271)
8/1942: 1550, 327, (342)
#Stats wins! 7/10
(The Ruggles & Brodie paper is in the Journal of the American Statistical Association, Vol. 42, No. 237, March 1947. It's a really good read. Highly recommended. Should be available through JSTOR.) 8/10
This is a trick you can do any time you can see the serial numbers on something... and there's no way to defeat it without making the serial numbers, well, not serial and thus far less useful. 9/10
Finally & formally: You're trying to estimate the parameter N for a uniform distribution on 1, 2, ..., N based on a sample of size k drawn without replacement. If M is the sample maximum, your guess will be
M + (M - k)/k = M + (M/k) - 1.
This is the minimum-variance unbiased estimator for N. 10/10
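The "unbiased" part is easy to sanity-check by brute force. The sketch below assumes the sample is drawn without replacement (each tank captured at most once), lists every possible sample for a few small N and k, and averages the estimate over all of them using exact fractions. The names `estimate_total` and `expected_estimate` are made up for illustration. It only checks unbiasedness; it says nothing about the minimum-variance claim.

```python
from fractions import Fraction
from itertools import combinations


def estimate_total(m, k):
    """The estimate M + (M - k)/k, kept as an exact fraction."""
    return Fraction(m) + Fraction(m - k, k)


def expected_estimate(n, k):
    """Average the estimate over every size-k subset of 1..n.

    Assumption: all subsets are equally likely (sampling without
    replacement). Only practical for small n and k.
    """
    samples = list(combinations(range(1, n + 1), k))
    total = sum(estimate_total(max(s), k) for s in samples)
    return total / len(samples)


# Each average should come out exactly equal to n.
for n, k in [(10, 3), (20, 4), (15, 1)]:
    print(n, k, expected_estimate(n, k))
```

In each case the average of the estimates equals N exactly, which is what unbiased means here.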
Thread by Prof. Johanna Franklin