Nick Craver @Nick_Craver
A lot of comments on our low CPU usage at Stack Overflow the past few days. But one critically important point hasn't come up at all.

Let's talk about CPU usage and how it's measured. It's often not what you think.
When we want to get a metric from a system, "% CPU" isn't a metric. There is no way at any point in time ever to get this data. A CPU is a complex construct of parallel pipelines and stages and things are always at various points along the way.
We can't get point in time data. It's not possible. So what we do is look at a window of data. You take some slice of time (let's pick one and say a second) and you measure how long was spent and how much was done.

These are counters in every modern OS. Then we divide.
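To make the "counters, then divide" idea concrete, here's a minimal sketch (my own illustration, not our actual tooling) of how a "% CPU" reading is typically derived on Linux from /proc/stat: sample the counters twice, take the delta over the window, and divide busy time by total time. The one-second window and the field handling are assumptions for the example, simplified from what real collectors do.

```python
# Illustrative sketch: derive "% CPU" from OS counters over a window.
# Assumes Linux and its /proc/stat layout; simplified for clarity.
import time

def read_cpu_counters():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq steal ..."
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    idle = fields[3] + fields[4]      # idle + iowait
    total = sum(fields)
    return idle, total

def cpu_percent(window_seconds=1.0):
    idle1, total1 = read_cpu_counters()
    time.sleep(window_seconds)        # this sleep *is* the measurement window
    idle2, total2 = read_cpu_counters()
    busy = (total2 - total1) - (idle2 - idle1)
    return 100.0 * busy / (total2 - total1)

print(f"CPU over the last second: {cpu_percent(1.0):.1f}%")
```

Note there's no "point in time" anywhere in that code: the number only exists relative to the window you chose.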
Why does this matter? Because that time slice matters. Using us as an example, we render pages in < 20ms. Did we have a pegged CPU for half a second averaging 50% CPU with that divide? Or did we have a constant ~50% with capacity overhead the whole time?

You can't tell.
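A tiny, made-up illustration of why you can't tell (the millisecond "samples" below are hypothetical, purely to show the arithmetic):

```python
# Two very different workloads that report the *same* "% CPU"
# once they're averaged over a one-second window.
pegged_then_idle = [100] * 500 + [0] * 500   # 100% for 0.5 s, then idle
steady_half      = [50] * 1000               # constant ~50% the whole second

for name, samples in [("pegged then idle", pegged_then_idle),
                      ("steady 50%", steady_half)]:
    print(f"{name}: {sum(samples) / len(samples):.0f}% average over 1 s")
# Both print 50% — the window average can't tell them apart.
```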
The important thing to remember about counters and data collection is that they're only valid down to how fine-grained the counter is. Within that window is *an average*, with all the caveats and mysteries an average comes with.

Now, on recording...
Counters aren't trivial; there's some cost to accessing them. Recording them (and sending their data somewhere) takes:
- CPU on the host to read
- Network bandwidth to send
- Storage to store
- More CPU capacity to process and view

So: more often means more expensive. It's a balance.
At Stack Overflow we default to 15 second intervals on system metrics collection. It's the best balance for us. Monitoring is great, but if you dial it to 11 and the CPU you're wanting to monitor is now being eaten primarily by the monitoring...well, yeah...don't do that.
When I try to explain intervals, I find it helps to explain bandwidth instead.

Is your network connection 1 Gbps?
Is it 100 Mb per 100 ms?
Is it 1 Mb per 1 ms?

The answer is yes. While we treat a second as a reasonable window, the limits and throughput are much more finite in practice.
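The arithmetic behind that analogy (decimal units, purely illustrative):

```python
# A "1 Gbps" link expressed over smaller and smaller windows.
# The rate is the same; only the per-window budget shrinks.
rate_bits_per_second = 1_000_000_000   # 1 Gbps

for window_ms in (1000, 100, 1):
    bits = rate_bits_per_second * window_ms / 1000
    print(f"{window_ms} ms window: {bits / 1_000_000:.0f} Mb")
# 1000 ms -> 1000 Mb, 100 ms -> 100 Mb, 1 ms -> 1 Mb
```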
Modern CPUs are starting to push 5 BILLION operations per second per core. For us to measure it in seconds is in some ways laughably silly. But, it's also reasonable. Measuring in billionths of a second is far sillier.

Anyway, keep in mind: you're often looking at an average.
To wrap up:
Having headroom *in an average* doesn't mean you can get away with less CPU and maintain the same performance. When your units of work are small, those 100% spikes lasting tiny slices of time get averaged out.

It's good to trust metrics, but only once you understand what they mean.