Denise Rystsov

@rystsov

Nov 11, 2022 • 12 tweets • 4 min read • Read on X

For a long time I've been thinking that using a closed loop (sync) for measuring latency is wrong

It's affected by the coordinated omission problem: imagine that every do_action execution but one takes 1 ms and the bad one takes one minute. If we look at p99.(9) we won't notice a problem and will assume that everything is fine, while the rogue request may block the system.
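To make the numbers concrete, here is a minimal sketch of what a closed-loop client would record in that scenario. The sample counts and the nearest-rank `percentile` helper are my own assumptions for illustration, not something from the original measurements.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical recording: 60,000 calls at 1 ms each plus a single one-minute stall.
latencies_ms = [1] * 60_000 + [60_000]

p99_9 = percentile(latencies_ms, 99.9)      # what the percentile report shows
worst = max(latencies_ms)                   # what actually happened once
# Share of wall-clock time the client spent waiting on the single rogue call.
stalled_share = worst / sum(latencies_ms)

print(p99_9, worst, stalled_share)
```

The percentile stays at 1 ms even though the client spent half of the run stuck in one call: the stall produced a single sample instead of the thousands of samples that would have been issued during that minute.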

The "right" way out is to break the loop (async) and to issue the requests at a fixed rate (load). With this approach the rogue requests won't block the control flow and we notice the degradation.
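A small simulation shows the difference. It assumes a single-worker FIFO server (so a stall delays everything queued behind it) and made-up service times; both functions are hypothetical helpers, not part of any real tool.

```python
def open_loop_latencies(service_ms, interval_ms):
    """Latencies when requests are sent on a fixed schedule to a FIFO single-worker server."""
    latencies = []
    server_free_at = 0
    for i, service in enumerate(service_ms):
        sent_at = i * interval_ms                  # the schedule ignores earlier slowness
        started_at = max(sent_at, server_free_at)  # wait in the queue if the server is busy
        server_free_at = started_at + service
        latencies.append(server_free_at - sent_at)
    return latencies

def closed_loop_latencies(service_ms):
    """A sync client sends the next request only after the previous one returns."""
    return list(service_ms)

# Made-up workload: 1000 requests at 1 ms each, one of them stalls for 500 ms.
service_ms = [1] * 1000
service_ms[100] = 500

slow_open = sum(1 for l in open_loop_latencies(service_ms, interval_ms=2) if l > 100)
slow_closed = sum(1 for l in closed_loop_latencies(service_ms) if l > 100)
print(slow_open, slow_closed)
```

With the fixed-rate schedule the stall shows up in hundreds of samples, because every request sent during the pause is measured from its intended send time. The closed loop records it exactly once.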

But things are not black and white. For example, it's hard to use that to determine the throughput of the system:
- if the load is below the max throughput, then the measured throughput is just the load, by definition :)
- if it's above, then the system chokes and the latency skyrockets

With the sync approach, on the other hand, it's pretty simple to see the max throughput. But first let's address the coordinated omission problem. First we chart the reported data

Then we break the time into buckets and measure the number of operations per ms per bucket

When a rogue request takes a lot of time, it shows up as a throughput drop. Not only per bucket: if the pause is big enough, the full throughput (all ops summed and divided by the duration of the experiment) drops too.
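The bucketing step can be sketched like this. The completion timestamps are synthetic and `ops_per_ms_by_bucket` is a hypothetical helper; a real run would feed in the timestamps the sync client logged.

```python
from collections import Counter

def ops_per_ms_by_bucket(completed_at_ms, bucket_ms):
    """Return ops/ms for each consecutive time bucket, including empty ones."""
    counts = Counter(int(t // bucket_ms) for t in completed_at_ms)
    n_buckets = int(max(completed_at_ms) // bucket_ms) + 1
    return [counts.get(b, 0) / bucket_ms for b in range(n_buckets)]

# Made-up closed-loop run: 1 ms per op, except op 2000 which stalls for 3000 ms.
completed_at_ms, now = [], 0
for i in range(3001):
    now += 3000 if i == 2000 else 1
    completed_at_ms.append(now)

per_bucket = ops_per_ms_by_bucket(completed_at_ms, bucket_ms=1000)
full_throughput = len(completed_at_ms) / now  # all ops / duration of the experiment

print(per_bucket)
print(full_throughput)
```

The buckets covering the stall report zero ops/ms, and the full throughput lands at roughly half of the 1 op/ms the client reaches when nothing is stuck, so the pause is visible in both views.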

OK, but how do we measure the true throughput of a system? We repeat the experiment multiple times, and each time we increase the number of parallel sync clients.
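A rough sketch of such an experiment using threads. `run_sync_clients` and the sleep-based `fake_action` are placeholders I made up; in a real test `do_action` would call the system under test, and the durations would be much longer.

```python
import threading
import time

def run_sync_clients(do_action, clients, duration_s):
    """Run `clients` closed-loop workers for `duration_s`; return (ops/sec, latencies in sec)."""
    latencies = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def client():
        local = []
        while time.monotonic() < deadline:
            started = time.monotonic()
            do_action()                      # next call starts only after this one returns
            local.append(time.monotonic() - started)
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(latencies) / duration_s, latencies

def fake_action():
    time.sleep(0.001)  # stand-in for a real request

if __name__ == "__main__":
    # Repeat the experiment with a growing number of parallel sync clients.
    for n in (1, 2, 4, 8):
        throughput, lats = run_sync_clients(fake_action, n, duration_s=0.2)
        print(n, round(throughput), sorted(lats)[len(lats) // 2])
```

Each step yields one (clients, full throughput, latency) point for the plot. Note that a sleep has no capacity limit, so this placeholder never saturates; only a real backend produces the plateau.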

Then we plot full throughput and latency (p50, p99, it doesn't matter) against the number of clients and get the following chart

What's cool about the chart is that it always has that shape. When the system reaches the saturation point the throughput converges to the max throughput and the latency starts to grow linearly.

The magic of the sync client is the natural feedback loop: instead of choking the system like the async way does, the sync load adjusts to the system capacity and we get a clear picture.
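One way to see why the chart has that shape is an idealized model based on Little's law for a closed system (clients = throughput × latency). This is my own simplification with made-up parameters, not the measured data: below saturation every client gets the base latency, above it the throughput is capped and the latency has to absorb the extra clients.

```python
def closed_loop_model(clients, base_latency_ms, max_ops_per_ms):
    """Idealized throughput (ops/ms) and latency (ms) for N sync clients."""
    # Each client completes one op per base latency until the system is saturated.
    throughput = min(clients / base_latency_ms, max_ops_per_ms)
    # Little's law for a closed loop: clients = throughput * latency.
    latency_ms = clients / throughput
    return throughput, latency_ms

# Hypothetical system: 2 ms base latency, capacity of 10 ops/ms (saturates at 20 clients).
for n in (10, 20, 40, 80):
    print(n, closed_loop_model(n, base_latency_ms=2, max_ops_per_ms=10))
```

Past the saturation point, doubling the clients doubles the latency while the throughput stays flat, which is the linear growth and the plateau described above.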

For example, this is an actual chart of Redpanda's transactional performance. We see that the max throughput (for the cluster I used) is 10k distributed transactions per second.


