My first reaction was "that's dumb - a highway can hold more cars at 20mph than 65mph; it's throughput that matters." Then I read the article and realized that the summary was what was dumb. The actual study was not, and illustrates computer systems concepts, as I'll explain:
I've published this thread to my blog, so please use that link if you want to read it on a single page; please don't ask threader apps to compile it. blog.tacertain.com/latency-throug…
The important metrics for a lot of systems are latency and throughput. Throughput is how much work gets done per unit time (so in this case, people transported per minute), and latency is how long each individual piece of work takes (how long each person is on the escalator).
These two concepts are related, and Little's Law (en.wikipedia.org/wiki/Little's_…) gives the simplest relation. It says that the number of people on the escalator is equal to the average time each person spends on the escalator multiplied by the rate of people getting on.
(Note that the rate of getting on and the rate of getting off had better be the same, or there's something very strange going on.)
Little's Law is usually written as L = λW, so in our case, L is the number of people on the escalator, λ is the people per minute carried, and W is the time spent riding the escalator. Of the three things, the only thing we care about is W. ... What?? ...
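Here's a minimal sketch of Little's Law in Python. The 50-per-minute rate and 1-minute ride time are illustrative assumptions, not numbers from the study:

```python
# Little's Law: L = lam * W
# lam -- boarding rate (people per minute); W -- time each person
# spends on the escalator (minutes). Both values are made up.
lam = 50    # people boarding per minute (assumed)
W = 1.0     # minutes spent riding (assumed)

L = lam * W
print(L)    # 50.0 -- people on the escalator at any given moment
```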
Surely we care about λ, right? I mean, if a train drops off 200 people every 4 minutes, we'd better be able to move at least 50 people per minute on the escalator, right? Yes... but. The but is that the only thing your customer cares about is W - for the whole system.
As a train rider, I only care about me. How long does it take from when I exit the train to when I'm at the surface? So the throughput of the escalator (λ) matters because it impacts total system W for everybody.
If the escalator can only move 50 people per minute, then it'll take 4 minutes to clear each train's load. That means somebody is waiting 4 minutes, and ½ your passengers are waiting at least 2. But if it can move 100, then the longest wait is 2 minutes and half are done in 60 seconds.
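This back-of-the-envelope calculation is easy to check. Here's a sketch, assuming riders board in order and the only delay is waiting for a boarding slot:

```python
# A train dumps 200 riders at once; the escalator boards `rate` people
# per minute, so rider k waits k / rate minutes to get on.
def platform_waits(riders, rate):
    return [k / rate for k in range(riders)]

slow = platform_waits(200, 50)    # escalator moves 50 people/min
fast = platform_waits(200, 100)   # escalator moves 100 people/min

print(max(slow))                  # the last rider waits almost 4 minutes
print(sum(w >= 2 for w in slow))  # 100 riders (half) wait at least 2 min
print(max(fast))                  # longest wait drops to just under 2 min
print(sum(w <= 1 for w in fast))  # about half board within 60 seconds
```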
This relationship between latency and throughput is really important. When I wrote in the intro to this thread that "it's throughput that matters," what I really meant was that it matters more than carrying capacity in impacting latency.
But more throughput doesn't necessarily mean lower latency. For example, let's say I had a 200-person elevator that takes 4 minutes per round trip, spending 2 minutes going from platform to surface, versus our escalator that can move 50 people per minute.
λ is the same for both systems - 50 people per minute. But the escalator is strictly better. In the elevator, everybody spends 2 minutes getting to the surface. On the escalator, 50 people take 60 seconds or less, and only the worst-off riders take 2 minutes.
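One way to see this concretely is to also count the time spent waiting for the next elevator departure. Here's a sketch under assumed timings (a 2-minute elevator ride, an elevator departure every 4 minutes, and a 1-minute escalator ride):

```python
# Both systems move 50 people/min, but riders experience different W.
# Elevator: a 200-person car departs every 4 minutes and takes 2
# minutes to reach the surface. Escalator: board immediately, ride
# for 1 minute. All timings are illustrative assumptions.
def elevator_w(arrival, cycle=4.0, ride_up=2.0):
    wait = -arrival % cycle        # time until the next departure
    return wait + ride_up

arrivals = [k / 50 for k in range(200)]      # steady 50 riders/min
elevator = [elevator_w(t) for t in arrivals]
escalator = [1.0] * len(arrivals)            # everyone rides 1 minute

print(min(elevator), max(elevator))  # between 2 and ~6 minutes
print(max(escalator))                # every rider done in 1 minute
```

Same λ, but no escalator rider ever does worse than the best-off elevator rider.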
Furthermore, higher λ is only better if it decreases W. If my elevator could carry 800 passengers, it would have higher throughput than my escalator, but that extra capacity doesn't actually provide any benefit to the system.
Stuart Cheshire wrote a rant entitled "It's the Latency, Stupid" that gets into this relationship even more. It's super important for anybody building computer systems that serve humans. It's much better to double λ by cutting W in ½ than by increasing L. stuartcheshire.org/rants/Latency.…
OK. Getting back to the article. The headline was "People who walk on the escalator actually slow everyone down," and the tweet talked about an escalator holding more people if they are all standing. You can probably see the flaw in this reasoning.
To analyze, we need to be a little careful. There are really two systems here. Riders care about the overall W, which we can decompose into the time waiting on the platform to get on the escalator (Wₚ) plus the time riding the escalator (Wₑ).
The only thing impacting Wₚ is λₑ (the throughput of the escalator). The more people who can get on the escalator each minute, the less time people have to wait on the platform. So for Wₚ, higher λₑ is unambiguously better.
Recall Little's Law is L = λW. Rearranging terms, we see that λ = L / W. Indeed, if we double the carrying capacity of each escalator, we can get more people through the system, which is going to be a win for Wₚ. Case closed, right? Well, if so, I wouldn't be writing this!😄
The flaw in the analysis is that we've ignored the impact on Wₑ. If people walk up an escalator at least as fast as the escalator itself moves (a reasonable assumption), then the total time on the escalator when walking is at most half of what it is when standing.
Looking at both terms together, you see that if people are standing vs walking, L doubles (good!), but W doubles as well (bad!). In fact, λₑ will stay roughly the same, meaning Wₚ will stay the same, but Wₑ will double. Overall, standing will increase total wait time!
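The bookkeeping is easy to verify with Little's Law rearranged as λ = L / W. The 50-rider, 1-minute walking baseline is an illustrative assumption:

```python
def throughput(l, w):
    return l / w   # Little's Law rearranged: lambda = L / W

# Walking: assume 50 riders fit on the escalator and ride for 1 minute.
L_walk, W_walk = 50, 1.0
# Standing: twice as many riders fit, but the ride takes twice as long.
L_stand, W_stand = 2 * L_walk, 2 * W_walk

print(throughput(L_walk, W_walk))    # 50 people/min
print(throughput(L_stand, W_stand))  # still 50 people/min

# lambda_e is unchanged, so the platform wait W_p is unchanged --
# but the ride W_e doubles, so total latency goes up when standing.
```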
Note that if we increase capacity by doubling the number of escalators, that would be a pure win, since it would increase λₑ without impacting Wₑ. Overall wait time in the system will go down.
But while the headlines were misleading, the studies cited in the article were good. They actually measured total W, and it went down when people were only standing. Why is that?
The picture in the tweet holds a clue. The comparison isn't between everybody walking and everybody standing. In fact, the amount of separation people need when walking is a total red herring. The problem is not with the walkers, but with cultural norms.
The problem is that the escalators are divided up (by norm) into half for walking and half for standing. If the walking half were fully utilized, as we saw above, it would be pure goodness - same throughput, lower latency for walkers.
But the walking half is not fully utilized, because less than 50% of people want to walk. So now escalator throughput drops, which means that platform wait times go up, and riders are unhappy. Or at least riders who like to stand on escalators are.
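Here's a sketch of that throughput loss, with illustrative capacities: each lane moves 50 people per minute when fully used, and I've guessed a 30% walker share:

```python
LANE_RATE = 50       # people/min for a fully used lane (assumed)
WALKER_SHARE = 0.3   # fraction of riders willing to walk (assumed)

# Everyone stands: both lanes run full.
both_standing = 2 * LANE_RATE

# Split by norm: the standing lane is full, but the walking lane only
# carries the riders who actually want to walk.
split_by_norm = LANE_RATE + LANE_RATE * WALKER_SHARE

print(both_standing)   # 100 people/min
print(split_by_norm)   # 65 people/min -- lower lambda_e, longer W_p
```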
The studies showed that when the transit system enforced a no-walking rule, the whole escalator was used, which meant that total average W decreased. Unfortunately, this isn't pure goodness, and I think that's the crux of this problem.
While enforcing a no-walking rule improved average wait times, it increased wait times for escalator walkers. Before, people in a hurry could get right on the escalator and cruise up, yielding much lower latency for them, but at the expense of higher latency for standers.
Generally, we dislike "unfair" outcomes, where some people are favored over others in a shared social setting. So the standers don't like that they have to wait longer while other people "take advantage" of the system.
On the other hand, leaving half of the escalator open for walkers means that people who are late for a job interview or whatever can get where they are going faster. My point is that there's no "mathematically correct" answer here. These are social questions.
Back to computer systems, the relationship between latency and throughput is usually less about social fairness and more about balancing average latency with high percentile latency (see blog.tacertain.com/p-four-nines/ for my take on why high-percentile latency matters).
But the important thing to remember is that customers only care about latency. Improving throughput by increasing parallelism is often necessary, but isn't nearly as good as reducing latency directly.