Unless you’re serving identical requests, arriving at an identical rate, each requiring an identical amount of work, in a completely static environment, every server process is a multi-tenant system.
“Constant work” systems need to shed load constantly too.
I’m thinking of something like per-request-class backpressure, where requests that disrupt the steady state (SLO) are penalized more heavily than requests that don’t: rejected outright, or put in low-priority queues.
Different “types” of requests would go into different queues, with the highest-priority queues drained first (often in LIFO order) and lower-priority requests left to wait.
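A minimal sketch of that idea (names and thresholds are my own invention, not any particular library): per-class queues, highest-priority class served first, LIFO within a class so fresh requests beat ones whose clients have likely already timed out, and a depth cap so overflow is shed instead of queued.

```python
from collections import deque

class PriorityShedder:
    """Per-request-class queues with depth-capped shedding (illustrative only)."""

    def __init__(self, num_classes=3, max_depth=100):
        # queues[0] holds the highest-priority request class.
        self.queues = [deque() for _ in range(num_classes)]
        self.max_depth = max_depth

    def enqueue(self, request, request_class):
        q = self.queues[request_class]
        if len(q) >= self.max_depth:
            return False  # shed: reject outright rather than grow the backlog
        q.append(request)
        return True

    def dequeue(self):
        # Drain the highest-priority non-empty queue first; LIFO (pop the
        # newest) within a class, since old requests are likely stale.
        for q in self.queues:
            if q:
                return q.pop()
        return None
```

In a real server the depth cap would be dynamic (e.g. driven by observed latency against the SLO) rather than a fixed constant.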
With enough client-side jitter when retrying, and an “upstream”-aware load balancer that knows not to send a client’s retry back to a backend that already rejected it, this could work fairly well?
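For the client-side jitter piece, one common scheme is exponential backoff with “full jitter”: pick a uniformly random delay up to the capped exponential bound, so rejected clients don’t retry in synchronized waves. A sketch (function name and defaults are mine):

```python
import random

def backoff_with_jitter(attempt, base=0.1, cap=10.0):
    """Delay (seconds) before retry number `attempt` (0-indexed).

    Full jitter: uniform in [0, min(cap, base * 2^attempt)], which
    decorrelates retries across clients and avoids retry storms.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A caller would sleep for `backoff_with_jitter(attempt)` after each rejection, giving up after some maximum attempt count.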
Anyone know of other ways?