
Most analyses I've read put the threshold around the 80% mark [1], although it depends on how you model the distribution, and there's nothing magical about the number. The main thing is to avoid getting close to 100%, because wait times blow up, roughly like 1/(1 - utilization), as you approach the max.

Little's Law is fundamental to queueing theory, but there's also the less well-known Kingman's formula, which incorporates variability of arrival rate and task size [2].

[1] https://www.johndcook.com/blog/2009/01/30/server-utilization...

[2] https://taborsky.cz/posts/2021/kingman-formula/
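
For a rough feel for both curves, here's a minimal sketch in Python (assumes the textbook M/M/1 mean-queue-wait expression and Kingman's G/G/1 approximation, with the squared coefficients of variation ca2/cs2 as free parameters; everything is in units of the mean service time, and the function names are just for illustration):

    def mm1_wait(rho):
        # M/M/1: Wq / E[S] = rho / (1 - rho)
        return rho / (1.0 - rho)

    def kingman_wait(rho, ca2=1.0, cs2=1.0):
        # Kingman: Wq / E[S] ~= ((ca2 + cs2) / 2) * rho / (1 - rho)
        return (ca2 + cs2) / 2.0 * rho / (1.0 - rho)

    for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
        print(f"rho={rho:.2f}  mm1={mm1_wait(rho):6.2f}  "
              f"kingman(ca2=cs2=2)={kingman_wait(rho, 2, 2):6.2f}")

Both diverge like 1/(1 - rho) as utilization approaches 1; extra variability (ca2, cs2 > 1) just multiplies the same blow-up.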



Really, both of those models show ~60% as about the limit at which you're still effectively at baseline latency. 80% is about the point where you're well into the steep rise; any higher and things become unusable.

At 0-60% you're still near minimum latency. At 60-80% you're at roughly twice the latency, but that's probably worth the cost savings of the extra compute density since it's still pretty low. Above 80%, things are already slow and get dramatically worse with every additional request.
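
To put rough numbers on those bands (a quick sketch, assuming the M/M/1 total-time-in-system factor 1/(1 - rho)):

    # Latency multiplier vs. an unloaded server (M/M/1 sojourn time).
    for rho in (0.0, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95):
        print(f"utilization {rho:4.0%}: {1.0 / (1.0 - rho):5.1f}x baseline")

That gives ~2.5x at 60%, 5x at 80%, 10x at 90%: each step toward the max roughly doubles latency again.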


If you look at the chart in the second link, where does the wait time visibly lift off from the baseline? Around 60%.

The first one is even worse: by 80% you're already seeing roughly twice the delay you'd get at 70%.

If I were to describe the second chart, I'd say 80% is where you start to get into real trouble, not just where you start noticing a slowdown.
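
As a sanity check on the 70% vs. 80% comparison above (a sketch using the rho/(1 - rho) queue-wait curve, which is what I take the first link to be plotting):

    def wq(rho):
        # Mean queue wait in units of the mean service time (M/M/1).
        return rho / (1.0 - rho)

    print(wq(0.8) / wq(0.7))  # ~1.71x: close to "twice the delay"
    print(wq(0.9) / wq(0.8))  # 2.25x: another near-doubling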

I said minimize latency, not optimize latency.



