
Most analyses I've read put the threshold around the 80% mark [1], although it depends on how you model the distribution, and there's nothing magical about the number. The main thing is to avoid getting close to 100%, because wait times blow up, roughly like 1/(1 - utilization), as you approach the max.

Little's Law is fundamental to queueing theory, but there's also the less well-known Kingman's formula, which incorporates variability of arrival rate and task size [2].

[1] https://www.johndcook.com/blog/2009/01/30/server-utilization...

[2] https://taborsky.cz/posts/2021/kingman-formula/
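
For a rough feel for both curves, here's a minimal sketch in Python (assumes the textbook M/M/1 mean-queue-wait expression and Kingman's G/G/1 approximation, with the squared coefficients of variation ca2/cs2 as free parameters; everything is in units of the mean service time, and the function names are just for illustration):

    def mm1_wait(rho):
        # M/M/1: Wq / E[S] = rho / (1 - rho)
        return rho / (1.0 - rho)

    def kingman_wait(rho, ca2=1.0, cs2=1.0):
        # Kingman: Wq / E[S] ~= ((ca2 + cs2) / 2) * rho / (1 - rho)
        return (ca2 + cs2) / 2.0 * rho / (1.0 - rho)

    for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
        print(f"rho={rho:.2f}  mm1={mm1_wait(rho):6.2f}  "
              f"kingman(ca2=cs2=2)={kingman_wait(rho, 2, 2):6.2f}")

Both diverge like 1/(1 - rho) as utilization approaches 1; extra variability (ca2, cs2 > 1) just multiplies the same blow-up.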



Really, both of those models show ~60% as about the limit at which you're still effectively at baseline latency. 80% is about the point where you're well into the steep rise; any higher and things become unusable.

At 0-60% you're still near minimum latency. At 60-80% you're at roughly twice the latency, but that's probably worth the cost savings of the extra compute density since it's still pretty low. Above 80%, things are already slow and get dramatically worse with every additional request.
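
To put rough numbers on those bands (a quick sketch, assuming the M/M/1 total-time-in-system factor 1/(1 - rho)):

    # Latency multiplier vs. an unloaded server (M/M/1 sojourn time).
    for rho in (0.0, 0.3, 0.6, 0.7, 0.8, 0.9, 0.95):
        print(f"utilization {rho:4.0%}: {1.0 / (1.0 - rho):5.1f}x baseline")

That gives ~2.5x at 60%, 5x at 80%, 10x at 90%: each step toward the max roughly doubles latency again.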


If you look at the chart in the second link, where does the wait time visibly lift off from the baseline? Around 60%.

The first one is even worse: by 80% you're already seeing roughly twice the delay you'd get at 70%.

If I were to describe the second chart, I'd say 80% is where you start to get into real trouble, not just where you start noticing a slowdown.
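
As a sanity check on the 70% vs. 80% comparison above (a sketch using the rho/(1 - rho) queue-wait curve, which is what I take the first link to be plotting):

    def wq(rho):
        # Mean queue wait in units of the mean service time (M/M/1).
        return rho / (1.0 - rho)

    print(wq(0.8) / wq(0.7))  # ~1.71x: close to "twice the delay"
    print(wq(0.9) / wq(0.8))  # 2.25x: another near-doubling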

I said minimize latency, not optimize latency.



