Estimating API / Web Traffic Load

One of the challenges facing anyone doing capacity planning is how to estimate the immediate concurrent load on the system. In a way this matters more than the average number of requests per second over a day: it is the maximum amount of work being carried out at the same time. Knowing it lets you plan peak memory and CPU so that your service stays as responsive as you need it to be.

I wanted to share some insight into the very bursty nature of the traffic I see on the upgrade.digital API by way of an example which may surprise you:

Let's assume you have been told your site will receive 200K hits per day (let's just assume each hit is a single request, for argument's sake). Sounds OK, right? Even the slowest web server would be able to handle ~140 requests per minute, right? Nope...

In the real world of e-commerce, all of your traffic comes in the peak 8 hours of the day, typically ramping up as the day goes on. Looking at it this way, you are going to be seeing about 400 requests per minute. Starting to sound a bit more interesting, right?

Beyond that we get to the true 'peak' concurrent minute where we are seeing around 2000 requests. This gives us some indication of how far we need to scale out and also gives a nice ratio:

Given a daily traffic estimate of T requests per day, the average per-minute traffic over 24 hours is X = T / 1440. Assume typical load is 3X and peak-minute load is 15X.
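To make the ratio concrete, here is a minimal sketch of that rule of thumb in Python. The function name `estimate_load` and its parameters are my own for illustration; the 3X and 15X factors are simply the ones quoted above and you should tune them to your own traffic.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def estimate_load(daily_requests, typical_factor=3, peak_factor=15):
    """Return (average, typical, peak) requests per minute.

    typical_factor and peak_factor default to the 3X / 15X rule of thumb;
    they are assumptions, not measured constants.
    """
    average = daily_requests / MINUTES_PER_DAY
    return average, typical_factor * average, peak_factor * average


average, typical, peak = estimate_load(200_000)
print(f"average: {average:.0f}/min, typical: {typical:.0f}/min, peak: {peak:.0f}/min")
# prints "average: 139/min, typical: 417/min, peak: 2083/min"
```

For the 200K hits per day example this lands close to the ~140, ~400 and ~2000 requests per minute figures discussed earlier.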

Now let's assume that your traffic arrives uniformly within that peak minute and is dispatched without hitting any kind of queueing limit. For simplicity, let's say you are seeing a peak of 1800 requests per minute, which is 30 Queries Per Second (QPS). For our systems, most requests resolve in around 100 - 500 milliseconds. One request arrives every 33 milliseconds, so three more arrive before a 100 millisecond request completes and around fifteen arrive during a 500 millisecond one, giving us around 4-15 concurrent requests depending on the traffic class.
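The same estimate can be written down with Little's Law (average concurrency = arrival rate x time in the system). This is a sketch under the same assumptions as above: uniform arrivals and no queueing. The function name `concurrent_requests` is hypothetical.

```python
def concurrent_requests(requests_per_minute, response_time_ms):
    """Average in-flight requests via Little's Law.

    Assumes uniform arrivals and no queueing, as in the text above.
    """
    # (requests per minute / 60) * (milliseconds / 1000), done in one
    # division to keep the arithmetic exact for round numbers.
    return requests_per_minute * response_time_ms / 60_000


fast = concurrent_requests(1800, 100)  # 100 ms requests
slow = concurrent_requests(1800, 500)  # 500 ms requests
print(f"{fast:.0f} to {slow:.0f} concurrent requests")
# prints "3 to 15 concurrent requests"
```

Little's Law gives the steady-state average, so the low end comes out at 3; counting the in-flight request itself alongside the three that arrive behind it gives the 4 quoted above.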

The fun starts when you get good enough at your tracking to notice when your current architecture stops scaling linearly. Typically you are looking for the point where throughput no longer keeps pace with requests and your assumptions about request time start to break down. Monitoring AWS ELB response time against request count is a good way to understand this load factor.
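As a rough illustration of spotting that point, the sketch below takes (requests per minute, mean response time) pairs, however you export them from your monitoring, and reports the lowest request rate at which response time has drifted well above its low-load baseline. Everything here is an assumption for illustration: the function name `find_saturation_rate`, the 1.5x tolerance, the use of the quietest quarter of samples as the baseline, and the sample numbers, which are made up and not real measurements. It does not call any AWS API.

```python
from statistics import median


def find_saturation_rate(samples, tolerance=1.5):
    """samples: iterable of (requests_per_minute, mean_response_ms).

    Returns the lowest request rate whose response time exceeds
    tolerance x the low-load baseline, or None if none does.
    """
    ordered = sorted(samples)
    if not ordered:
        return None
    # Baseline: median response time over the quietest quarter of samples.
    baseline_count = max(1, len(ordered) // 4)
    baseline = median(latency for _, latency in ordered[:baseline_count])
    for rate, latency in ordered:
        if latency > tolerance * baseline:
            return rate
    return None


# Made-up sample data purely to exercise the function.
samples = [
    (200, 110), (400, 115), (600, 118), (800, 120),
    (1000, 122), (1200, 125), (1600, 210), (2000, 480),
]
print(find_saturation_rate(samples))
# prints "1600"
```

A fixed threshold like this is crude, but it is often enough to tell you which request rate to look at more closely.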

