PERCENTILES: THE BEST MEASURE FOR RESPONSE TIME

WHAT IS A PERCENTILE?

A percentile is a common statistical measure. Percentiles split a sorted sample into 100 equal-sized intervals, which lets us analyze the data in terms of percentages.

An example makes this concrete: the response time below which 90% of the response time values for an HTTP request lie is called the 90th percentile response time.

In the following graph, 90% of the requests are processed in 3.0 seconds or less:

[Figure: distribution of response times, with the 90th percentile at 3.0 seconds]

WHY DO WE USE PERCENTILES?

Statistically speaking, there are many ways to describe how good an overall experience your product provides. Averages are the most common. They are easy to calculate and understand, but they can be misleading.

For example, let’s say the average salary in a region in Europe is 1,900 €. When we look closer, we find that 9 out of 10 people earn only around 1,000 € and one person earns 10,000 €. The arithmetic checks out: (9 × 1,000 + 10,000) / 10 = 1,900. Still, we can all agree that this does not represent the “average” salary as we would use the word in day-to-day life. Now let’s apply this understanding to application performance.
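The salary example can be checked with a few lines of Python. This is a minimal sketch that uses only the figures from the paragraph above. The variable names are chosen for illustration.

```python
from statistics import mean, median

# Salaries from the example: nine people earn 1,000 EUR and one earns 10,000 EUR.
salaries = [1000] * 9 + [10000]

average_salary = mean(salaries)    # pulled upward by the single high earner
median_salary = median(salaries)   # middle of the sorted list

print(f"average: {average_salary} EUR, median: {median_salary} EUR")
```

The median lands on what the nine typical earners make, while the average sits at a value nobody in the sample actually earns.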

THE AVERAGE RESPONSE TIME

The Average Response Time is the most commonly used metric in performance management. We tend to assume that it describes a “normal” transaction. That is only true if the response time is nearly always the same, in other words if the response time distribution is roughly bell-shaped.

[Figure: bell-curved response time distribution]

In a bell curve, the average and the median are the same, so the average would represent the experience of the majority of the transactions. In the real world, most applications have a few heavy outliers. A statistician would say that the curve has a long tail. A long tail does not imply many slow transactions, only that a few of them are orders of magnitude slower than the norm.

[Figure: response time distribution with a long tail]

With a long tail, the average no longer represents the bulk of the transactions. Percentiles are a far better metric, because they let us understand the distribution. A percentile tells us, for the part of the curve we are looking at, what share of the transactions is at or below that value. To visualize this, check out the following chart:

[Figure: chart of the average (green line), 50th percentile, and 90th percentile response times]

The green line represents the average, and it is very volatile. The other two lines represent the 50th and 90th percentiles. The 50th percentile (the median) is rather stable but has a couple of jumps. These jumps represent real performance degradation for the majority of the transactions.

The 90th percentile (the start of the “tail”) is a lot more volatile, which means that the slowness of the outliers depends on data or user behavior. The important point here is that the average is heavily influenced by the 90th percentile, the tail, rather than by the bulk of the transactions.

A percentile gives us a better sense of our real performance, because it shows us a slice of the response time curve. For this reason, percentiles are well suited to automatic baselining. If the 50th percentile moves from 500 ms to 600 ms, we know that the median transaction got 20% slower, a degradation that affects the bulk of our transactions and clearly needs to be addressed.
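To illustrate how the tail drags the average, here is a small sketch with made-up numbers. The response times are hypothetical, not measured data. It compares the mean and the median before and after adding a handful of slow outliers.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds (illustrative, not measured).
bulk = [200] * 95                 # 95 ordinary requests
with_tail = bulk + [5000] * 5     # the same requests plus 5 slow outliers

mean_bulk, median_bulk = mean(bulk), median(bulk)
mean_tail, median_tail = mean(with_tail), median(with_tail)

print(f"without outliers: mean={mean_bulk} ms, median={median_bulk} ms")
print(f"with outliers:    mean={mean_tail} ms, median={median_tail} ms")
```

In this sample, five outliers out of 100 requests more than double the mean (from 200 ms to 440 ms), while the median stays at 200 ms.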
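A baseline check of this kind could look roughly like the sketch below. The function name `percent_change` and the 10% alert threshold are assumptions made for illustration. They do not come from any particular tool.

```python
def percent_change(baseline_ms: float, current_ms: float) -> float:
    """Relative change of the current value against the baseline, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (current_ms - baseline_ms) * 100.0 / baseline_ms

# Values from the text: the 50th percentile moves from 500 ms to 600 ms.
change = percent_change(500, 600)

# Hypothetical threshold: flag the run if the median got more than 10% slower.
THRESHOLD_PERCENT = 10.0
degraded = change > THRESHOLD_PERCENT

print(f"median changed by {change:.1f}% (degraded: {degraded})")
```

The same comparison can be applied to any percentile, for example the 90th, with a threshold that suits that part of the curve.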

HOW TO CALCULATE PERCENTILES?

We will show how to calculate percentiles with a simple example:

Let’s say we have the numbers from 1 to 10 in arbitrary order. After sorting the array, the sequence is as follows:

1 – 2 – 3 – 4 – 5 – 6 – 7 – 8 – 9 – 10

The 70th percentile is 7, because 70% of the values (seven out of ten) are less than or equal to 7;

The 80th percentile is 8;

The 90th percentile is 9.
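The values above match what is commonly called the nearest-rank method: sort the data, compute the rank as the percentage of the sample size rounded up, and take the value at that rank. The sketch below assumes that method. The text does not name one, and other definitions (for example, ones that interpolate between neighboring values) can give slightly different results. The function name is chosen for illustration.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    # 1-based rank: the smallest position that covers p percent of the sample.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

numbers = [4, 9, 1, 7, 10, 3, 6, 2, 8, 5]   # the numbers 1 to 10, unsorted
for p in (70, 80, 90):
    print(f"{p}th percentile: {percentile_nearest_rank(numbers, p)}")
```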

As you can see, calculating a percentile is very simple. In Loadium, you can view percentiles in the Summary Report after running a load test.

[Figure: Loadium Summary Report with percentile values]

Percentiles are also great for performance tuning. For example, let’s assume that something within your application is generally too slow and you need to make it faster. In this case, you need to focus on bringing down the 90th percentile. This should bring down the overall response time of the application.

In another case, you may have major outliers. Then you need to focus on bringing down the response time for transactions beyond the 98th or 99th percentile.

You might have seen many applications that have perfectly acceptable performance at the 90th percentile but are orders of magnitude worse at the 98th percentile.

We could not make the same observations with averages, minimums, and maximums, but with percentiles they are easy to make.
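The following sketch shows this effect on a hypothetical sample, with numbers invented for illustration. It uses a nearest-rank lookup on sorted data. The helper name `nearest_rank` is an assumption, not part of any tool mentioned here.

```python
import math
from statistics import mean

# Hypothetical sample: 970 requests take 300 ms and 30 requests take 8000 ms.
response_times_ms = sorted([300] * 970 + [8000] * 30)

def nearest_rank(ordered, p):
    # ordered must already be sorted ascending; p is a percentile in (0, 100].
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

tail_report = {p: nearest_rank(response_times_ms, p) for p in (50, 90, 98, 99)}
summary = {
    "min": min(response_times_ms),
    "max": max(response_times_ms),
    "mean": mean(response_times_ms),
}

print("percentiles:", tail_report)
print("summary:", summary)
```

In this sample, the 50th and 90th percentiles are both 300 ms, while the 98th and 99th jump to 8000 ms. The mean of 531 ms matches no request that was actually served, and the minimum and maximum alone do not say how many requests are slow.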

CONCLUSION

Averages alone are not enough, because they are too simplistic and one-dimensional. Percentiles are an easy and effective way of understanding real performance. They also provide a solid basis for automatic baselining, behavioral learning, and optimizing your application with the proper focus.

In short, percentiles are great!