“How does the system behave under a certain load?”
“What kind of infrastructure do I need to stay within acceptable values?”
“How far does this system take us with these resources?”
To get these answers, you must first be able to apply similar loads and see the current behavior of the system. The next step is analysis.
What Will We Analyze?
The analysis part has to be very thorough, because at this point we also need to know what to analyze and where to look. So what should we analyze?
Response time: System response time when called
Error rate: The percentage of calls that fail under a certain load
I chose the term “call” here because the load we apply may not be just for a web page. It can be a REST API, a web service, a socket protocol, a direct database call or even video streaming.
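As a rough illustration of these two values, here is a minimal Python sketch that summarizes a list of recorded calls. The `Sample` type, the `summarize` function and the sample numbers are made up for this example; a real load testing tool reports these figures for you.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One recorded call: how long it took and whether it succeeded."""
    elapsed_ms: float
    ok: bool


def summarize(samples):
    """Return average and worst response time plus the error rate in percent."""
    if not samples:
        raise ValueError("no samples recorded")
    times = [s.elapsed_ms for s in samples]
    errors = sum(1 for s in samples if not s.ok)
    return {
        "avg_ms": sum(times) / len(times),
        "max_ms": max(times),
        "error_rate_pct": 100 * errors / len(samples),
    }


# Made-up samples, for illustration only
samples = [
    Sample(120, True),
    Sample(180, True),
    Sample(950, False),
    Sample(150, True),
]
report = summarize(samples)
print(report)
```

The same summary works for any kind of call, since it only needs a duration and a success flag per sample.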
Where Will We Look?
If a system starts to show problems in these two values under load, the cause can be very difficult to find. For example, if the response time rises to 10 seconds with 500 simultaneous requests, you should first look at the following points:
- Errors due to faulty coding
- Database response times, SQL queries, connection pool configuration
- Application server configurations such as thread count or heap settings
- Slow I/O operations
- Delays due to integrations
- CPU and memory shortages
- Network problems
- Load balancer configuration
- Bandwidth shortage
- Configuration problems
It is possible to extend this list, but I have mentioned the most important items in the order in which they should be checked. Since I come from a software background, I start by addressing code errors.
Profiling or Back-End Analysis
The hardest part of this process is tracing or reporting on running code. We can call this profiling or back-end analysis. Many software developers and test engineers have a really hard time managing this process. This is quite natural, because they cannot watch or debug fast-flowing code. Moreover, there are many components, and it is very difficult to determine which component the analysis should focus on.
For example, suppose that as a test engineer you report to a customer: “The system receives 5,000 orders at once. It gets a 10-second response time and a 15% error rate. These are out of acceptable values.” Even if you also tell them which requests are slowing down, the answer your customer will give you is: “I know that too. I ran into it during the last campaign period.” The purpose of load testing is to reveal such problems, but the critical thing is to pinpoint and eliminate these problems properly.
In fact, there are APM (Application Performance Monitoring) tools developed for this. The most common ones are New Relic, Dynatrace and AppDynamics.
Although these tools were originally designed for application error and performance analysis, over time they also gained competence in areas such as client, system and network analysis.
You can add an APM as an agent at the application, browser, operating system and network level. These agents collect the observed behavior and provide you with an analysis.
For example, if you add such an agent to an application that uses MySQL as its database, Java on the back-end and Angular on the front-end, you can see how long each request takes and where it gets stuck, the total response time, slow Java code blocks and slow SQL queries. The agent works through bytecode instrumentation, so the measurements come from real transactions.
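To give a feel for what instrumentation means, here is a minimal hand-written sketch in Python. It is only an analogy: a real APM agent instruments the code automatically and collects far more detail, while this sketch wraps functions by hand with a decorator. The names `timed`, `load_order` and `complete_order` are hypothetical, and the `sleep` call merely stands in for a slow SQL query.

```python
import time
from collections import defaultdict
from functools import wraps

# Maps a label to the list of measured durations in seconds
timings = defaultdict(list)


def timed(name):
    """Record how long each call to the wrapped function takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the call raises an exception
                timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("load_order")  # hypothetical stand-in for a slow SQL query
def load_order(order_id):
    time.sleep(0.02)
    return {"id": order_id}


@timed("complete_order")  # hypothetical request handler
def complete_order(order_id):
    order = load_order(order_id)
    return {"status": "done", **order}


result = complete_order(42)
for name, durations in timings.items():
    print(f"{name}: {sum(durations) * 1000:.1f} ms over {len(durations)} call(s)")
```

Comparing the two labels shows how much of the request time is spent inside the simulated query, which is the kind of breakdown an APM tool gives you without any manual wrapping.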
I can say that this is the healthiest analysis method. While we are running a load test, we perform this back-end analysis with one of the APM tools and try to give pinpoint feedback. For example: “The order step is slow because the SQL running at order completion takes too long to respond. It needs to be optimized.” Or: “Login screens are slow because the Java side makes a web service call to this address during login, and it responds too late. There are no problems with the other calls.” By contrast, load tests that do not provide such details are very hard to analyze.
Running Tests Correctly
It is very important to create the expected environment in order to analyze the application. Normally, you can do your analysis with APMs without the need for load testing, but it is critical to prepare in advance for a period like Black Friday and to run a load test to see what you will encounter before a newly developed feature goes live.
At first glance, all solutions start with recording and running scenarios in tools such as JMeter and Gatling. That is fine for a start, but it is not enough later on. While it is possible to run the written tests on a single machine with 500 requests, once you reach 5,000 they are no longer realistic tests, which affects the accuracy of the results. Note that the maximum number of threads recommended on a single machine is 500.
My suggestion here is very simple: run your load tests in a distributed way. There are different tools available for this; my suggestion would be BlazeMeter or Loadium. These two tools distribute load tests written with tools such as JMeter and Gatling across a cloud environment. In this way, you can generate millions of requests at once. For example, you can run 50,000 simultaneous requests with 100 machines, at 500 simultaneous requests from each machine. Keep in mind that these tests run in the cloud. However, both solutions also have an on-premise option.
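The arithmetic behind this example can be sketched in a few lines of Python. The function names are hypothetical, and the default limit of 500 per machine is simply the figure used in this article, not a value taken from any of the tools.

```python
def machines_needed(target_concurrent, per_machine_limit=500):
    """Smallest number of machines that keeps each one at or below the limit."""
    if target_concurrent <= 0 or per_machine_limit <= 0:
        raise ValueError("both values must be positive")
    # Integer ceiling division, avoids floating point rounding
    return (target_concurrent + per_machine_limit - 1) // per_machine_limit


def split_load(target_concurrent, per_machine_limit=500):
    """Spread the target as evenly as possible across the machines."""
    count = machines_needed(target_concurrent, per_machine_limit)
    base, extra = divmod(target_concurrent, count)
    # The first `extra` machines take one more request than the rest
    return [base + 1 if i < extra else base for i in range(count)]


print(machines_needed(50_000))  # the example from the text
print(split_load(5_200))        # a target that does not divide evenly
```

For the 50,000 case this gives 100 machines with 500 requests each, matching the example in the text.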