“Build Verification” Stress Test in a nutshell
Before answering this, let me point out that I used the word “stress” and not “performance”. That’s because performance degradation is only one of the things a good CI stress test can find; errors are the other. So, before we talk about what makes a good one, let’s talk about what doesn’t.
The kind of test you would use in regular pre-release load testing is not a good fit, for several reasons. Those tests normally run for some period of time, up to an hour or longer, with ramp-up, hold, and ramp-down times included. We need something more like the POST (power-on self-test) your laptop goes through when you first turn it on: get as much done in the shortest amount of time possible, making sure there are no glaring errors or performance degradation before promoting the build (or, in the case of the POST, before bringing up Windows on your laptop).
“Engines on Blocks” methodology
The methodology I use is called “Engines on Blocks” because it’s modeled after the test harnesses that automobile engines are installed and tested in before they are put in a car at the factory. It’s not a lengthy test, but it does rev the engine at 80 or 90% for a couple of minutes to see if it sputters and dies. Even though only 1 or 2% fail at this stage, the cost of finding and fixing them here is much, much less than discovering them later down the line. The same goes for your application, hence the need for a build verification test.
One thing that isn’t needed is think time. What is there to think about anyway? We are not trying to model user behavior; we are trying to run a 3-to-5-minute stress test using the minimum number of threads needed to rev the CPU up to 80%, just like that car engine. And we should be testing a happy path through the core functionality of the application, not merely logging in, traversing the menus, and logging out (although even that is better than nothing).
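To make the shape of such a test concrete, here is a minimal Python sketch, not a replacement for a real load tool. It assumes the happy path can be wrapped in a single callable (the hypothetical `transaction` argument); the name `run_stress`, the thread count, and the duration are all illustrative. Each worker calls the transaction back to back, with no think time, until the clock runs out.

```python
import threading
import time


def run_stress(transaction, threads=4, duration_s=180.0):
    """Call `transaction` back to back from `threads` workers for `duration_s` seconds.

    Returns (latencies_in_seconds, error_count). There is no think time:
    each worker starts its next call as soon as the previous one returns.
    `transaction` is a hypothetical callable wrapping one happy-path pass
    through the application's core functionality.
    """
    latencies = []
    errors = 0
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        nonlocal errors
        while time.monotonic() < deadline:
            start = time.perf_counter()
            try:
                transaction()
                ok = True
            except Exception:
                # Any exception counts as a failed transaction.
                ok = False
            elapsed = time.perf_counter() - start
            with lock:
                if ok:
                    latencies.append(elapsed)
                else:
                    errors += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return latencies, errors
```

The thread count is left as a parameter on purpose: the right value is whatever gets the CPU of the system under test to about 80%, and that has to be found by watching the server, not the load driver.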
Your CI or build verification test measures the speed of your core business transaction, where milliseconds count. Since this test is run many times, it becomes easy to spot historical trends and to know exactly when performance degradation first appeared, so it can be fixed before it gets worse.
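Spotting when degradation began can be as simple as comparing each build against a trailing baseline. The sketch below assumes you store one latency figure per build (say, the median response time in milliseconds). The function name, the five-build window, and the 10% tolerance are illustrative choices of mine, and the numbers in the usage example are made up.

```python
from statistics import median


def first_degraded_build(history, window=5, tolerance=0.10):
    """Return the id of the first build that is slower than its trailing baseline.

    `history` is a list of (build_id, latency_ms) pairs in build order.
    The baseline is the median of the previous `window` builds; a build is
    flagged when it is more than `tolerance` (10% by default) above it.
    Returns None if no build is flagged.
    """
    for i in range(window, len(history)):
        baseline = median(ms for _, ms in history[i - window:i])
        build_id, ms = history[i]
        if ms > baseline * (1 + tolerance):
            return build_id
    return None


# Made-up latencies for eight consecutive builds.
history = [
    ("b1", 100), ("b2", 102), ("b3", 99), ("b4", 101),
    ("b5", 100), ("b6", 103), ("b7", 118), ("b8", 125),
]
print(first_degraded_build(history))  # prints "b7"
```

A trailing median is used here rather than a fixed baseline so that one noisy run does not shift the reference point much; a fixed, hand-approved baseline would be an equally reasonable design.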
Numerous challenges ahead
Running load tests as part of the CI test pipeline is challenging for a number of reasons. They don’t simply pass or fail the way functional tests do, and since every application is different, there are different criteria for what should cause a failure. All the usual requirements of a load test still apply: test data, test users, and the state of the database itself, before and after the test. Backing out the added transactions to avoid leaving clutter behind is always a good idea.
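Since a load test has no built-in pass or fail, the pipeline needs an explicit gate. Here is one possible sketch, assuming just two criteria: error rate and 95th-percentile latency. The function names and the thresholds of 1% and 500 ms are placeholders, not recommendations; each application will need its own.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def gate(latencies_ms, errors, max_error_rate=0.01, max_p95_ms=500.0):
    """Turn stress test results into a pass/fail decision for the pipeline.

    `latencies_ms` holds the latency of each successful transaction and
    `errors` is the count of failed ones. Returns (passed, reasons).
    """
    total = len(latencies_ms) + errors
    if total == 0:
        return False, ["no transactions were executed"]

    reasons = []
    error_rate = errors / total
    if error_rate > max_error_rate:
        reasons.append(f"error rate {error_rate:.1%} exceeds {max_error_rate:.1%}")
    if latencies_ms:
        p95 = percentile(latencies_ms, 95)
        if p95 > max_p95_ms:
            reasons.append(f"p95 latency {p95:.0f} ms exceeds {max_p95_ms:.0f} ms")
    return not reasons, reasons
```

In a pipeline, a script would typically call `gate` on the results and exit with a non-zero status when `passed` is false, which is how most CI servers mark a step as failed.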
Have any of our readers tried this at their company? If so, would you like to share your experience with the rest of us?