Many people enjoy splitting testing up into a myriad of test types: Acceptance Tests, Functional Tests, Integration Tests, Performance Tests, Technical Tests, Unit Tests. I have myself been guilty of such terminology as “embedded integration tests” and “requirement tests”. However, what unites the tests is more important than what divides them. The divisions are fuzzy, and they should be.
All tests have but two purposes: To tell you if you’ve completed a new requirement, and to ensure that you haven’t broken something that worked. There are three fundamental properties of a good test suite: Coverage, Robustness and Speed.
The properties of a good test suite
Coverage: I use the term coverage with some apprehension, as it has an existing and problematic definition. To most people, test coverage means line and/or branch coverage. That is: what percentage of your code is executed when you run the test suite? This metric can be misleading, and it is probably not the goal you want. Instead, I propose a different definition of coverage: coverage is the percentage of the bugs you introduce into your code that are detected by your test suite. Stated differently: the higher the number of false positives (tests that pass despite a defect) when you change your code, the lower your coverage.
Improving your line or branch coverage may or may not improve the chance that your test suite catches a defect, so it may or may not be a good investment of time. Chances are that if your line coverage is over 70%, there are better things to spend your time on than improving it further. And those things may in fact improve your line coverage as a result.
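This definition of coverage is close to what mutation testing measures: deliberately plant small bugs and count how many the suite catches. A minimal sketch of the idea (the function, the mutants, and the tiny suite are all invented for illustration):

```python
# Coverage as bug detection: of the bugs we deliberately introduce,
# how many does the test suite actually catch?

def price_with_discount(price, percent):
    return price * (1 - percent / 100)

# Hypothetical "mutants": small deliberate bugs in the same function.
mutants = [
    lambda price, percent: price * (1 + percent / 100),  # flipped sign
    lambda price, percent: price * (1 - percent / 10),   # wrong divisor
    lambda price, percent: price,                        # discount ignored
]

def suite_passes(fn):
    """A tiny test suite for the function under test."""
    return fn(100, 10) == 90 and fn(200, 50) == 100

assert suite_passes(price_with_discount)  # suite is green on the real code

caught = sum(1 for mutant in mutants if not suite_passes(mutant))
coverage = caught / len(mutants)
print(f"bug-detection coverage: {coverage:.0%}")  # → 100%
```

Here every planted bug makes the suite fail, so coverage in this sense is 100% even though the tool-reported line coverage would say nothing about it.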
Robustness: The problem with the easy focus on line and branch coverage that tools give us is that it tends to hurt other characteristics of a good test suite. If you add a test to make sure that all the internals of your system are exercised, chances are good that this test will break because of a non-destructive change. I’ve found that teams with high test coverage always seem to run into the problem of the Fragile Test.
The fragility of a test suite can be described as the number of changes that break a test even though they did not introduce a defect. Stated differently: the higher the number of false negatives (tests that fail without a real defect) when you change your code, the lower your robustness.
You make tests more robust by testing the outcome and not the mechanism. Incidentally, I have found that mock objects seem to make my tests more fragile.
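To illustrate the difference (the `Order` class is invented for this example): a robust test asserts the outcome, while a fragile one asserts the mechanism behind it.

```python
class Order:
    """Toy order class, invented for illustration."""
    def __init__(self):
        self._lines = []

    def add_line(self, price, quantity):
        self._lines.append((price, quantity))

    def total(self):
        return sum(price * qty for price, qty in self._lines)

# Robust: test the outcome. This assertion survives any refactoring
# of Order's internals, as long as totals stay correct.
order = Order()
order.add_line(price=10, quantity=3)
order.add_line(price=5, quantity=2)
assert order.total() == 40

# Fragile: test the mechanism. Asserting on the internal representation
# breaks if we refactor, say, to keep a running total instead of a list,
# even though no behavior visible to callers has changed.
assert order._lines == [(10, 3), (5, 2)]
```

Both assertions pass today, but only the first one keeps passing through a harmless refactoring; the second is exactly the kind of false negative that erodes robustness.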
Speed: So a test breaks. What happens now? Presumably, you try to isolate the behavior that breaks, maybe by running a smaller suite of tests. Then you make some changes to the code (if there was an actual bug) or the test (if it was a false negative), and you run the failing test again to check whether the problem is fixed. Repeat until done. Then you run the whole suite again to check that you didn’t introduce some other problem.
There are a few critical thresholds for tests when it comes to execution time. More than about 2 seconds, and I check my email. More than 10 seconds, and I try to respond to email. More than 20 seconds, and I start working on two tasks in parallel. More than 1 minute, and I go for a cup of coffee. Each of these secondary effects is ten times as time consuming as the test itself.
This means: Let’s say I introduce a bug that takes me 5 attempts to fix, and along the way I introduce another bug that I detect when I run the full suite and that takes another 2 attempts to fix. So I run the full suite three times: first, after fixing the first bug, and after fixing the second bug. And I run a single test about 7 times. If running the full suite takes a minute and running a single test takes 10 seconds, this will have taken me 3 × (1 minute + 10 minutes for coffee) + 7 × (10 seconds + 100 seconds to answer an email) = 2750 seconds, or about three quarters of an hour. If running a single test took 1 second (no interruption) and running the suite took 10 seconds (I’ll watch it for that long), the test time would be less than a minute. But I didn’t get to write those three emails.
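The arithmetic above, spelled out (the scenario counts and durations are the ones from the example, with each interruption costing ten times the run it follows):

```python
# Slow tooling: a 1-minute suite triggers a 10-minute coffee break,
# a 10-second single test triggers a 100-second email reply.
suite_runs, single_runs = 3, 7
slow = suite_runs * (60 + 10 * 60) + single_runs * (10 + 100)

# Fast tooling: a 10-second suite and a 1-second test are short
# enough to watch without being distracted, so no interruption cost.
fast = suite_runs * 10 + single_runs * 1

print(slow, fast)  # → 2750 37
```

That is roughly three quarters of an hour with slow tests versus well under a minute with fast ones, and almost all of the difference is interruption cost, not test-running time.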
The universal test
The suggested difference between an integration test and a unit test is the time it takes to run. The difference in running time comes from the fact that an integration test has more setup and more realistic infrastructure. However, we usually want to test the same scenarios.
I would like to submit to you, gentle reader, that it is not only possible, but quite feasible to write a test that can be used both as a “unit test”, running with a fast, in-memory implementation, and as an “integration test”, using the target infrastructure. This achieves the goals of high coverage, good robustness, and the right speed, by focusing on what the system is supposed to do, and using the infrastructure setup as a point of variation.
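One way such a universal test can be sketched (all the names here are invented for illustration): write the scenario once against an abstract repository, and hand it either a fast in-memory fake or the real infrastructure.

```python
class InMemoryUserRepository:
    """Fast in-memory fake: the 'unit test' configuration."""
    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def find(self, user_id):
        return self._users.get(user_id)


def check_saved_user_can_be_found(repository):
    """The universal test: one scenario, written once.

    The repository argument is the point of variation: pass the
    in-memory fake for speed, or a database-backed implementation
    for an integration run. The assertions only describe what the
    system is supposed to do, never how the repository does it.
    """
    repository.save("u1", "Alice")
    assert repository.find("u1") == "Alice"
    assert repository.find("missing") is None


# Unit-test configuration: fast, in-memory.
check_saved_user_can_be_found(InMemoryUserRepository())
# An integration configuration would call the same function with,
# say, a hypothetical DatabaseUserRepository(connection): the same
# scenario against the target infrastructure.
```

Because the scenario asserts only outcomes, it stays robust across both configurations, and the in-memory run keeps the feedback loop inside the fast thresholds discussed above.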