Unit tests are ubiquitous in software development. Many organisations have test coverage metrics, with a concerted drive to have at least one test covering each line of code.
In fact, there is often more test code than actual business code, as we generally need to exercise several combinations of execution paths.
This is currently the accepted wisdom – but is it correct?
For some organisations, if tests have stopped even one error from reaching production, that would justify all the effort. Examples of where this would be true include banking software, self-driving cars and space exploration software.
In many more instances though, developers are expending a huge amount of effort for diminishing returns. Let me be clear: I am a huge fan of unit tests; it’s the test coverage that I have issues with.
Consider baking a cake as a metaphor, where ‘tasting’ as we go along is the ‘testing’.
Add flour. Test? No.
Add eggs. Test? No.
Add sugar and beat. Test? Yes, taste to see if it’s sweet enough.
Add milk and vanilla essence. Test? Yes, taste the dough and look at consistency (a different kind of test).
Pour into baking tin. Test? Yes, evaluate if the tin is filled to the correct level – based on experience.
Bake. Test? Yes, check if the cake is done by probing the middle of the cake. We’ll also be testing (monitoring) cooking time, and oven temperature.
How does this relate to software?
There were key points where we could mess up the recipe. We tested at each of those points, but no more. We also used different testing mechanisms at the different stages. The more experience we have with making cakes, the less we probably need to taste-test. But I’d bet that every baker checks the dough before it goes into the oven.
Could we apply this metaphor to our code? Are there key points where testing is really valuable? Absolutely, and this is why I dislike the brute-force approach of test coverage metrics. Just as there are key points for testing, there are also many steps that should not need tests.
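To make the idea of a 'key point' concrete, here is a minimal sketch in Python that stays with the baking theme. Everything in it is hypothetical: the `Batter` class, the `is_sweet_enough` function and the 50% to 100% sugar rule are assumptions invented purely for illustration. The data class is plain storage with nothing to get wrong, so it gets no test of its own; the sweetness rule has boundaries that are easy to get wrong, so that is where I would taste.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Batter:
    # Plain data: adding flour and sugar is not a step worth tasting.
    flour_grams: int
    sugar_grams: int


def is_sweet_enough(batter: Batter) -> bool:
    """Hypothetical rule: sugar must weigh 50% to 100% of the flour."""
    if batter.flour_grams <= 0:
        raise ValueError("batter needs flour")
    # Integer comparison avoids floating-point surprises at the boundaries.
    return batter.flour_grams <= 2 * batter.sugar_grams <= 2 * batter.flour_grams


def test_sweetness_boundaries() -> None:
    # Taste exactly where the recipe can go wrong: at the edges of the rule.
    assert is_sweet_enough(Batter(flour_grams=200, sugar_grams=100))      # exactly 50%
    assert is_sweet_enough(Batter(flour_grams=200, sugar_grams=200))      # exactly 100%
    assert not is_sweet_enough(Batter(flour_grams=200, sugar_grams=99))   # just under
    assert not is_sweet_enough(Batter(flour_grams=200, sugar_grams=201))  # just over
```

One small test at the decision point covers the part of this code where a mistake is likely. A coverage target would ask for more tests without telling us which ones matter.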
Humans bring value to complex processes through their ability to be creative, and to see patterns that can be exploited (code itself is a pattern). Computers bring value to processes by their astonishing ability to crunch enormous quantities of data and instructions in a short space of time – their brute-force capabilities.
When we force people to work like machines, in a brute-force manner, we diminish their creative spark. Every time I stop my flow of thought to write a test that I don’t need, I’m actually doing a worse job of problem-solving.
Software development changed when Kent Beck and Erich Gamma wrote JUnit and made unit testing easy. That free piece of software, so generously gifted to the development world, gave all of us the opportunity to test critical steps as we went along.
Sadly, software development is based on low trust. We write a lot of code assuming that things will go wrong, and that developers will make mistakes. Unit tests seem to offer a strategy to reduce risk. My concern is that our risk may actually be higher if developers ship software with the false confidence that their low-quality tests will save them.
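As a rough illustration of that false confidence, consider the sketch below. The function, the pricing rule and the deliberate bug are all hypothetical, made up for this example. The first test executes every line of `apply_discount`, so a line-coverage report would show 100%, yet it would pass for almost any implementation. The second test checks the one boundary that matters and is the only one that notices the bug.

```python
def apply_discount(total_cents: int, is_member: bool) -> int:
    """Hypothetical rule: members get 10% off orders of 100.00 or more."""
    # Deliberate bug for illustration: '>' should be '>=' to match the rule.
    if is_member and total_cents > 10_000:
        return total_cents - total_cents // 10
    return total_cents


def coverage_chasing_test() -> None:
    # Runs both branches, so every line is covered, but checks nothing useful.
    assert apply_discount(20_000, True) is not None
    assert apply_discount(5_000, False) is not None


def targeted_test() -> None:
    # Checks the boundary where the rule is most likely to be wrong.
    assert apply_discount(10_000, True) == 9_000
```

Calling `coverage_chasing_test()` passes quietly, while `targeted_test()` raises an `AssertionError` because a member's order of exactly 100.00 is not discounted. The coverage number is identical with or without the second test.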
Test coverage is an overview metric. An overview metric is something that management can look at to see whether you’re doing your job, as the metric defines it. Overview metrics are notorious for corrupting processes. People start aiming at the metric, not the outcome that the metric was supposed to measure.
I’m not against measuring things. But measurements must benefit the team, and not be used to manage people. In unit tests, we have taken a useful developer’s tool and turned it into an often-evil management metric for quantity over quality. Quality is knowing which steps of the baking process must be tested. It’s knowing when to lick the spoon. Quantity has no such insight.
I’m very sure that having no tests is a bad thing. There are always steps that would benefit from testing, and that prove we are on track. I like to be able to talk about those points of value, because that adds insights for the testers, developers and our users.
Isn’t it interesting that people follow others in such a lemming-like way? Test coverage is now well established as something that teams do, and no one is noticing that the test coverage emperor is naked. We think we are finding mistakes in code through our wonderful test coverage, but really we would have found most of those without the tests if we had just allowed our developers to be thinking and creative professionals.
We’ve dumbed down the process, and thereby our developers, and we’re all blindly chasing quantity over quality. Naked, I say, the emperor is naked!