Mutation testing is a distinctive technique that developers use to test software quality. It can amaze you, make you think you've lost your mind and, finally, bring peace to your programmer's soul. This sounds quite bold, even pretentious, but after reading the rest of the article, in which we apply Mutant, an open source mutation tester for Ruby, you might just be convinced it's true.
The mutation testing technique is based on a simple idea. Say you have a bunch of code and a number of tests to verify its correctness. It doesn't matter how those tests were born: written up front with techniques like TDD or added afterwards. Mutation testing lets you verify that your test suite is complete. By complete I mean that there is no code (no code execution path, to be precise) that is not covered by at least one test case.
Why do we have to measure test coverage after all? To be sure the program behaves as intended, to protect against regressions, etc. But what about program correctness? How can we be confident that the program works correctly on all valid inputs? It's difficult to cover all program states, and for some programs it's not possible at all. Consider the function next_char:
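A minimal sketch of what such a function might look like, assuming single-byte characters and a wrap-around from "\xFF" to "\x00" (both are assumptions made for illustration; the article's own listing may differ):

```ruby
# Returns the character that follows `char` in the single-byte table.
# Assumption: "\xFF" wraps around to "\x00" instead of raising an error.
def next_char(char)
  ((char.getbyte(0) + 1) % 256).chr
end

puts next_char("a") # prints "b"

# With only 256 possible inputs, an exhaustive check is still feasible.
exhaustive = (0..255).all? do |byte|
  expected = byte == 255 ? 0 : byte + 1
  next_char(byte.chr).getbyte(0) == expected
end
puts exhaustive # prints "true"
```

The exhaustive loop is only practical because the input space is tiny.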
It is a quite simple function that accepts a character and returns the next one in the ASCII table. To prove it correct, we would need to pass it every possible char, including the edge case "\xFF".
A slight change to next_char (it now accepts an optional step parameter specifying how far to jump) makes exhaustive testing impossible:
To cover the whole input parameter space, we would need to pass every possible integer value for each possible char... This makes it clear why proving program correctness remains largely an academic computer science problem. However, there are other kinds of metrics that let you gain confidence that your code is actually correct.
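Again as a hedged sketch rather than the article's listing, the extended function could look like this, assuming the same single-byte wrap-around as before:

```ruby
# Same idea as before, with an optional `step` (assumed to be any Integer).
# Ruby integers are unbounded, so the set of valid `step` values is infinite.
def next_char(char, step = 1)
  ((char.getbyte(0) + step) % 256).chr
end

puts next_char("a")      # prints "b"
puts next_char("a", 2)   # prints "c"
puts next_char("b", -1)  # prints "a"
```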
There are test coverage tools for almost every language. However, most of them collect statistics only about source code lines, namely, whether a particular line of source code was executed or not (and how many times). This is line coverage, also known as C0. But in fact, there is more than just line coverage: just look at this slide.
C1 is intended to track the execution of code branches. Each source code line can potentially contain more than one code branch. Think about conditions, loops, early returns and nasty things like the try operator. To satisfy C1 coverage for such a line, tests should contain at least two cases, one for each execution branch. Otherwise, some branches may remain unvisited even though C0 coverage for that line is satisfied.
C2 is called condition coverage. If a conditional expression consists of more than one sub-expression (for example: if a == 2 || b.nil?), it ensures that each sub-expression evaluates to both true and false at least once.
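To make the gap between C0 and C1 concrete, here is a small sketch with a hypothetical method, shipping_cost (the name and the numbers are assumptions for illustration only). Its body is a single line that contains two branches:

```ruby
# One line, two branches: free shipping at 100 and above, a flat fee below.
def shipping_cost(total)
  total >= 100 ? 0 : 5
end

# This single case executes the line, so C0 (line coverage) reports it covered,
# yet the `: 5` branch is never taken.
puts shipping_cost(150) # prints "0"

# C1 (branch coverage) is satisfied only after the other branch runs as well.
puts shipping_cost(20)  # prints "5"
```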
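As a rough illustration, the sketch below wraps the condition from the text in a hypothetical method, eligible? (the method name and the sample arguments are assumptions). Two cases are enough for branch coverage, but a third is needed before b.nil? has been both true and false:

```ruby
def eligible?(a, b)
  a == 2 || b.nil?
end

# Enough for C1: the whole condition is true once and false once.
puts eligible?(2, :x)  # prints "true"  (b.nil? is skipped by short-circuiting)
puts eligible?(1, :x)  # prints "false" (a == 2 is false, b.nil? is false)

# Needed for C2: b.nil? must also evaluate to true at least once.
puts eligible?(1, nil) # prints "true"
```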
A Real World Example
Enough with theory. As programmers, we love to get our hands dirty and see some code. Let's write an example program in Ruby to automate a simple workflow business process.
A workflow consists of a number of steps, each configured with a threshold value. To proceed to the next step, the workflow must collect that number of votes from eligible users (according to their voting permissions). A user with sufficient permissions can force skip one workflow step. Every user with at least voting permission can reject the current step of the voting process; the workflow then continues from the beginning of the previous step. Inactive users cannot vote or reject. A user can vote only once on the same step.
Here is one possible implementation of the described workflow:
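The sketch below is a hypothetical reading of the rules above, not the article's own listing. The User struct, the permission symbols :vote and :force, and all method names are assumptions chosen for illustration:

```ruby
require 'set'

# Hypothetical user record; field names and permission symbols are assumptions.
User = Struct.new(:name, :active, :permissions) do
  def can?(action)
    active && permissions.include?(action)
  end
end

# Hypothetical sketch of the workflow rules described in the text.
class Workflow
  attr_reader :current_step

  # thresholds: number of votes required on each step, e.g. [2, 1]
  def initialize(thresholds)
    @thresholds = thresholds
    @current_step = 0
    @voters = Set.new
  end

  def finished?
    @current_step >= @thresholds.size
  end

  # Counts one vote per user per step and advances once the threshold is met.
  def vote(user)
    return false if finished? || !user.can?(:vote) || @voters.include?(user.name)

    @voters << user.name
    advance if @voters.size >= @thresholds[@current_step]
    true
  end

  # Skips the current step regardless of the votes collected so far.
  def force_approve(user)
    return false if finished? || !user.can?(:force)

    advance
    true
  end

  # Sends the workflow back to the beginning of the previous step.
  def reject(user)
    return false if finished? || !user.can?(:vote)

    @current_step = [@current_step - 1, 0].max
    @voters.clear
    true
  end

  private

  def advance
    @current_step += 1
    @voters.clear
  end
end
```

A short usage example under the same assumptions:

```ruby
workflow = Workflow.new([2, 1])
workflow.vote(User.new("ann", true, [:vote]))
workflow.vote(User.new("bob", true, [:vote]))
workflow.current_step # both votes collected, so the workflow is on step 1
```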
This code was written without any tests in mind, but looks... robust. All required business features are there: inactive users, duplicate votes, force approves, etc. So, let's test it! It would be a good idea to turn the example from the diagram into an actual spec.
The spec run is indeed successful. Moreover, simplecov reports 100 percent code coverage!
This is where the story might end for a mediocre developer. One would think that since coverage indicates we are good, there is no work left to do. Let’s not rush here. Mutant to the rescue!
What? Only 64.71 percent? This is very sobering. What happened?
We specified the mutation target (the 'Workflow' string on the command line), and Mutant detected 7 mutation subjects (methods) in that class. For each subject, it constructed an AST (Abstract Syntax Tree) of the code and tried to apply different mutations to it. Mutations are small code changes, for example: flipping a condition to its opposite, removing a whole line of code, changing a constant value, etc. Mutations are easier to apply to an AST than to plain text, which is why Mutant parses your code, applies a mutation (a new mutant is born), then converts the AST back to code and inserts it into your VM to let rspec execute the tests against it (an attempt to kill the mutant). If all tests pass, the mutant remains alive. Think about it: someone just deleted a whole line from your code, and the tests are still passing. That basically means the test suite does not contain enough examples to cover all execution branches.
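The kill-or-survive idea can be sketched by hand, without Mutant and without any AST manipulation. In this simplified illustration, the two methods and the tiny lambda-based "suites" are all hypothetical; the mutant is written out manually by deleting the guard line:

```ruby
# Original subject: only an active user's vote is recorded.
def vote_original(votes, user)
  return votes unless user[:active]
  votes + [user[:name]]
end

# Hand-written mutant: the guard line has been deleted.
def vote_mutant(votes, user)
  votes + [user[:name]]
end

# Each check receives the implementation under test and returns true on success.
weak_suite = [
  ->(impl) { impl.call([], { name: "ann", active: true }) == ["ann"] }
]
strong_suite = weak_suite + [
  ->(impl) { impl.call([], { name: "bob", active: false }) == [] }
]

# A mutant is killed when at least one check fails against it.
def killed?(suite, impl)
  !suite.all? { |check| check.call(impl) }
end

mutant = method(:vote_mutant)
puts killed?(weak_suite, mutant)   # prints "false": the mutant survives
puts killed?(strong_suite, mutant) # prints "true": the new check kills it
```

Both suites pass against vote_original, so the only difference between them is whether they notice the missing guard.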
In the screenshot above, Mutant claims that if we remove the condition check inside the vote method, the tests will continue to pass. Hard to believe? Let's verify: