Sunday, 7 September 2014

Mutation Testing Overview

Introduction

It's always good to have the entire application code covered with tests. It's also useful to track the features we implement and test. Together this gives an overall picture of how actual behaviour corresponds to expectations, which is one of the main goals of testing. But there are cases when coverage metrics don't work or don't show the actual picture. E.g. we can have tests which invoke some specific functionality and provide 100% coverage by any available measure while none of them contains any verification. It means that we potentially have problems there but nothing alerts us about them. Or we may have a lot of empty tests which are always green. Coverage metrics still tell us whether the existing number of tests is enough for a specific module, but we should additionally make sure that our tests are good at detecting potential problems. One way to do that is to inject a modification which is supposed to cause an error and make sure that our tests detect it. The approach of certifying tests against an intentionally modified application is called Mutation Testing.

The main feature of this testing type is that we do not look for application issues but rather certify the tests' ability to detect errors. Unlike "traditional testing" we know in advance where the bug is expected to appear (as we insert it ourselves) and we have to make sure that our testing system is capable of detecting it. So, mutation testing is mainly targeted at checking the quality of the tests. The above examples of empty test suites or tests without verifications are corner cases and they are quite easy to detect. But in real life there's an intermediate stage when tests do have verifications but the verifications have gaps. To make testing solid and reliable we need to close such gaps, and mutation testing is one of the best ways to detect them.
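To make the "covered but not verified" case concrete, here is a minimal sketch. All names in it (apply_discount, the two checks, the detects helper) are made up for illustration and do not come from any tool. Both checks execute every line of the code under test, but only the one with a verification notices the injected change.

```python
def apply_discount(price, percent):
    """Code under test: reduce the price by the given percentage."""
    return price - price * percent / 100


def mutant_apply_discount(price, percent):
    """Mutant: the '-' operator is replaced by '+'."""
    return price + price * percent / 100


def check_without_verification(func):
    """Executes every line of func but asserts nothing, so it never fails."""
    func(200, 10)


def check_with_verification(func):
    """Verifies the actual result, so a wrong result raises AssertionError."""
    assert func(200, 10) == 180


def detects(check, func):
    """Return True if the check fails for func, i.e. it detects a problem."""
    try:
        check(func)
    except AssertionError:
        return True
    return False
```

With these definitions, detects(check_without_verification, mutant_apply_discount) is False while detects(check_with_verification, mutant_apply_discount) is True, although both checks give the same line coverage.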

In this article I'll describe the main concepts of mutation testing as well as potential ways to perform this testing type with all relevant pros and cons.

Main Definitions

Mutation testing has its own specific terminology we should know about. The definitions below are some of the core terms of mutation testing. So, let's define them for future use.

  • Mutation - in the current context, a single intentional modification of the application under test. Its purpose is to make sure that at least some small group of tests can detect the impact caused by the change.
  • Equivalent Mutation - a change to the application under test which does not affect actual functionality, i.e. the modified code behaves exactly like the original code. No test can detect such a change, so the results can be interpreted wrongly.
  • Killed Mutant - a mutant which was detected by at least one test.
  • Alive Mutant - a mutant which was left unnoticed by every test performed. Such a mutant is the same kind of error situation as a bug in "traditional" testing.
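The difference between a killed mutant, an alive mutant and an equivalent mutation can be sketched as follows. The function largest and the CASES table are hypothetical examples, and the equivalence claim is made for integer arguments only.

```python
def largest(a, b):
    """Original code: return the larger of two integers."""
    return a if a > b else b


def largest_equivalent(a, b):
    """Equivalent mutation: '>' became '>='.
    When a == b both branches return the same value, so behaviour is unchanged."""
    return a if a >= b else b


def largest_killable(a, b):
    """Ordinary mutation: '>' became '<', so the smaller value is returned."""
    return a if a < b else b


# Each case is (arguments, expected result of the original code).
CASES = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]


def is_killed(mutant, cases):
    """A mutant is killed if at least one case observes a wrong result."""
    return any(mutant(*args) != expected for args, expected in cases)
```

Here is_killed(largest_killable, CASES) is True, while is_killed(largest_equivalent, CASES) stays False no matter how many integer cases we add. The second mutant is reported as alive although the tests have no real gap, which is exactly the wrong interpretation mentioned in the definition.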

Mutation Classifications

Mutation processing can be organized in different ways depending on factors such as:

  • The type of the mutation itself
  • The way the mutant is generated and the area it's applied to
  • The way tests are selected to run against the mutant
  • The way the mutant is injected into the application under test
  • The way the selected tests are run to detect the mutant

Let's describe all the above factors in more detail.

By Kinds of Mutation

Value Mutations

These mutations change the values of constants or parameters (by adding or subtracting values etc.), e.g. loop bounds: being one out at the start or finish of a loop is a very common error.

Example:

Before Mutation:

OFFSET = 2
value = parameter + OFFSET

After Mutation:

OFFSET = 2
value = parameter - OFFSET

Or:

OFFSET = 1
value = parameter + OFFSET
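The same example can be written as runnable Python to show why the strength of the verification matters. The function names and the two checks below are invented for this sketch.

```python
def shift(parameter):
    """Original code."""
    OFFSET = 2
    return parameter + OFFSET


def shift_mutant_minus(parameter):
    """First mutant from the example: '+' became '-'."""
    OFFSET = 2
    return parameter - OFFSET


def shift_mutant_offset(parameter):
    """Second mutant from the example: the constant 2 became 1."""
    OFFSET = 1
    return parameter + OFFSET


def weak_check(func):
    """Only verifies that the value grows; returns True when the check passes."""
    return func(10) > 10


def exact_check(func):
    """Verifies the exact expected value; returns True when the check passes."""
    return func(10) == 12
```

The weak check fails for shift_mutant_minus (8 is not greater than 10) but still passes for shift_mutant_offset (11 is greater than 10), so the second mutant stays alive until a check like exact_check is added.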

Decision Mutations

These mutations modify conditions to reflect potential slips and errors in the coding of conditions in programs, e.g. a typical mutation replaces a > with a < in a comparison.

Example:

Before Mutation:

if value < 100 {
    value++
}

After Mutation:

if value > 100 {
    value++
}

Or:

if value <= 100 {
    value++
}
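A small Python version of this example shows which inputs are able to reveal each mutant. The helper killing_inputs is a hypothetical name used only for this sketch.

```python
def bump(value):
    """Original code: increment values below 100."""
    if value < 100:
        value += 1
    return value


def bump_gt(value):
    """Mutant: '<' became '>'."""
    if value > 100:
        value += 1
    return value


def bump_le(value):
    """Mutant: '<' became '<='."""
    if value <= 100:
        value += 1
    return value


def killing_inputs(mutant, inputs):
    """Return the inputs for which the mutant behaves differently from the original."""
    return [x for x in inputs if mutant(x) != bump(x)]
```

For the inputs [50, 100, 150] the '>' mutant is revealed by 50 and 150, while the '<=' mutant is revealed only by the boundary value 100. A test suite without a boundary case leaves that mutant alive.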

Statement Mutations

These mutations might delete certain lines to reflect omissions in coding, or swap the order of lines of code. There are other operations as well, e.g. changing operators in arithmetic expressions. A typical omission is a missing increment of some variable in a while loop.

Example:

Before Mutation:

x = 100
value = x + 20
x += 50

After Mutation:

x = 100
x += 50
value = x + 20

By Mutant generation

Defines where the classes are analysed and mutants created.

Source Code

The mutation is applied to source code. After that the application is re-built and started.

  • Advantages
    • Simplicity - everything starts from the source code. All you have to do is modify some part of it (which is very likely a text representation) and run the application after the change. Everything you operate with is accessible in a readable format.
    • A large range of mutations can be generated this way - this follows from the previous point.
    • Mutations can closely mimic the types of error a programmer might make. A lot of problems appear due to an improper operator, an improper condition or some other change we make to the code. Here the same kind of change is made automatically.
    • The mutations made can be clearly described and understood - since we're operating with a text representation we can clearly state what we change.
    • Wide applicability - this approach is applicable to many different languages, including scripting languages which are not compiled into a machine-specific form.
  • Disadvantages
    • Generating mutations this way is relatively slow - most of the time is lost on compilation
    • Mutants must be written to disk, which limits the methods by which they can be inserted
    • In theory a mutant class could be accidentally released - this is normally resolved by preparing a separate workspace for mutation testing, but in general the risk is there.
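As an illustration of source-level mutant generation, the sketch below uses Python's standard ast module to replace every binary '+' with '-' and then rebuilds the code. The sample SOURCE, the AddToSub operator and the load helper are assumptions made for this example, not part of any of the tools discussed later.

```python
import ast

SOURCE = """
def shifted(parameter):
    OFFSET = 2
    return parameter + OFFSET
"""


class AddToSub(ast.NodeTransformer):
    """Mutation operator: replace every binary '+' with '-'."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def mutate(source):
    """Parse the source text and return the mutated syntax tree."""
    tree = AddToSub().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree


def load(source_or_tree, name):
    """'Re-build' step: compile in a fresh namespace and return function `name`."""
    namespace = {}
    exec(compile(source_or_tree, "<mutation-demo>", "exec"), namespace)
    return namespace[name]


MUTATED_TREE = mutate(SOURCE)
MUTATED_SOURCE = ast.unparse(MUTATED_TREE)  # readable description of the mutant
original = load(SOURCE, "shifted")
mutant = load(MUTATED_TREE, "shifted")
```

Since the mutant exists as text (MUTATED_SOURCE), it is easy to report exactly what was changed, which matches the "clearly described and understood" advantage above. The price is the parse and compile step repeated for every mutant.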

Byte Code

The mutation is applied to the byte code or some other compiled representation.

  • Advantages
    • Generally much faster - no time is lost on compilation
    • Can potentially create mutants without access to source files - if we work with the byte code directly we don't need the source code at all.
    • The same mutation operators can in theory work for other languages - e.g. there are many languages which run on the Java or .NET platform. If we modify the byte code representation of the application under test we don't have to take many language specifics into account. Thus, similar changes can be applied to multiple languages.
  • Disadvantages
    • Applicability - we cannot apply this to scripting languages which have no compiled representation (and there is simply no need for it there)
    • Difficult to implement - the implementation requires knowledge of the internal structure of compiled modules, which is not visible from the source code.
    • Representativeness of errors - mutants generated this way correspond less closely to the errors introduced in real life. So there's a risk of getting a mutant which can never happen in real code, and analysing it takes time
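The idea of mutating a compiled representation can be imitated in CPython, whose functions carry a compiled code object. The sketch below is only an analogy to JVM or .NET byte code mutation and relies on CPython specifics; mutate_constant is a hypothetical helper. It changes a constant inside the compiled code without touching any source text. A large constant is used on purpose, because some interpreter versions may not keep small integers in the constants table.

```python
import types


def shifted(parameter):
    # 1000 is stored in the constants table of the compiled code object
    return parameter + 1000


def mutate_constant(func, old, new):
    """Return a copy of func whose compiled code uses `new` instead of constant `old`."""
    code = func.__code__
    consts = tuple(
        new if (type(c) is type(old) and c == old) else c
        for c in code.co_consts
    )
    mutated_code = code.replace(co_consts=consts)  # available since Python 3.8
    return types.FunctionType(
        mutated_code,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )


# Value mutation applied at the compiled level: 1000 -> 999
mutant = mutate_constant(shifted, 1000, 999)
```

No parsing or compilation happens here, which is where the speed advantage comes from. On the other hand, the sketch already depends on how the interpreter lays out compiled code, which illustrates the "difficult to implement" point.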

By Test selection

Defines which tests are selected to run against the mutants.

Manual

The selection is done manually. Normally it is done to check only some specific places which are of the biggest interest to end users.

  • Advantages:
    • We're always aware of the changes we introduce
  • Disadvantages:
    • Slow and hardly applicable to regular runs as it still requires human interaction
    • Can only be used to determine the coverage of individual classes

Naive

Selection is done automatically by passing through all possible locations. The idea is simple: we try to insert a mutation wherever it seems appropriate, exercising various combinations.

  • Advantages:
    • Can already be used on a fully automated basis
    • Provides high coverage
  • Disadvantages:
    • Slow, as it goes step by step and introduces mutations even in places we're not interested in
    • Time may be lost chasing mutations in code which is not covered by tests, while the same information is available from simple code coverage analysis

Convention Based

Selection is done automatically based on some specific conventions. Normally a similar convention is used for unit tests, where tests are created and named after the class under test. Thus the structure of such tests replicates the structure of the components under test. So, if we have such a convention we can use it to select tests for mutation testing.

  • Advantages:
    • Faster than the naive approach as it's more targeted at the areas where the mutation occurs
  • Disadvantages:
    • Some tests which actually cover the mutated code may be left out
    • There still may be problems with mutants which are not covered by tests
    • Works badly when conventions are not stable or not applicable (e.g. it's hard to set up the correspondence between code and integration/system tests which involve whole areas of functionality)

Coverage Based

Tests are selected based on code coverage analysis. In other words, for each specific mutation we select only those tests which cover the changed code part.

  • Advantages:
    • Faster than any of the above approaches as it narrows down the number of tests to run. Only tests which cover the mutated code are executed
    • Gives a clear picture of the entire test coverage in combination with code coverage
  • Disadvantages:
    • Requires code coverage analysis in addition to the main activities
    • The approach is generally more complicated

By Mutant insertion

Defines how mutations are inserted into the target system.

Naive

Each mutant is generated separately and each time a new instance of the application under test is started from scratch. The change can be made either in the source code with re-compilation or in memory. The main thing is that each time we start a new instance.

  • Advantages:
    • Reliability - this method works everywhere
    • Mutants will be active during the construction of static state (singletons, static initializers etc.)
  • Disadvantages:
    • This method is relatively slow as the application under test has to be restarted each time
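A rough sketch of naive insertion: every variant of the code is executed in a brand new interpreter process. The source strings and the test snippet are invented for the example, and a real setup would run a proper test runner instead of a single assert.

```python
import subprocess
import sys

ORIGINAL = "def shifted(p):\n    return p + 2\n"
MUTANT = ORIGINAL.replace("+", "-")  # source-level mutation: '+' -> '-'
TEST_SNIPPET = "assert shifted(10) == 12\n"


def passes_in_fresh_process(source, test_snippet):
    """Start a new interpreter, load the source, run the test.
    Returns True when the test passes (exit code 0)."""
    result = subprocess.run(
        [sys.executable, "-c", source + test_snippet],
        capture_output=True,
        timeout=60,
    )
    return result.returncode == 0
```

passes_in_fresh_process(ORIGINAL, TEST_SNIPPET) returns True and the same call with MUTANT returns False, so the mutant is killed. Every call pays the full start-up cost of a new process, which is the reason this method is relatively slow.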

Mutant schemata

The main idea is that there's a generated class which contains all the mutants, which are then enabled programmatically.

  • Advantages:
    • Relatively fast in comparison to the naive approach
    • Reliably works everywhere
  • Disadvantages:
    • Mutants will NOT be active during the construction of static state (singletons, static initializers etc.)
    • The approach itself is more complicated than the naive one
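The schemata idea can be sketched with a single function that contains the original code together with all of its mutants, selected by a run-time switch. The switch name ACTIVE_MUTANT and the mutant identifiers are made up for this example; a real tool would generate such code automatically.

```python
ACTIVE_MUTANT = None  # None means the original behaviour


def shifted(parameter):
    """Generated 'meta' version holding the original code and all its mutants."""
    if ACTIVE_MUTANT == "minus":
        return parameter - 2  # mutant: '+' became '-'
    if ACTIVE_MUTANT == "offset_1":
        return parameter + 1  # mutant: constant 2 became 1
    return parameter + 2      # original code


def check():
    """Returns True when the test passes."""
    return shifted(10) == 12


def run_with_mutant(mutant_id, test):
    """Enable one mutant, run the test, then always switch back to the original."""
    global ACTIVE_MUTANT
    ACTIVE_MUTANT = mutant_id
    try:
        return test()
    finally:
        ACTIVE_MUTANT = None
```

The code is built and loaded once, and switching between mutants costs only an assignment. Anything computed before the switch is set (for example at load time) is produced by the original code, which corresponds to the static state limitation listed above.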

Debugger hotswap

In some cases it's possible to use debug information to insert the changes we need. At least this is applicable to variable values. So, all potential mutations are held in memory and inserted using the debugger API.

  • Advantages:
    • Potentially better performance
    • No risk of accidentally releasing code with mutants
  • Disadvantages:
    • Performance is not guaranteed: depending on the implementation there may be either a gain or a loss
    • Mutants are not active during static state construction
    • There may be problems with such API support

Instrumentation API

Similar to debugger hotswap, but each mutant is applied using the instrumentation API.

  • Advantages:
    • Fast
  • Disadvantages:
    • Mutants are not active during static state construction
    • There may be problems with such API support
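Both hotswap and instrumentation replace code in an application which is already loaded. Python has neither of these APIs in the same form, so the sketch below only imitates the idea with plain attribute replacement on a module object; the module app and the helper swap_in are invented for illustration. It shows why static state stays unaffected.

```python
import types

# Stand-in for an already loaded application module.
app = types.ModuleType("app")
exec(
    "def shifted(p):\n"
    "    return p + 2\n"
    "CACHED = shifted(10)  # static state, computed once at load time\n"
    "def current():\n"
    "    return shifted(10)  # looked up again on every call\n",
    app.__dict__,
)


def swap_in(module, name, replacement):
    """Replace a function in the loaded module and return the original one."""
    original = getattr(module, name)
    setattr(module, name, replacement)
    return original


def mutant_shifted(p):
    """Mutant: '+' became '-'."""
    return p - 2
```

After swap_in(app, "shifted", mutant_shifted) the call app.current() returns 8, but app.CACHED still holds 12 because it was computed before the mutant was inserted. A test which only looks at CACHED can never kill this mutant.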

Others

In addition to the above types there may be others which are based on technology specifics. E.g. for Java code we can override class loaders to control the mutation insertion process. But these approaches are technology specific and are outside the scope of this article.

By Mutant detection

Defines how the selected tests are run against the loaded mutant.

Naive

The entire planned set of tests is executed.

  • Advantages:
    • Gives a wider picture of how well the code is covered by test checkpoints. You can see how many tests are affected by each specific mutation, i.e. how many times the same part of the code is checked.
  • Disadvantages:
    • Slow, as we have to run all tests each time while only some of them are really of interest

Early exit (coarse)

Test classes are run until one of them detects the mutant.

  • Advantages:
    • Faster than the naive approach as it runs no longer than needed to find the first error
  • Disadvantages:
    • All tests within a class are run to completion, so it's slower than a more fine-grained approach

Early exit (fine)

Individual test cases are run until one of them detects the mutant.

  • Advantages:
    • Faster than the above approaches
  • Disadvantages:
    • Some overhead required to split the test cases
    • Splitting tests out of classes may cause issues with some JUnit extensions
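The difference between the naive and the early exit strategies can be sketched with a list of individual tests, which corresponds to the fine-grained variant. The tests and the mutant below are hypothetical.

```python
def run_naive(tests, func):
    """Run every test and return the names of all tests which detect the mutant."""
    return [name for name, test in tests if not test(func)]


def run_early_exit(tests, func):
    """Stop at the first detecting test.
    Returns (name of the killing test or None, number of tests executed)."""
    for executed, (name, test) in enumerate(tests, start=1):
        if not test(func):
            return name, executed
    return None, len(tests)


# Each test returns True when it passes.
TESTS = [
    ("positive", lambda f: f(10) == 12),
    ("zero", lambda f: f(0) == 2),
    ("negative", lambda f: f(-5) == -3),
]


def mutant(parameter):
    """Mutant of 'parameter + 2': '+' became '-'."""
    return parameter - 2
```

run_naive(TESTS, mutant) executes all three tests and reports that each of them detects the mutant, which is the "wider picture" mentioned for the naive strategy. run_early_exit(TESTS, mutant) returns ("positive", 1): the mutant is known to be killed after a single test, but we no longer know how many other tests check the same code.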

Side Effects of Mutation Testing

  • Stability testing - in most cases mutation testing means running the entire test suite multiple times in different combinations. Thus we can make sure that our tests are stable and reliable. Also, all this time we're working with the system under test, so we get additional information on how the entire system behaves during a long period of intensive use
  • Discovery of potential bug fixes - mutation testing is almost the only testing type which changes the quality of the system under test by itself. In most cases it changes it for the worse (making existing tests fail), however in some cases we can get a surprise when an applied mutation fixes some known problem. The probability of this is really small and we shouldn't expect it to happen, but such a surprise is possible.
  • Coverage checks - if we run our tests against different mutations we can see how good our code coverage really is. Mutation testing results are another reflection of code coverage. The main difference is that now we also know how good we are at checking what we cover. We receive additional coverage metrics like:
    • "checkpoint coverage" - the percentage of lines/branches which are really checked by tests. As with other coverage metrics this value varies between 0 and 1 (or from 0 to 100%)
    • "checkpoints per code line" - shows how many times each specific line is checked by tests. It can be calculated as the ratio of the total number of checkpoints affected to the number of lines of effective code. In particular it estimates how effectively we cover all the functionality with tests. Ideally, this measure should be near 1, indicating that each line and each condition has at least 1 check. Of course, there should also be values reflecting the distribution of checkpoints between different parts of the code, and some other statistical measures.
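The two metrics can be computed from per-line results of a mutation run. The sketch below is one possible reading of the definitions above: a line counts as "checked" when at least one test detected a mutant placed on it. The input data is hypothetical.

```python
def checkpoint_metrics(kills_per_line):
    """kills_per_line maps a line of effective code to the number of tests
    which detected a mutant on that line.
    Returns (checkpoint coverage, checkpoints per code line)."""
    total_lines = len(kills_per_line)
    checked_lines = sum(1 for kills in kills_per_line.values() if kills > 0)
    coverage = checked_lines / total_lines
    per_line = sum(kills_per_line.values()) / total_lines
    return coverage, per_line


# Hypothetical result: line number -> number of detecting tests.
KILLS = {10: 2, 11: 1, 12: 0, 13: 1}
```

For KILLS the function returns (0.75, 1.0). The second value looks ideal although line 12 is not checked at all, because line 10 is checked twice. That is why the distribution of checkpoints has to be reported alongside the average.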

Existing Systems Overview

NOTE
The information below is taken from official documentation or other similar sources. Also, some of the criteria used are not clearly applicable in some specific cases. That may cause incomplete or imprecise information to be provided. So, if you find any mismatches or any information which is missing here, please let the author know.

Columns of the matrix: System Name, Technology, Active?, and Mutations supported, grouped as follows:

  • By Kinds of Mutation: Value Mutations, Decision Mutations, Statement Mutations
  • By Mutant generation: Source Code, Byte Code
  • By Test selection: Manual, Naive, Convention Based, Coverage Based
  • By Mutant insertion: Naive, Mutant schemata, Debugger hotswap, Instrumentation API
  • By Mutant detection: Naive, Early exit (coarse), Early exit (fine)

System Name      Technology
PIT              Java
Jester           Java
Simple Jester    Java
Jumble           Java
μJava            Java
JavaLanche       Java
CREAM            C#
NinjaTurtles     C#
Nester           C#
Visual Mutator   C#
Mutandis         JavaScript
AjaxMutator      JavaScript
Grunt            JavaScript
Mutant           Ruby
Heckle           Ruby
MutPy            Python
PyMuTester       Python
Nose plugin      Python

References

  1. Mutation Testing Systems for Java Compared (Java)
  2. Real world mutation testing (Java)
  3. How do you test your tests? - Mutation analysis of Java programs with PIT (Java)
  4. How do you test your tests? - Mutation analysis of Java programs with PIT (Java)
  5. Judy - A Mutation Testing Tool for Java (Java)
  6. μJava Home Page (Java)
  7. Introduction to mutation testing with PIT and TestNG (Java)
  8. The Major mutation framework. Easy and scalable mutation testing for Java! (Java)
  9. Joy of Coding... and mutation testing in Java (Java)
  10. Mutation Testing (Java)
  11. CREAM - CREAtor of Mutants (C#)
  12. NinjaTurtles - .NET mutation testing (C#)
  13. MSDN Magazine - Super-Simple Mutation Testing (C#)
  14. Nester: What is this? (C#)
  15. Simple Talk - Mutation Testing (C#)
  16. Visual Mutator - Visual Studio Mutation Testing Tool (C#)
  17. Mutandis - GitHub project (JavaScript)
  18. AjaxMutator. AjaxMutator - Related GitHub Project (JavaScript)
  19. SlideShare - Mutation Analysis for JavaScript (JavaScript)
  20. Grunt Mutation Testing. Related GitHub project (JavaScript)
  21. Mutant - GitHub project (Ruby). Related posts:
  22. Mutation Testing in Ruby (Ruby)
  23. Mutation Testing with Heckle (Ruby)
  24. Mutation Testing - Ruby Edition (Ruby)
  25. MutPy project site (Python)
  26. PyMuTester project site (Python)
  27. Python Mutant Testing (PyMuTester) (Python)
  28. Nose plugin for mutation testing (Python)
  29. Mutation in Python (Python)

1 comment:

  1. Hey nice matrix. Should also add Mutator to the mix. It's for JS, ruby and concurrent java: http://ortask.com/mutator/