Introduction
There's one logical recursion I keep encountering in test automation. Test automation is about developing software targeted at testing some other software. So, the output of test automation is yet another piece of software. This is one of the reasons for treating test automation as a development process (which is one of the best practices for test automation). But how are we going to make sure that the software we create for testing is good enough? Indeed, when we develop software we use testing (and test automation) as one of the tools for checking and measuring the quality of the software under test.
So, what about the software we create for test automation?
On the other hand, we use testing to make sure that the software under test is of acceptable quality, and in the case of test automation we use another piece of software for this. In some cases this software becomes complicated as well. So, how can we rely on untested software for making any conclusions about the target product we develop? Of course, we can keep test automation simple, but that's not a universal solution. So, we should find a compromise where we use reliable software to check the target software (the system under test). Also, we should find a way to determine how deep testing should go and how we can measure that.
So, the main questions which appear here are:
- How can we identify that the automated tests we have are enough to measure the quality of the end product?
- How can we identify that our tests are really good?
- How can we keep quality control over our automated tests?
- How can we identify if our tests are of acceptable complexity?
What are tests applied to?
Before starting to describe how we can measure the quality of our tests, we should identify what exactly we should measure, or what our metrics should be based on. The main artifacts tests are bound to are:
- Requirements - any formal definition of how the system under test should work. It can be a dedicated document, a set of descriptions, or simply knowledge based on previous experience with similar systems. In any case, there should be some kind of description of how the system should behave.
- Implementation - the set of source code and corresponding resources which implements all items defined in the requirements
- Tests - any form of instructions targeted at verifying the correspondence between the requirements and the actual behavior of the system under test.
Although the implementation is a reflection of the requirements, tests can be mapped not just to requirements but also to separate parts of the implementation which are not strictly bound to any piece of functionality. This typically concerns auxiliary utility code used across the project: it is used by various functional parts representing business logic but is not dedicated to any of them. At the same time it's necessary to cover such utilities with tests to make sure nothing is broken after any change, as such a change may affect the business logic implementation.
So, given all the above, tests cover requirements and should be mapped to them somehow. In addition, tests cover implementation modules and should be mapped to them as well. This is the basis for answering the next question.
How can we identify that the automated tests we have are enough to measure the quality of the end product?
How do we cover requirements?
There's a common practice for requirements coverage: the Traceability Matrix. It normally sets the correspondence between requirements and tests. In the case of test automation it also sets the correspondence to automated tests. So, this matrix can be represented with a table like:
Requirement ID | Test Case ID | Auto-test ID |
---|---|---|
REQ-1 | TC-1 | ATC-1 |
REQ-1 | TC-1 | ATC-2 |
REQ-1 | TC-2 | ATC-3 |
REQ-2 | TC-3 | ATC-4 |
REQ-3 | TC-4 | - |
REQ-4 | - | - |
In the general case, each requirement may have multiple test cases verifying different aspects of it (e.g. positive/negative tests). Each test case may have multiple automated tests assigned, especially when the test case plays out several scenarios.
With such a scheme we can't get a single simple measure saying how good we are at requirements coverage, especially for automated tests. All we can use is two separate (and only loosely related) measures plus their combination:
- Test Case coverage - the relation of the number of requirements covered by test cases to the overall number of requirements. It can be reflected with the following formula:

  RCOVtc = Rtc / R

  where:
  - RCOVtc - requirements coverage by test cases
  - Rtc - the number of requirements covered by test cases
  - R - the overall number of requirements
- Automated Tests coverage - the part of the requirements covered by tests which have an automated implementation. It can be reflected with the following formula:

  RCOVatc = RCOVtc * TCCOVauto = RCOVtc * TCatc / TC

  where:
  - RCOVatc - requirements coverage by automated tests
  - RCOVtc - requirements coverage by test cases
  - TCCOVauto - test cases coverage by automated tests
  - TCatc - the number of test cases with an automated implementation
  - TC - the total number of test cases
- Overall Requirements Satisfaction Rate - the result we get after an entire test set run, showing which part of the requirements is met at all. The formula combines the previous values and looks like:

  ORSR = PassRate * RCOVtc * TCCOVauto

  where:
  - ORSR - the Overall Requirements Satisfaction Rate value
  - PassRate - the relation of passed tests to the total number of tests executed
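To make the arithmetic concrete, here is a minimal Java sketch (class and method names are illustrative) computing these values for the sample matrix above:

public class RequirementsCoverage {
    // RCOVtc = Rtc / R
    static double testCaseCoverage(int requirementsCovered, int requirementsTotal) {
        return (double) requirementsCovered / requirementsTotal;
    }

    // RCOVatc = RCOVtc * TCatc / TC
    static double automatedTestCoverage(double rcovTc, int automatedTestCases, int totalTestCases) {
        return rcovTc * automatedTestCases / totalTestCases;
    }

    // ORSR = PassRate * RCOVtc * TCCOVauto
    static double overallRequirementsSatisfactionRate(double passRate, double rcovTc, double tcCovAuto) {
        return passRate * rcovTc * tcCovAuto;
    }

    public static void main(String[] args) {
        // From the sample matrix: 3 of 4 requirements have test cases,
        // and 3 of the 4 test cases (TC-1..TC-3) have automated implementations.
        double rcovTc = testCaseCoverage(3, 4);                                     // 0.75
        double rcovAtc = automatedTestCoverage(rcovTc, 3, 4);                       // 0.5625
        double orsr = overallRequirementsSatisfactionRate(1.0, rcovTc, 3.0 / 4.0);  // 0.5625 at a 100% pass rate
        System.out.printf("RCOVtc=%.4f RCOVatc=%.4f ORSR=%.4f%n", rcovTc, rcovAtc, orsr);
    }
}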
How can we make these measures more precise and simple?
The above measures have some distortions and inconsistencies for the following reasons:
- A requirement is considered covered when at least one test is associated with it. But the requirement can be too general, and the test may cover just some part of it
- A test case is considered covered by automation when it has at least one automated test associated with it. If a test case involves several scenarios where only some of them have an automated implementation, it still counts, so the coverage number is not precise
- Coverage like this doesn't reflect cases which may occur due to technical implementation specifics
Requirements decomposition
Each requirement is split into atomic items, each requiring just a single check-point. In order to achieve better mapping between requirements and tests, it's better to perform such a split based on the testing techniques used. Thus, we can identify the range of valid inputs, invalid inputs, border conditions etc. Once we have a definition of the expected behavior in all of those cases, we can already make quite atomic and targeted tests. Thus, the above table is transformed into something like:
Requirement ID | Test Case ID | Auto-test ID |
---|---|---|
REQ-1-1 | TC-1-1 | ATC-1 |
REQ-1-2 | TC-1-2 | ATC-2 |
REQ-1-3 | TC-1-3 | ATC-3 |
REQ-2-1 | TC-2-1 | ATC-4 |
REQ-3 | TC-3 | - |
REQ-4 | - | - |
Map auto-tests to test cases
Make a 1:1 correspondence between each test scenario and its automated implementation so that it can be tracked easily. Thus, we get a matrix like:
Requirement ID | Test Case ID | Auto-test ID |
---|---|---|
REQ-1-1 | TC-1-1 | ATC-1-1 |
REQ-1-2 | TC-1-2 | ATC-1-2 |
REQ-1-3 | TC-1-3 | ATC-1-3 |
REQ-2-1 | TC-2-1 | ATC-2-1 |
REQ-3 | TC-3 | - |
REQ-4 | - | - |
But we still have untracked areas which we don't cover at all, and when we run the tests our results don't include any information about requirements coverage, so we still have to track requirements and their correspondence to tests separately. Generally, this stage is quite OK and a lot of projects stop here. But that doesn't mean it's really the maximum we can reach.
Make test cases and automated implementation a single unit
The idea is that each test case is created in a specific form which can be read and interpreted automatically by a test engine, which runs specific test instructions based on the test case steps description. This leads us to Keyword-driven testing, where each test case is a set of keywords processed by an automated engine. Thus, we collapse test cases and their automated implementation into a single unit where the test case itself is just an input resource for the automated tests (a minimal sketch of such an engine follows the table below). After such a transformation our table looks like:
Requirement ID | Test ID |
---|---|
REQ-1-1 | KTC-1-1 |
REQ-1-2 | KTC-1-2 |
REQ-1-3 | KTC-1-3 |
REQ-2-1 | KTC-2-1 |
REQ-3 | KTC-3 |
REQ-4 | - |
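As a minimal sketch of the idea (all names are illustrative, not a real framework): each test case is a plain-text resource of "keyword arg1 arg2 ..." lines which the engine maps to registered actions, so the test case and its automated implementation become a single unit.

import java.util.*;
import java.util.function.Consumer;

public class KeywordEngine {
    private final Map<String, Consumer<String[]>> keywords = new HashMap<>();

    // Register an action implementation behind a keyword
    public void register(String keyword, Consumer<String[]> action) {
        keywords.put(keyword, action);
    }

    // Interpret a keyword-driven test case line by line
    public void run(List<String> testCaseLines) {
        for (String line : testCaseLines) {
            String[] parts = line.trim().split("\\s+");
            Consumer<String[]> action = keywords.get(parts[0]);
            if (action == null) {
                throw new IllegalStateException("Unknown keyword: " + parts[0]);
            }
            action.accept(Arrays.copyOfRange(parts, 1, parts.length));
        }
    }

    public static void main(String[] args) {
        KeywordEngine engine = new KeywordEngine();
        engine.register("open", a -> System.out.println("Opening " + a[0]));
        engine.register("verifyTitle", a -> System.out.println("Verifying title " + a[0]));
        // KTC-1-1 expressed purely as keywords:
        engine.run(Arrays.asList("open loginPage", "verifyTitle Login"));
    }
}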
Make requirements executable
Previously we unified test cases and automated tests, which collapsed the table to just 2 columns and 2 major items: requirements and tests. But what if requirements are created in such a way that the tests covering them are generated automatically in a form suitable for automated execution? This approach is called Executable Requirements. Thus, requirements are automatically expanded into test cases, and test cases are expanded into automated tests. Eventually, we get a representation like:
Requirement ID |
---|
REQ-1-1 |
REQ-1-2 |
REQ-1-3 |
REQ-2-1 |
REQ-3 |
REQ-4 |
Since requirements now directly produce their executable tests, both RCOVtc and TCCOVauto become equal to 1, and the formula simplifies to:

ORSR = PassRate * RCOVtc = PassRate
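As one possible illustration, executable requirements are often implemented with a BDD framework; a minimal sketch assuming Cucumber-JVM and JUnit (the feature text, step class and inlined operation are hypothetical) could look like:

import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import static org.junit.Assert.assertEquals;

// Hypothetical feature file (subtraction.feature) that *is* the requirement:
//   Scenario: Subtraction of two numbers
//     Given the operands 5 and 3
//     When they are subtracted
//     Then the result is 2
public class SubtractionSteps {
    private double a, b, result;

    // stand-in for the system under test operation, inlined to keep the sketch self-contained
    private static double subtract(double x, double y) {
        return x - y;
    }

    @Given("the operands {double} and {double}")
    public void theOperands(double first, double second) {
        a = first;
        b = second;
    }

    @When("they are subtracted")
    public void theyAreSubtracted() {
        result = subtract(a, b);
    }

    @Then("the result is {double}")
    public void theResultIs(double expected) {
        assertEquals(expected, result, 1e-9);
    }
}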
How do we cover implementation?
All of the above was related to binding requirements to tests. But we haven't covered the implementation at all. In some cases we may have implementation parts which aren't covered by any requirement, or some specifics which are not detailed in the requirements but exist in the code.
Why is this important? OK, let's keep only the ORSR metric and use nothing else. In this case we may get 100% coverage even when all tests are empty and don't do anything. So, in order to prevent such a situation we should also take into account code coverage metrics indicating that each specific code item is invoked at least once during the test run.
Mainly we can take line and branch coverage values as the most frequently used ones. We could also use class and function/method coverage, but that would actually be another reflection of the line coverage metric. We could also involve more complicated coverage metrics, but that's a matter for a separate chapter. For now we'll take the most frequently used metrics. The Overall Code Coverage may be calculated as the multiplication of all independent coverage metrics. Since all coverage metrics take values from 0 to 1 (or from 0% to 100%), the final value also fits this range. So, the formula is:

OCC = CCOVline * CCOVbranch

where:
- OCC - overall code coverage as an integrated code coverage measure
- CCOVline - code line coverage
- CCOVbranch - code branch coverage
Now we can combine this with the Overall Requirements Satisfaction Rate to cover both requirements and implementation. Let's name this unified metric the Overall Product Satisfaction Rate (OPSR) - the unified coverage of requirements and their implementation, which can also be interpreted as Overall Product Readiness. It is calculated as:
- ORSR = PassRate
- OCC = CCOVline * CCOVbranch
- OPSR = ORSR * OCC = PassRate * CCOVline * CCOVbranch
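A similarly minimal sketch of the combined metric (illustrative names; the coverage figures are arbitrary sample values, not real measurements):

public class ProductReadiness {
    // OCC = CCOVline * CCOVbranch
    static double overallCodeCoverage(double lineCoverage, double branchCoverage) {
        return lineCoverage * branchCoverage;
    }

    // OPSR = ORSR * OCC = PassRate * CCOVline * CCOVbranch (with executable requirements)
    static double overallProductSatisfactionRate(double passRate, double lineCoverage, double branchCoverage) {
        return passRate * overallCodeCoverage(lineCoverage, branchCoverage);
    }

    public static void main(String[] args) {
        // E.g. a run with a 95% pass rate, 80% line and 70% branch coverage:
        System.out.printf("OPSR = %.3f%n", overallProductSatisfactionRate(0.95, 0.80, 0.70)); // 0.532
    }
}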
Is that enough?
No. Although the coverage we measure is already complex and covers different aspects of the system under test, there are still gaps which may lead to inconsistent and wrong interpretation of results. One thing left uncovered here is the tests themselves. The next paragraphs describe this in more detail.
How can we identify that our tests are really good?
When can tests be bad?
Let's take a look at a small example of a requirement, its implementation and a test covering it to see why the OPSR metric is not enough to say that the system under test is of good quality. Let's say we have some system with a requirement that states:
Subtraction: for the given input A and B the result C is received as C = A - B

Let's assume we've already described all the necessary details regarding input format and acceptable values, and we already have tests for all those parts. Now we concentrate on the operation itself. Its implementation may look like:
double subtract(double a, double b) {
return a + b;
}
And now let's assume we have a test which covers the implementation:

void testSubtract() { subtract(2, 3); }

Firstly, note that the implementation sample uses the + operation, which is the opposite of subtraction. But also notice that the test simply invokes the operation without checking the result. If we measure overall coverage we'll see that the test covers all lines of the implementation, and it also covers the requirement. But you can see that the functionality is wrong and the test doesn't detect that.
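For contrast, a minimal JUnit sketch of the same test with a real check-point (the faulty implementation is inlined only to keep the example self-contained):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class SubtractionTest {
    // the faulty implementation from above, inlined for a self-contained example
    private double subtract(double a, double b) {
        return a + b;
    }

    @Test
    public void testSubtract() {
        // Now the wrong "+" implementation is detected: the call returns 5.0 while -1.0 is expected.
        assertEquals(-1.0, subtract(2, 3), 1e-9);
    }
}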
That's why the quality of our tests must also be estimated.
How can we detect that a test is good?
There are several criteria indicating that each specific test is of good quality:
- Test does what it's supposed to do - sometimes a test is designed for one thing but actually checks something else. This may happen for a bad reason (a mistake during automation) or a good one (the test case was updated without changing the automated implementation). Either way, we should be able to control such situations;
- Test operates with valid data - when we design our tests we should make sure that we use proper input and proper expectations for the output. In some cases we may operate with improper data or set improper results as expected (especially during test automation, when some people aim to make all tests pass assuming the data is correct rather than verifying data consistency).
- Test has a sufficient number of check points - it is a very frequent case that our tests have some check points but not enough to verify all items in the output. So, we should make sure that our tests can detect any potential errors in the output results;
- Test fails if the functionality under test is inaccessible or substantially changed - obviously, if the system under test doesn't work at all, a test interacting with it should fail. And if we replace a working module with something that doesn't work, there should be at least one test which detects that something went wrong;
- Test is independent - a test runs the same way both separately and in any combination with other tests, so it's independent of other tests. This is important as a lot of test engines (like any of the xUnit family or similar) do not give any guarantee regarding the sequence in which tests are performed. Additionally, we may need different sets of tests for different situations. And finally, if there's a test which depends on the results of another test, isn't it more correct to treat those two tests as one?
- Test runs the same way multiple times with the same result - each test should be predictable and reliable. At the very least it is useful to be able to reproduce the situation which happened during a test run.
So, what are the methods which may ensure the above items? Some of them are:
- Review - the most universal way of confirming test quality, at least because it can be done anywhere and applied to the widest range of potential problems. At the same time it's one of the most time-consuming ways and it doesn't mitigate the human factor.
- Cross-checks - some tests may be designed in such a way that they perform actions which produce similar or comparable results. So, additionally we can reconcile results by comparing related operations.
Example: Imagine we have some module supporting 2 operations:

  Operation 1: add(a, b) = a + b
  Operation 2: mult(a, b) = a * b

  We may add some tests verifying their functionality separately:

  Test 1: Expression add(a, b) = c is valid for a, b, c where

  a | b | c |
  ---|---|---|
  1 | 1 | 2 |
  2 | 0 | 2 |
  ... | ... | ... |

  Test 2: Expression mult(a, b) = c is valid for a, b, c where

  a | b | c |
  ---|---|---|
  1 | 1 | 1 |
  2 | 0 | 0 |
  ... | ... | ... |

  At the same time the above operations are related, and multiplication can be expressed through addition, e.g.: 2 * 3 = 2 + 2 + 2 (add 2 three times). A sketch of such a reconciliation test follows this list.
- Resource sharing across independent teams - this is rather a process item which means that input data and automated test implementation are produced by different people independently. When two people work in the same direction but from different sides and their results match, it increases the probability that both did their part properly. At the very least it avoids the risk of adapting data to the test from the implementation side while strictly controlling the data definition. There may be several examples of resource sharing:
  - Input data for data-driven tests - a test designer may prepare a data sheet with inputs and expected outputs, while a test automation engineer works on the common work flow based on some test samples.
  - Keyword-driven or similar approaches - using this approach the test designer creates test cases independently of the implementation. Test design and test automation are separate activities here. Thus, the test performs predictable actions with known and validated data.
- Mutation Testing - a testing type based on artificial error injection, used to check how good the tests are at detecting potential problems we know about. This approach is quite time and resource consuming, but it can be fully delegated to a machine. A hand-rolled illustration also follows this list.
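Here is the reconciliation test promised above as a minimal JUnit sketch; the add/mult operations are inlined stand-ins for the module from the example:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CrossCheckTest {
    // hypothetical module under test, inlined to keep the sketch self-contained
    static double add(double a, double b) { return a + b; }
    static double mult(double a, double b) { return a * b; }

    @Test
    public void multIsConsistentWithRepeatedAdd() {
        int a = 2, b = 3;
        double viaAdd = 0;
        for (int i = 0; i < b; i++) {
            viaAdd = add(viaAdd, a); // 2 + 2 + 2
        }
        // Reconciliation: a defect in either operation (or in the test data) shows up as a mismatch.
        assertEquals(viaAdd, mult(a, b), 1e-9);
    }
}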
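And here is the hand-rolled mutation illustration, also promised above. Real mutation testing tools (e.g. PIT for Java) inject such faults and collect the statistics automatically, so this sketch only shows the principle:

// The principle of mutation testing on the subtraction example: a mutant replaces
// "-" with "+", and the test suite must "kill" it, i.e. at least one check must fail.
public class SubtractMutantDemo {
    static double subtract(double a, double b, boolean mutantActive) {
        return mutantActive ? a + b  // injected fault
                            : a - b; // original implementation
    }

    public static void main(String[] args) {
        // The only check-point of our suite: 2 - 3 must be -1.
        boolean killed = Math.abs(subtract(2, 3, true) - (-1.0)) > 1e-9;
        System.out.println(killed
                ? "Mutant killed: the tests detect this class of fault"
                : "Mutant survived: the tests are too weak");
    }
}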
All those approaches have different ways and areas of influence. Also, some of the above items are techniques while others are process items, so it's hard to put them all in one place to see the entire picture. But the list below shows how each of the above items covers requirements, their implementation and the tests for them:
- Review is something that can be applied everywhere, not just to tests, and it can cover almost all aspects of the functionality and the tests for it
- Resource sharing and cross-checks touch all items to some degree. We can make various cross-checks to verify consistency between requirements, we can make more detailed tests based on the actual implementation, and we can verify the consistency of our tests. But these are rather technical and process items, and they are not applied everywhere
- Mutation testing is targeted at covering tests only
What can we measure there?
Generally, most of the items listed in this paragraph are about how to do things. Only one of them produces measurable results and says what should be covered, what's already covered and how much: Mutation Testing and the metric we can get from this practice. This metric can be called the Mutation Coverage Rate, and it shows how many of the potential mutations we can inject into the system under test are detected by the tests. We'll denote this value as M%.
Having this value calculated, we may say how thoroughly we check each system under test code line we invoke. Thus, this value actually complements the results given by the Overall Code Coverage metric we received before. Earlier we also combined the Overall Code Coverage characteristic with requirements coverage and got the joint Overall Product Satisfaction Rate value. So, now we can get a new quality characteristic named Overall Satisfaction Rate (OSR) indicating our assurance of the requirements and implementation covered. This value can be calculated as:

OSR = OPSR * M% = PassRate * OCC * M%

where:
- ORSR - How many expectations are met at all?
- OCC - Which part of the entire application code have we invoked?
- OPSR - Did we check all capabilities of our system for expectation satisfaction? If not, which part of the actual system under test meets expectations?
- M% - How many potential problems do we cover and stand ready to detect with our tests?
- OSR - Which part of the actual system under test are we sure meets expectations?
With these metrics in place:
- We can detect and measure which requirements are covered well enough and which require more tests
- We can detect and measure which functionality wasn't implemented (non-covered requirements)
- We can detect and measure which tests require more check points
Is that enough?
No.
Firstly, the above metric is coverage-based and we actually used about 5 coverage metrics in it. But, for instance, the ISO/IEC/IEEE DIS 29119-4:2013 standard defines about 20 coverage metrics which can be applied depending on the techniques used. And even if we integrated all of those metrics, we would still just minimize the probability of leaving something uncovered, as there can always be some coverage item which is a superposition of the items already used.
Secondly, it cannot be an absolute quality metric as it doesn't cover such technical aspects as maintainability, testability and many other software characteristics (here is an example model for maintainability).
So, there is always room for further work. But we have a restricted budget, so we should always think not about absolute coverage but about coverage of an acceptable level.
How can we keep quality control over our automated tests?
What to test in tests?
This is another main topic of the chapter. Since automated tests are another form of software, similar practices should be applied to them, and testing shouldn't be an exception. Logically, we should apply a similar approach. But subjectively, testing for testing looks like an overhead. Imagine: we do testing for software, then testing for testing, then (if we keep the same logic) testing for testing for testing, and so on. It's insanity! We are not making software for the purpose of testing it. The initial software is the product we make; the tests for it are just targeted at simplifying our lives, not making them more complicated.
What should we do here? The simplest way is to forget about such testing: everything works fine, I've checked that. Yes, we can always use an excuse like that. But in this chapter I'm looking for objective criteria stating that our testing solution is of appropriate quality. Earlier we described an entire way to measure system under test quality. So, now imagine our testing solution is that system under test, and let's apply the same approach, just for lulz, to prove how our theory can be applied to a specific case.
The overall automated testing solution structure consists of the following components:
- Engine - the core driver of the system which is responsible for test organization, execution, reporting and event handling. In some cases it's a completely external module (e.g. any engine of the xUnit family); in some cases it's something custom-written (even if based on an existing engine).
- Core Library - the set of utility libraries and various wrapper functions which are not bound to the application under test but operate at a higher level of abstraction than the engine. Typically these are data conversion functions, UI wrapper libraries, and additional functions which are not specific to the application under test but exist just to minimize copy/paste
- Business Functions - a set of functionality which reflects application-specific behavior and actually represents the actions to perform with the system under test
- Tests - the final implementation of test scenarios
- Technology-specific - the group of components which is not really bound to the application under test and can be applied to similar applications or applications using a similar technology stack
- Application-specific - the group of components which reflect the application under test functionality and cannot be used anywhere outside the application under test
How can this all be tested?
Each of the test automation solution components can have an individual approach to testing, but mainly testing can be applied in the following way:
Group | Structure Component | Testing Approach |
---|---|---|
Technology-specific | Engine | There are 2 major ways of testing this part: if the engine is an external module (e.g. any engine of the xUnit family), it is already covered by its own tests and we simply rely on a stable version; if the engine is custom-written (even if based on an existing one), it should be treated as separate software and covered with its own unit and integration tests |
Technology-specific | Core Library | Since the core library is also a kind of software which can be used outside the specific project, we can treat it as a separate library and apply the same unit, integration and system tests to it, considering that we're not bound to any specific application |
Application-specific | Business Functions | Business functions are actually a reflection of the application under test functionality. So, the tests themselves are a kind of unit, integration, system or whatever tests for all those business functions |
Application-specific | Tests | Normally each test is a kind of function which doesn't return any value and doesn't accept parameters (or at least it can be expanded to that form in the case of data-driven tests). The test result is either pass or fail depending on whether we encounter an error during execution. So, a hypothetical test which tests this test would be a single instruction call and nothing else, which doesn't differ from a normal test run. So, if we want tests for the tests themselves, we just need to make trial test runs on some test environment |
- The application-specific components (Business Functions and Tests) do not require any additional tests to be created: the testing solution tests itself
- The technology-specific components (Engine and Core Library) can be treated as separate software, and we can apply all the same practices we use for testing our application under test. So, test solution components which are not specific to the application under test should be tested separately as stand-alone software (see the sketch below)
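For example, a core-library utility can be covered by ordinary unit tests with no reference to any application under test; a minimal sketch (the converter itself is a hypothetical example, inlined to stay self-contained):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class DateConverterTest {
    // hypothetical core-library utility: converts ISO dates to a display format
    static String isoToDisplay(String isoDate) {
        if (!isoDate.matches("\\d{4}-\\d{2}-\\d{2}")) {
            throw new IllegalArgumentException("Not an ISO date: " + isoDate);
        }
        String[] p = isoDate.split("-");
        return p[2] + "/" + p[1] + "/" + p[0];
    }

    @Test
    public void convertsIsoDateToDisplayFormat() {
        assertEquals("31/12/2024", isoToDisplay("2024-12-31"));
    }

    @Test(expected = IllegalArgumentException.class)
    public void rejectsMalformedInput() {
        isoToDisplay("not-a-date");
    }
}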
NOTE: Actually it's not entirely correct to say that application-specific functionality and resources never require separate testing activities. There may be different cases. E.g. in one of my previous projects we used to run tests verifying that our window definitions were up to date with the current application. That was done for GUI-level testing, and it served as a kind of unit test for that test type. Normally, though, for GUI testing there should be a separate test which just navigates through different screens with minimal business actions and verifies that all controls which are supposed to be there actually exist. So, it doesn't break anything said above; it's more about the proper interpretation of the tests.
How can we identify if our tests are of acceptable complexity?
Good. We now know what to test and how to detect when our tests and all subsidiary components have reached a sufficient reliability level. Thus, we are not only confident about our system under test quality but also about the quality of the tools we use. But despite this confidence we shouldn't forget that our main goal is developing the system under test, not the tests for it. So, if testing activities take more resources than the actual development, there's probably something wrong. From the technical side this problem may be caused by testing solution complexity. In order to control the situation and prevent such a problem we need to measure this complexity.
If we talk about code complexity we can use a metric named Cyclomatic Complexity. For each function it shows the number of possible flows through which the function can be performed. There is a common practice stating that each method/function should have a Cyclomatic Complexity Number (further CCN) value less than or equal to 10. If the CCN is between 10 and 20 the method is moderately good. If it is higher, the method is treated as non-testable. This is a good metric for keeping our code granular. But we can also use it for complexity comparison between the testing solution and the application under test.
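As a quick illustration of how the CCN is counted (the helper below is purely illustrative), each decision point adds one flow to the single straight-line path:

// CCN = 1 (straight-line path) + 1 (for loop) + 1 (if) + 1 (&& condition) = 4
static int countPositiveEvens(int[] values) {
    int count = 0;
    for (int v : values) {          // +1
        if (v > 0 && v % 2 == 0) {  // +1 for the if, +1 for &&
            count++;
        }
    }
    return count;
}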
Complexity of tests
In previous paragraphs we defined some criteria of good tests, and one of them sounds like:
- Test runs the same way multiple times with the same result

The most predictable test is one with a single execution flow, and the number of possible flows is exactly what CCN counts. So:
- Each test has CCN >= 1
- In the ideal case all tests have CCN = 1
- The more tests with CCN > 1 we have, the lower the TSR value

Based on this we can express the Tests Simplicity Rate as:

TSR = TCatc / (CCN(1) + CCN(2) + ... + CCN(TCatc))

where:
- TSR - the tests simplicity rate value
- CCN(i) - the CCN value of the test with index i
- TCatc - the number of automated tests

With the above calculation we may express test complexity with the TSR value, which is 100% when all tests have just one flow and approaches 0 when tests are too complicated.
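A minimal Java sketch of the TSR calculation, assuming the formula reconstructed above (the CCN values are arbitrary samples):

public class TestSimplicityRate {
    // TSR = TCatc / sum(CCN(i)); equals 1.0 only when every test has a single flow.
    static double tsr(int[] ccnPerTest) {
        int sum = 0;
        for (int ccn : ccnPerTest) {
            sum += ccn;
        }
        return (double) ccnPerTest.length / sum;
    }

    public static void main(String[] args) {
        System.out.println(tsr(new int[] {1, 1, 1, 1})); // 1.0  - all tests are single-flow
        System.out.println(tsr(new int[] {1, 3, 5, 7})); // 0.25 - complex tests drag TSR down
    }
}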
Complexity of subsidiary testing solution components
For subsidiary testing solution components like the Engine or Core Library there's one major criterion of acceptable complexity: the subsidiary module should have less complexity than the application under test. This criterion applies only to modules developed as part of the project, so e.g. we don't need to measure the complexity of JUnit if we use it. But as soon as we write a custom extension of any JUnit class, we should take it into account while calculating complexity.
For better comparison we can aggregate the CCN numbers for all the code of the system under test and do the same for the subsidiary module. After that we may derive a Test Component Simplicity Rate (TCSR) value from the ratio of these aggregated CCN numbers.
Is that enough?
No. The above characteristic is based on a single factor. We can include many more factors to make the measure more precise and visible. And the main thing which should be of interest is the value that any testing effort brings; everything spins around that value.
Where to go next?
In this chapter we've described several testing solution quality metrics which give us some visibility into how good we are with our testing. Eventually, we've managed to consolidate multiple metrics into one to get a short and compact result. We may involve many other metrics and consolidate them too, but we should always take the following into account:
- No matter how many metrics we add, there are always areas to grow in. So, if we haven't reached the top, we should expand our testing to reach it; if we have reached the top, we need to find other metrics.
- We should always interpret results properly: 100% doesn't always mean a perfect result
- Any number we get should be used for a purpose. We should clearly understand what each number shows and what it doesn't