Automatic Testing - Programmers Are Still The Problem

Written by Alex Armstrong

Friday, 16 June 2017

White box test generator tools can help you find bugs automatically by generating combinations of inputs that are likely to give the wrong answer. However there is a problem - how do you know when you have the wrong answer?

A recent study by David Hon and Zoltan Micskei of Budapest University of Technology and Economics puts automatic testing under the spotlight.

White-box test generator tools rely only on the code under test to select test inputs, and capture the implementation’s output as assertions. If there is a fault in the implementation, it could get encoded in the generated tests. Tool evaluations usually measure fault-detection capability using the number of such fault-encoding tests. However, these faults are only detected, if the developer can recognize that the encoded behavior is faulty.

They carried out a study using 54 grad students and two open source projects and the IntelliTest Tool which is still better known as Pex. The tool generated test cases and the subjects were asked to identify when the output was indicative of a fault. The results were some what surprising.

The results showed that participants incorrectly classied a large number of both fault-encoding and correct tests (with median misclassication rate 33% and 25% respectively). us the real fault-detection capability of test generators could be much lower than typically reported, and we suggest to take this human factor into account when evaluating generated white-box tests.

You can see the confusion matrix for the NBitcoin project's tests:

errorgrid

FP=False Positive, FN=False Negative, TP=True Positive, TN=True Negative

What this means is that in practice white box testing is subject to programmer errors in its own right. By examining videos of the subjects trying to interpret the tests the researchers note that they tended to use debugging methods to further explore the code and clarify the meaning of the test. An "exit" survey also indicates the sorts of difficulties the subjects felt they had with the task:

“Deciding whether a test is OK or wrong when it tests an unspecified case, e.g. comparing with null, or equality of null.”
“Distinguishing between the variables was difficult, e.g between assetMoney, assetMoney1, assetMoney2.”
“Tests should compare less with null and objects with themselves.”
“I think that some assertions are useless, and not asserting ’real problems’, just some technical details.”
“Generated test cases doesn’t separated into Arrange, Act, Assert and should create more private methods for these concerns.”
“Generate comments into tests describing what happening.”

The key finding is:

The implication of the results is that the actual fault-nding capabilities of the test generator tools could be much lower than reported in technology-focused experiments.

Of course it would be better if the programmer could be taken out of the loop, but this would require the white box tester to work out what the result of the test should be and this would require a lot of AI.

More Information

Classifying the Correctness of Generated White-Box Tests: An Exploratory Study

Code Digger Finds The Values That Break Your Code

Code Hunt - New Coding Game From Microsoft Research

Debugging and the Experimental Method

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Gleam 1.7 Improves Performance
09/01/2025

Gleam 1.7 has been released with faster record updates and more secure package manager credential handling. Gleam is a statistically typed-language the compiles to Erlang or JavaScript.

+ Full Story

ELIZA Makes a Comeback
19/01/2025

ELIZA was the world's first chatbot, more than 50 years before the term itself was coined. She was the brainchild of Joseph Weizenbaum who wrote the original program in the 1960s at MIT. Th [ ... ]

+ Full Story

More News

Comments

or email your comment to: comments@i-programmer.info

Last Updated ( Friday, 16 June 2017 )

More Information

Related Articles

Comments