CheckCell Detects Bugs In Spreadsheets
Written by Janet Swift   
Monday, 27 October 2014

A free Excel  plug-in that takes a new approach to automatically finding data errors in spreadsheets has been released. Rather than looking for errors in programs and formulas, CheckCell sets out to debug the underlying data.

The tool, called CheckCell comes from Daniel Barowy and Dimitar Gochev, computer science doctoral students at the University of Massachusetts Amherst. It was launched at this year's international computer programming language design conference, OOPSLA.

It addresses the problem that spreadsheet data errors can be consequential referring to the story we reported in April 2013, see Is Excel To Blame For Our Economic Pain?.

In his conference session Daniel Barowy told delegates:

"Consider the case of a paper written by Harvard economists Carmen Reinhart and Kenneth Rogoff a couple of years ago. The paper was influential, lending credibility to government austerity measures in Europe and the United States. But in 2013, UMass Amherst economist Thomas Herndon and colleagues found, in combing through the data by hand, that methodological errors undermined Reinhart and Rogoff's argument. In particular, Reinhart and Rogoff exaggerated the impact of key data values in a spreadsheet."

The CheckCell group, supervised by Prof Emery Berger. wondered whether software might be developed to find these kinds of errors automatically and successfully devised an approach that found a number of the same errors uncovered by Herndon.

The paper presented at OOPSLA refers to the approach as data debugging, which combines program analysis and statistical
analysis to automatically find potential data errors by locating data that has a disproportionate impact on the computation. 

Putting  it more informally Barowy explains:

"Basically, CheckCell identifies data points that have a big impact on the final result, even if the impact is super subtle and difficult to detect. CheckCell immediately flags data points that are very suspicious, the ones that deserve a second look. It's like having a helper who says, 'pay attention to these cells, they really matter.'"

The technique uses what Berger calls "a threshold of unusualness." CheckCell marks high-impact data points in red and asks the spreadsheet designer to check them. If they are indeed correct, they turn green and will not be flagged in subsequent analyses.

The example cited involves getting the digits in a data point transposed, so when a teacher has an “A” student who would be expected to get a 94 on a test and the spreadsheet says that student got a 49, CheckCell would flag it.  A similar example is used in the paper: 

 checkcelleg

Isn't this just range checking of the sort we have always told users to build into spreadsheets? 

I think it is fair to say it is a valuable extension of the idea that isn't as coarse grained as manual checks would be and by working from the unexpected outcome to the data brings a welcome degree of automation to the problem.

Berger says that in the future the team plan to extend CheckCell's use to large-scale data sets where errors that have a large impact are difficult to find. Having already shown that CheckCell would have uncovered some of the flaws in the Reinhart/Rogoff data, this seems a worthwhile endeavor. 

You can install CheckCell 1.0.2 for Excel 10 and later via GitHub and its source code is also available. gCheckCell, a version for Google Spreadsheet is also on GitHub.

 

Banner


Firefox 1.0 Released 20 Years Ago
10/11/2024

A news item with the headline "Firefox browser takes on Microsoft" from 20 years ago has attracted renewed attention. It was originally published on the BBC News website on November 9th, 2004 rec [ ... ]



Prompt Engineering Techniques To Make You An Expert
18/11/2024

Introducing a GitHub repository full of hot tips and instructions on how to build the perfect prompt presented in a collection of Jupiter Notebooks.


More News

 

espbook

 

Comments




or email your comment to: comments@i-programmer.info

Last Updated ( Thursday, 06 November 2014 )