COVID Results Skewed By Faulty Data Import
Written by Alex Denham   
Monday, 05 October 2020

The official number of coronavirus cases in the UK was under-reported by almost 16,000 in recent days because of a data import error. As well as skewing the figures, the error meant that people who had tested positive weren't notified, so their contacts also went unnotified.

Public Health England, a UK governmental department, said that 15,841 cases between 25 September and 2 October were left out of the UK daily case figures. The missing cases were added back at the weekend, causing an apparent spike in case numbers.


The problem has now been resolved, according to Public Health England. Its interim chief executive, Michael Brodie, said that a "technical issue" was identified overnight on Friday, 2 October in the process that transfers Covid-19 positive lab results into reporting dashboards. The cause was that some of the data files reporting positive test results exceeded the maximum file size.

News outlets and social media have reported that the problem arose when an Excel spreadsheet reached its maximum size, meaning no further rows could be added. In this scenario, labs carrying out Covid tests automatically enter their results into spreadsheets, which are then sent to a central PHE facility to be collated. An Excel worksheet is limited in the number of rows it can hold (1,048,576 in the current .xlsx format, 65,536 in the older .xls format), while a CSV file is not, so if an oversized CSV file is opened in Excel the rows beyond the maximum are truncated.
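The truncation risk in that scenario can be checked before a file ever reaches a spreadsheet. The following is a minimal sketch, assuming the data arrives as CSV text; the two constants are Excel's documented per-worksheet row limits, while the function name `rows_lost_in_excel` is hypothetical and nothing here reflects how PHE's pipeline was actually built.

```python
import csv
import io

# Documented per-worksheet row limits for the two Excel file formats.
XLSX_MAX_ROWS = 1_048_576
XLS_MAX_ROWS = 65_536


def rows_lost_in_excel(csv_text: str, max_rows: int = XLSX_MAX_ROWS) -> int:
    """Return how many CSV rows (header included) would not fit in one sheet."""
    reader = csv.reader(io.StringIO(csv_text))
    total_rows = sum(1 for _ in reader)
    return max(0, total_rows - max_rows)


# Small demonstration using an artificially low limit of 3 rows.
sample = "id,result\n1,positive\n2,negative\n3,positive\n"
print(rows_lost_in_excel(sample, max_rows=3))  # 4 rows in total, so 1 is lost
```

A non-zero result is the signal to stop and report, rather than to open the file and carry on with whatever happened to load.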

If that were the case, it would be quite shocking that a government department was trying to run a major data analysis on a spreadsheet. I'm not saying it wouldn't happen and doesn't happen, but for something of this magnitude?

A (hopefully more likely) view is that what actually happened was that a script importing CSV data into something other than Excel timed out. The sources reporting this say the fix was simply to set the timeout parameter to something suitably massive. The Press Association reports that the data files have been split into several smaller subfiles to overcome the problem. Whichever version is correct, the problem shouldn't recur.
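Splitting a large data file into smaller subfiles, as the Press Association describes, might look something like the sketch below. It assumes CSV input with a single header row that should be repeated in every subfile; the names `split_csv` and `rows_per_file` are hypothetical, and the real fix may have been quite different.

```python
import csv
import io
from typing import Iterator, List


def _to_csv(header: List[str], rows: List[List[str]]) -> str:
    """Serialise a header plus rows back into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(csv_text: str, rows_per_file: int) -> Iterator[str]:
    """Yield CSV chunks of at most rows_per_file data rows, each with the header."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to split
        return
    chunk: List[List[str]] = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_file:
            yield _to_csv(header, chunk)
            chunk = []
    if chunk:  # final, partially filled chunk
        yield _to_csv(header, chunk)


sample = "id,result\n1,positive\n2,negative\n3,positive\n"
parts = list(split_csv(sample, rows_per_file=2))
print(len(parts))  # 3 data rows at 2 per file gives 2 subfiles
```

Splitting keeps each file under the limit, but it only helps if the chosen chunk size stays below whatever ceiling the downstream tool imposes.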

Either way, it's a reminder to developers everywhere. Error trapping and reporting can make the difference between a private 'aargh, let's run that again', and far-too-public reproaches.
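One simple form of error trapping for an import is to reconcile the number of rows loaded against the number of rows in the source, and to fail loudly on any mismatch. The sketch below is illustrative only: `import_rows`, `ImportMismatchError` and the `capacity` parameter are hypothetical names, with `capacity` standing in for any hard limit such as a row maximum, a file size cap or a timeout.

```python
import csv
import io
import logging

logger = logging.getLogger("import")


class ImportMismatchError(Exception):
    """Raised when fewer rows were loaded than the source contained."""


def import_rows(csv_text: str, capacity: int) -> list:
    """Load data rows into a store that holds at most `capacity` rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    loaded = rows[:capacity]  # models the silent truncation we want to catch
    if len(loaded) != len(rows):
        missing = len(rows) - len(loaded)
        logger.error("Import incomplete: %d of %d rows missing", missing, len(rows))
        raise ImportMismatchError(f"{missing} of {len(rows)} rows were not imported")
    return loaded


sample = "id,result\n1,positive\n2,negative\n3,positive\n"
try:
    import_rows(sample, capacity=2)
except ImportMismatchError as exc:
    print(f"Import rejected: {exc}")
```

The point of raising rather than just logging is that the failure stops the pipeline at the import step, instead of surfacing days later in a published dashboard.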


More Information

Public Health England Website

Related Articles

What Skills Do Data Scientists Need

Programmer's Guide To Theory - Error Correction

End Manual Data Entry in Excel - Thanks AI!

Excel Adds New Data Types 

John Conway Dies From Coronavirus

Fighting Coronavirus At Home With Exascale Power

Smartphone App Borrows Power For Corona Virus Research  


Last Updated ( Monday, 05 October 2020 )