Human Genes Renamed To Please Excel
Written by Janet Swift   
Friday, 07 August 2020

More than two dozen human genes have been renamed so that they can be typed into a spreadsheet without being formatted as dates. New guidelines for standardized gene naming explicitly allow for renaming genes to avoid problems with data handling.


The human genome has tens of thousands of unique genes. The original estimate was more than 100,000, but this number has since been revised downwards. Giving each gene a meaningful name is seen as important for effective communication, and the fact that some genes have had to be renamed on account of Excel has attracted a great deal of attention.

It was The Verge that initially carried this story, alerted by a tweet that drew attention to this extract from the newly published Guidelines for human gene nomenclature:

 

HGNC

The Verge outlined the problem with:

when a user inputs a gene's alphanumeric symbol into a spreadsheet, like MARCH1 -- short for "Membrane Associated Ring-CH-Type Finger 1" -- Excel converts that into a date: 1-Mar. This is extremely frustrating, even dangerous, corrupting data that scientists have to sort through by hand to restore. It's also surprisingly widespread and affects even peer-reviewed scientific work. One study from 2016 examined genetic data shared alongside 3,597 published papers and found that roughly one-fifth had been affected by Excel errors.
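To make the problem concrete, here is a minimal Python sketch that flags gene symbols a spreadsheet is likely to read as dates. The month list and the pattern are our own approximation for illustration, not Excel's actual parsing rules, and the function names `looks_like_date` and `flag_symbols` are hypothetical. Apart from MARCH1, the sample symbols are not taken from the article.

```python
import re

# Month names and abbreviations that a spreadsheet may read as the start of a date.
# Assumption: this list approximates the behaviour; it is not Excel's real rule set.
MONTHS = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month token, an optional hyphen or space, then one or two digits,
# for example MARCH1, SEPT2 or DEC-1.
DATE_LIKE = re.compile(
    r"^(?:" + "|".join(MONTHS) + r")[- ]?\d{1,2}$",
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol matches the month-plus-number pattern."""
    return DATE_LIKE.match(symbol.strip()) is not None


def flag_symbols(symbols):
    """Return the symbols that would need checking before going into a spreadsheet."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    sample = ["MARCH1", "SEPT2", "DEC1", "MARCHF1", "TP53", "BRCA1"]
    print(flag_symbols(sample))
```

A check like this could be run over a column of symbols before it is pasted into a spreadsheet. It only identifies risky names and does nothing to stop the conversion itself.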

Elspeth Bruford, coordinator of the HUGO Gene Nomenclature Committee (HGNC), told The Verge that the names of some 27 genes have been changed so far. She noted that, while there has been some dissent about the decision, it was easier to rename human genes than to change how Excel works.

In fact, HGNC had initially tried to change the way that geneticists used Excel, and last year posted a YouTube video showing how to enter data in Excel without gene names being converted to dates.

So, by changing gene names, are the geneticists now caving in when they should be asking Microsoft to fix the date formatting issues, which annoy other groups of users as well? 

The consensus, both among those commenting on The Verge's article and on Hacker News, which linked to it, is that eliminating names that can be read as dates is a sensible move. Excel is a useful tool for scientists across all disciplines, and while it is possible to "tame" Excel's autoformatting, this isn't foolproof, especially if you want to share spreadsheets with other users who have their own formatting options.

To us, it seems that this is the biggest case of the tail wagging the dog we have encountered in some time. It makes you wonder what would have happened if Excel had wielded such power in former times. Perhaps E=mc² would have been E1=M1*C1*C1, or quark might have been autocorrected to quart.

 


More Information

Guidelines for human gene nomenclature

Related Articles

Calculating with Dates in Excel

Dates Are Difficult


Last Updated ( Friday, 07 August 2020 )