LZ Compression And The One-Bit Catastrophe
Written by Mike James   
Monday, 17 July 2017

We use compression algorithms extensively to save both storage and time. The most popular algorithms are based on LZ dictionary compression and are used in GIF, Deflate, Zip, PNG. It was even named an IEEE Milestone. But we know little about them. One long standing question is, can adding a single bit to a file dramatically change the compression achieved? Surely not!

compress

LZ compression, specifically LZ 78, is of both practical and theoretical interest. It is one of a class of dictionary compression methods invented in 1978 by Abraham Lempel and Jacob Ziv.

A dictionary compression method is lossless and works by building up a dictionary of short sequences that can then be used to code the file in question. For example, if you had a file that was composed of nothing but 10,000  "A"s then the single dictionary entry "A" and the coding 10,000 would provide a very effective compression. In the real world, of course, files are made up of complex bit sequences but unless they are random some sequences repeat. The difficulty is finding them.

An optimal dictionary encoding gives an upper bound to the Kolmogorov complexity of a string. The Kolmogorov complexity is just the shortest program that will generate the string and clearly a compression algorithm (plus the dictionary) qualifies as a program and hence any other program that does the job has to be shorter.

With all this in mind, the question arises "can adding a single bit significantly change the compression ratio?". This is sometimes referred to as the "one-bit catastrophe".

"Suppose you compressed a file using your favorite compression algorithm, but you realize there were a typo that makes you add a single bit to the original file. Compress it again and you get a much larger compressed file, for a one-bit difference only between the original files. Most compression algorithms fortunately do not have this strange behaviour; but if your favorite compression algorithm is called LZ’78, one of the most famous and studied of them, then this surprising scenario might well happen. . ."

The question is as much about the nature of order and entropy as it is compression methods. You could rephrase this as "can a single bit significantly change the order in a bit sequence or string?"

The answer, revealed by Guillaume Lagarde and Sylvain Perifel at the Sorbonne, is that for infinite-bit sequences adding a single 0 to the start turns a compressible sequence into an incompressible one.

I guess you could say that a single bit can render order into disorder but that might be a bit theatrical.

For finite words the situation isn't quite as bad as the compressibility of a word with a bit added has to be close to the original word. But there are words that are easily compressible that are converted into words that are much more difficult to compress by adding a single bit.

This theoretical musing does have a practical consequence:

"This “catastrophe” shows that LZ’78 is not robust with respect to the addition or deletion of bits. Since a usual good behaviour of functions used in data representation is a kind of “continuity”, our results show that, in this respect, LZ’78 is not a good choice, as two words that differ in a single bit can have images very far apart."

 

compress

More Information

Lempel-Ziv: a "one-bit catastrophe" but not a tragedy

Related Articles

Data Compression The Dictionary Way

Kolmogorov Complexity

Information Theory

Non-computable numbers

Computability

Axiom Of Choice - The Programmer's Guide 

The Programmer's Guide To The Transfinite

Claude Shannon - Information Theory And More

Fractal Image Compression

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

 

 

Banner


C23 ISO Standard Is Here But You Probably Won't Read It
06/11/2024

At last ISO C23 has been published, but at $250 you probably aren't going to read it. Can we really tolerate this sort of profiteering on the work of others? This is worse than academic publishing!



AI Propels Python To Top Language on GitHub
30/10/2024

This year's Octoverse Report reveals how AI is expanding on GitHub and that Python has now overtaken JavaScript as the most popular language on GitHub. The use of Jupyter Notebooks has also surged.


More News

 

espbook

 

Comments




or email your comment to: comments@i-programmer.info

 

Last Updated ( Monday, 17 July 2017 )