LZ Compression And The One-Bit Catastrophe |
Written by Mike James |
Monday, 17 July 2017 |
We use compression algorithms extensively to save both storage and time. The most popular algorithms are based on LZ dictionary compression and are used in GIF, Deflate, Zip, PNG. It was even named an IEEE Milestone. But we know little about them. One long standing question is, can adding a single bit to a file dramatically change the compression achieved? Surely not!
LZ compression, specifically LZ 78, is of both practical and theoretical interest. It is one of a class of dictionary compression methods invented in 1978 by Abraham Lempel and Jacob Ziv. A dictionary compression method is lossless and works by building up a dictionary of short sequences that can then be used to code the file in question. For example, if you had a file that was composed of nothing but 10,000 "A"s then the single dictionary entry "A" and the coding 10,000 would provide a very effective compression. In the real world, of course, files are made up of complex bit sequences but unless they are random some sequences repeat. The difficulty is finding them. An optimal dictionary encoding gives an upper bound to the Kolmogorov complexity of a string. The Kolmogorov complexity is just the shortest program that will generate the string and clearly a compression algorithm (plus the dictionary) qualifies as a program and hence any other program that does the job has to be shorter. With all this in mind, the question arises "can adding a single bit significantly change the compression ratio?". This is sometimes referred to as the "one-bit catastrophe". "Suppose you compressed a file using your favorite compression algorithm, but you realize there were a typo that makes you add a single bit to the original file. Compress it again and you get a much larger compressed file, for a one-bit difference only between the original files. Most compression algorithms fortunately do not have this strange behaviour; but if your favorite compression algorithm is called LZ’78, one of the most famous and studied of them, then this surprising scenario might well happen. . ." The question is as much about the nature of order and entropy as it is compression methods. You could rephrase this as "can a single bit significantly change the order in a bit sequence or string?" The answer, revealed by Guillaume Lagarde and Sylvain Perifel at the Sorbonne, is that for infinite-bit sequences adding a single 0 to the start turns a compressible sequence into an incompressible one. I guess you could say that a single bit can render order into disorder but that might be a bit theatrical. For finite words the situation isn't quite as bad as the compressibility of a word with a bit added has to be close to the original word. But there are words that are easily compressible that are converted into words that are much more difficult to compress by adding a single bit. This theoretical musing does have a practical consequence: "This “catastrophe” shows that LZ’78 is not robust with respect to the addition or deletion of bits. Since a usual good behaviour of functions used in data representation is a kind of “continuity”, our results show that, in this respect, LZ’78 is not a good choice, as two words that differ in a single bit can have images very far apart."
More InformationLempel-Ziv: a "one-bit catastrophe" but not a tragedy Related ArticlesData Compression The Dictionary Way Axiom Of Choice - The Programmer's Guide The Programmer's Guide To The Transfinite Claude Shannon - Information Theory And More
To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.
Comments
or email your comment to: comments@i-programmer.info
|
Last Updated ( Monday, 17 July 2017 ) |