The GSQueeZ algorithm is a new way of compressing genome data that could save a lot of money  but does it deserve to be patented?
The Zip compression algorithm is so ubiquitous that even dumb users understand the verb "to zip". It is as much as any algorithm inventor can hope for. Now we have the GSQueeZ compression algorithm aimed at making genomic squencing data take up a lot fewer bytes.
It is claimed that the new algorithm can achieve compression rates of 80%, which is not at all surprising given the high degree of regularity and hence redundancy in the data. What makes the compression method so suitable for the data is that it preserves the order and it allows access to portions of the data without decompressing the entire file.
The new algorithm is a direct application of Huffman encoding  a fundamental algorithm from information theory. A standard Huffman encoding is computed from each base plus additional data  the higher the frequency of occurrence of the base+data the shorter the code used to represent it. This is pure Huffman coding procedure.
The data to be encoded is first scanned to create a frequency table and then a Huffman coding tree. The data is then scanned a second time and the data encoded using the Huffman code. The resulting code is then written out, complete with a header giving the Huffman code dictionary. To most programmers this would seem to be a common sense application of an existing algorithm.
If you are looking for the novel idea that makes this algorithm different you probably aren't going to find it. This is a worthwhile implementation of the standard Huffman algorithm  but it isn't a new idea and it isn't a new algorithm.
If you want to use GSQueeZ for Academic/noncommercial use then fine, you can even have the source code, but if you want to investigate or use it from a within a company that makes money, no matter how small, then you have to pay for the privilege. Which is a bit strange considering the company concerned is TGen which claims to be a Nonprofit Biomedical Research Institute.
The creators are also applying for a patent on the algorithm. If they get the patent then presumably it will cover any compression method that is an implementation of the Huffman code  which is arguably all such methods.
What is worse is that if you want to read the details of the algorithm you will have to pay, typically $25 for a day's access to the journal in question  bioinformatics.
Link to the paper on the Bioinformatics Journal
Of course if you have an academic affiliation the chances are that you will be allowed to access the paper for "free"  which means the institution pays for you. Programmers often don't have an academic affiliation and the combination of patent and academic restriction is oppressive in every sense.
The final though has to be  what if Huffman had patented his much more groundbreaking and fundamental algorithm?
Further reading
Coding theory
Data compression the dictionary way
Claude Shannon
Information theory
Traveling Salesman Applied To DNA Synthesis 13/01/2016
One of the fun things about being a programmer is that algorithms that you know can often be applied in unexpected ways in areas that are the domain of other specialists. A good example is the traveli [ ... ]

PostgreSQL Version 9.5 06/01/2016
With the new PostgreSQL version 9.5 about to be released, we take a look at the most anticipated upgrades it will bring, beginning with its new Upsert feature.
 More News 
<ASIN:0470089857>
<ASIN:0262101068>
<ASIN:059615450X>
<ASIN:354072799X>
<ASIN:0199230234>
<ASIN:1420063677>
<ASIN:0763751863> 