Previous | ToC | Next Labs: Error Correction and Compression. Part 2. Math Alive

Binary Version of Lempel-Ziv Algorithm


Suppose we have a binary stream we want to compress. We start with parsing exactly the same way we did with characters. Let's take as an example the following binary string:

001101100011010101001001001101000001010010110010110

The result looks like this:

,0,01,1,011,00,0110,10,101,001,0010, 01101,000,00101,001011,0010110

Since we will code the prefix number using the same 0s and 1s that also occur as characters in the string, we need our coded strings to have a fixed length. Since we have 16 strings (counting also the first "empty" string, which will be given the position number "zero"), we will need 4 bits for numbering them in binary. As we number the strings, starting from the first non-empty string (see Position Number in the table below), we also check what the Prefix is (that is the piece of the string before its last digit) and the Position Number of that prefix (which, by construction, is a string that was encountered before). The coded string is then constructed by taking the Position Number of the Prefix, and following that by the last bit of the string that we are considering.

This table shows the proper method for encoding. Fill in the empty slots:

Practice
Coding Individual Strings for Binary Version of Lempel-Ziv Algorithm

StringPosition Number
of this string
Position Number
in binary
Prefix Position Number
of Prefix
Coded String
01 0001empty000000000
01 2 00100000100011
1 3 0011empty000000001
0114 010001001000101
00 5 01010000100010
0110 6 0110011010001000
10 7 01111001100110
1018 100010011101111
0019 1001
001010 1010
0110111 1011
00012 1100
00101 13 1101
00101114 1110
0010110 15 1111

The LZ-coded version of the chain in this example is then formed by writing all the coded strings (from the last column in the table) in one long chain.
In our example, the original string of:

001101100011010101001001001101000001010010110010110

becomes:

0000000011000010010100010010000011001111
01011100100100101010101011101111101



Previous | ToC | Next Last Modified: August 2008