Previous | ToC | Next Labs: Error Correction and Compression. Part 2. Math Alive

Lempel-Ziv Algorithm: Coding a Text with Digits

Suppose our text contains not only letters and punctuation marks, but digits as well. How will we be able to distinguish in the coded chain whetheer a digit is a last character of a string (and thus a character from the original text) or part of the reference number of a prefix? There is an easy solution to this problem: make all the reference numbers have a fixed length, adding leading zeros when necessary. In our example we have fewer than 100 strings, so we can use two (decimal) digits for each reference number.
In our example, the original string of:
If_one_doctor_doctors_another_doctor,_does_the_doctor_who_doctors_the_doctor_doctor_ the_doctor_the_way_the_doctor_he_is_doctoring_doctors?_Or_does_he_doctor_the_doctor_ the_way_the_doctor_who_doctors_doctors?
becomes:
00I00f00_00o00n00e03d04c00t04r07o00c09o00r00s03a05o09h06r11c13r00,11e15_ 18e20t10_00w00h04_00d08t10s03t29e_26o14_31o12t27t35_38c21_25_28a00y34h06_ 42t27h48i24d32o14i05g36r15?03O37d04e24h48d53r47e56_44w00a46_44d63_ 28h30d63s56s00?

Thus, we can use the same symbols for prefix and for the last character, since all our encoded strings have the same length now, and the position of a character within an encoded string unambigously indicates its role. This idea allows us to compress binary text using only 0s and 1s. You will see an example on the next page.

To summarize, we have 4 steps:
1. parsing
2. counting the number of strings to choose the size for encoding prefix.
3. coding individual strings
4. joining individual coded strings to a chain.