[Coladam] sorting to compress (PART 2)

Daniel Bienvenu newcoleco at yahoo.fr
Sat Mar 15 20:52:51 CET 2008

For those who read the first message and still don't
get it on how there can be compression. I will compute
here in bits the size of the raw data sequence, than
compare with what the sorting phase can do.

Take these letters : A,B,C,D and E.

For a computer, these letters are values like 65, 66,
67, 68 and 69 in ASCII. total = 5 * 8 = 40 bits

For the huffman encoding technic, these letters can be
simplified as each 3 bits value, with some clues on
how to decompress it. total = 5 * 3 = 15 + extra

The integer value that represent how the sequence is
mixed, can be represented as a 7 bits integer. total =
7bits + extra information.

For small group of elements, the results are not
satisfying. In fact the extra information needed may
give no compression at all, but it's the problem of
all the compression technics.

Second attempt to explain my point of view. If we have
a sequence of data with the 22 first letters in the
alphabet, but all mixed up...

Again, in ASCII it's 8 bits each. total = 22 * 8 = 176

Huffman encoding will gives : = 110 bits + extra

For the integer value that represent how the values
are mixed up, the range is [0 , 22!-1], 22!-1 can be
represented as a 70 bits integer value. total = 70 +
extra information

So, more we have elements in the sequence, less we use
to represent what is the sequence.

The terrible questions are : is there a limit? yes. is
there a way that it may work to gain in compression
where there is no possible compression at all? yes and
no. It's hope to you to discover them and give
feedback about this.

Thanks for reading

Envoyez avec Yahoo! Mail. La boite email la plus appreciée au monde. http://mail.yahoo.fr

More information about the Coladam mailing list