[Coladam] sorting to compress : a way to experiment compressing uncompressable data sequences
newcoleco at yahoo.fr
Sat Mar 15 20:33:56 CET 2008
I have no idea what to do with this since I know it
works but we have to be quite lucky to find these
sequences of data. Anyway, this is what I think can
compress some uncompressable data.
We all have the experience of data compression, even
if we didn't realized yet that many many things in
digital multimedia is compressed a lot, sometimes
lossless, sometimes lossy.
Well, let me try to explain to you a way to compress a
sequence of data, all different values and scrambled.
No actual compression technic will do something about
it except stock it as raw data and try to compress the
next sequence to make an average good or bad job at
Well, if we can tell how the data is mixed, and tell
also what are the values in a simple and compact way
when they are sorted... then we have compression.
First problem, we not have to sort the elements but to
tell how the elements are mixed in the sequence. The
mathematical way can be expressed as a single integer
number. By example, if we have 22 mixed elements,
there is an integer value coded into 70 bits than can
tell where are each elements. Why 70 bits? 22
elements, 22! possibility of sequences, let's take
possible values from 0 to 22!-1, we got 22!-1 as the
biggest possible value and the number 22!-1 can be
represented into a 70 bits integer value.
Second problem is the way to specify what are the
values since they are sorted. For a lossy compression,
this is easy, but it can be impossible for lossless to
express the values in a compact enough way to make
For the first problem, I will use an exemple that I
hope will clarify how to calculate the integer value.
For the second problem : For lossy compression it can
be anything like telling which value is the smallest
and which value is the highest andm of course, how
many values there are. For lossless compression, it's
another story, and there's maybe no way to express it
in the bits we have left to still make compression.
The computation of the (suffle) integer value.
The idea is simple, "how many spaces before this
element? (or which position minus 1 I put this value
in the free spaces left)
For this example, we will use only 5 elements.
data sequence #1 : A B C D E
We start from the highest element, it's E. We put no
space yet, so there is no free space before E. And we,
replace E by a free space for the next step.
data sequence #1 : A B C D _
Now with D, no free space before D, so : = 0 * 1!
data sequence #1 : A B C _ _
Now with C, no free space before C, so : = 0 * 2!
data sequence #1 : A B _ _ _
Now with B, no free space before B, so : = 0 * 3!
data sequence #1 : A _ _ _ _
Now with A, no free space before A, so : = 0 * 4!
We add up the values and we got total : = 0
You probably notice that this first sequence example
was in order. And some of you may remark that 0 is the
smallest positive integer value.
Let's try with another data sequence
data sequence #2 : E D C B A
The highest is E. Again, it's obvious, no free space
yet so nothing to compute, just put a space where E
data sequence #2 : _ D C B A
Now with D, 1 free space before D, so : = 1 * 1! = 1
data sequence #2 : _ _ C B A
Now with C, 2 free spaces before C, so : = 2 * 2! = 4
data sequence #2 : _ _ _ B A
Now with B, 3 free spaces before B, so : = 3 * 3! = 18
data sequence #2 : _ _ _ _ A
Now with A, 4 free spaces before B, so : = 4 * 4! = 96
We add up the values. total = 96 + 18 + 4 + 1 = 119
You probably notice that this second sequence example
was in the reverse order. And some of you may remark
that 119 is 5! - 1, 5 elements, 5! possibilities,
there is 5! (120) possible values in the range [0,119]
(0 to 5!-1).
Well, if you have time, give me your impression about
Envoyez avec Yahoo! Mail. La boite email la plus appreciée au monde. http://mail.yahoo.fr
More information about the Coladam