[Coladam] About tone channels attenuation (volume) -and- digital sounds

Daniel Bienvenu newcoleco at yahoo.fr
Tue Dec 8 20:21:07 CET 2009


WARNING, THE FOLLOWING IS A TECHNICAL TEXT.

Hi!

Attenuation can be understood as the opposite of volume: the less attenuation you give a sound channel, the louder it is. According to the technical information for the sound chip (SN76489), there is a ratio of 10^-0.1 between two consecutive attenuation levels. The usable attenuation levels range from 0 to 14; level 15 is supposed to be no sound at all (mute). If we use 1 for the sound without attenuation and 0 for mute, we get this table of attenuation values.

0 -> 1,000000000000000
1 -> 0,794328234724282
2 -> 0,630957344480193
3 -> 0,501187233627272
4 -> 0,398107170553497
5 -> 0,316227766016838
6 -> 0,251188643150958
7 -> 0,199526231496888
8 -> 0,158489319246111
9 -> 0,125892541179417
10 -> 0,100000000000000
11 -> 0,079432823472428
12 -> 0,063095734448019
13 -> 0,050118723362727
14 -> 0,039810717055350
15 -> 0,000000000000000
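
For anyone who wants to recompute the table above, here is a minimal Python sketch. The names STEP, amplitude and AMPLITUDES are only illustrative, and the printout uses "." as the decimal separator instead of ",".

```python
# Relative amplitude of each SN76489 attenuation level (0 = loudest, 15 = mute).
STEP = 10 ** -0.1  # ratio between two consecutive levels, as stated above


def amplitude(level):
    """Return the linear output (1.0 = no attenuation) for a level in 0..15."""
    if not 0 <= level <= 15:
        raise ValueError("attenuation level must be in 0..15")
    # Level 15 is treated as complete silence rather than one more step down.
    return 0.0 if level == 15 else STEP ** level


AMPLITUDES = [amplitude(n) for n in range(16)]

for n, a in enumerate(AMPLITUDES):
    print(f"{n:2d} -> {a:.15f}")
```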

Now, why am I talking about this? Well, for the last 4 days I have been thinking about digital sounds and the way I'm currently adding them to my projects. If you look at these values, you'll notice that the difference (or delta) between attenuations 13 and 14 ( = 0,010308006307377 ) is not the same as between attenuations 4 and 5 ( = 0,081879404536659 )... which is a problem when using volume variations to output a digitized sound. But don't forget that there are 3 tone channels, so 3 attenuations that can be combined to get more nuances.

Let's be ambitious and decide that, instead of a 4-bit precision output, we want a 6-bit precision output, which to me simply means a range of values between 0 and 63. So I've calculated 64 possible attenuation combinations, well distributed, with almost no error. This is what I got.

Notice that I'm using hexadecimal digits for the attenuations of the 3 tone channels, so 10 to 15 are written A to F.

Again, each line gives the attenuation values to output, but this time for the 3 tone channels at once, followed by the resulting volume output (the average of the three channels) between 1.0 (full) and 0.0 (mute), and the percent error calculated against a uniform range of values between 0 and 0,730811801759260 (not up to 1, in order to keep the error near 0%). And yes, "0,000..." means "0.000..."; it's just the French notation I'm using, with "," instead of ".".

F F F -> 0,000000000000000 => error is 0.000%
F F E -> 0,013270239018450 => error is 0.167%
F F C -> 0,021031911482673 => error is 0.217%
F E C -> 0,034302150501123 => error is 0.050%
F E A -> 0,046603572351783 => error is 0.020%
F D 9 -> 0,058670421514048 => error is 0.067%
D B B -> 0,069661456769195 => error is 0.006%
C B A -> 0,080842852640149 => error is 0.036%
F B 7 -> 0,092986351656439 => error is 0.018%
D C 7 -> 0,104246896435878 => error is 0.015%
C 9 8 -> 0,115825864957849 => error is 0.018%
A 9 8 -> 0,128127286808509 => error is 0.053%
A 8 8 -> 0,138992879497408 => error is 0.021%
9 9 7 -> 0,150437104618574 => error is 0.037%
E D 4 -> 0,162678870323858 => error is 0.028%
B 9 5 -> 0,173851043556228 => error is 0.015%
B B 4 -> 0,185657605832784 => error is 0.005%
D C 3 -> 0,197038891348450 => error is 0.016%
F 9 3 -> 0,209026591602230 => error is 0.022%
8 6 6 -> 0,220288868516009 => error is 0.011%
C 5 5 -> 0,231850422160565 => error is 0.015%
F A 2 -> 0,243652448160064 => error is 0.005%
D 5 4 -> 0,254817886644354 => error is 0.039%
A 7 3 -> 0,266904488374720 => error is 0.010%
E 4 4 -> 0,278675019387448 => error is 0.027%
E 7 2 -> 0,290098097677477 => error is 0.009%
D C 1 -> 0,302514230845009 => error is 0.091%
D 4 3 -> 0,313035040412040 => error is 0.017%
B A 1 -> 0,324587019398903 => error is 0.022%
C 5 2 -> 0,336760281648350 => error is 0.035%
D 7 1 -> 0,347991063194632 => error is 0.001%
9 8 1 -> 0,359570031716603 => error is 0.004%
D C 0 -> 0,371071485936916 => error is 0.013%
D A 0 -> 0,383372907787576 => error is 0.057%
D 3 2 -> 0,394087767156731 => error is 0.032%
5 4 3 -> 0,405174056732536 => error is 0.083%
6 3 3 -> 0,417854370135168 => error is 0.025%
8 3 2 -> 0,430211299117859 => error is 0.100%
C 2 2 -> 0,441670141136135 => error is 0.086%
8 7 0 -> 0,452671850247666 => error is 0.026%
7 4 1 -> 0,463987212258222 => error is 0.002%
5 5 1 -> 0,475594588919319 => error is 0.001%
7 2 2 -> 0,487146973485758 => error is 0.006%
7 3 1 -> 0,498347233282814 => error is 0.046%
4 3 2 -> 0,510083916220321 => error is 0.032%
C 3 0 -> 0,521427656025097 => error is 0.058%
A 3 0 -> 0,533729077875757 => error is 0.012%
3 3 2 -> 0,544443937244913 => error is 0.076%
D 2 0 -> 0,556922687178514 => error is 0.011%
7 3 0 -> 0,566904488374720 => error is 0.150%
5 2 1 -> 0,580504448407104 => error is 0.050%
3 2 2 -> 0,587700640862553 => error is 0.391%
5 3 0 -> 0,605804999881370 => error is 0.260%
D 1 0 -> 0,614815652695670 => error is 0.001%
6 2 0 -> 0,627381995877050 => error is 0.097%
9 1 0 -> 0,640073591967899 => error is 0.206%
5 2 0 -> 0,649061703499010 => error is 0.055%
4 1 1 -> 0,662254546667353 => error is 0.104%
4 2 0 -> 0,676354838344563 => error is 0.354%
2 2 1 -> 0,685414307894889 => error is 0.100%
3 1 1 -> 0,696614567691945 => error is 0.060%
9 0 0 -> 0,708630847059806 => error is 0.102%
8 0 0 -> 0,719496439748704 => error is 0.028%
4 1 0 -> 0,730811801759260 => error is 0.000%
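
A table like the one above can be rebuilt with a brute-force search. The following Python sketch is one possible way to do it, not necessarily the method used for this message: it lists every combination of three attenuation levels, then keeps the closest one for each of 64 evenly spaced targets. The function names and the choice of "4 1 0" as the top of the range are assumptions taken from the table, and a few rows may come out different from the ones listed.

```python
from itertools import combinations_with_replacement

STEP = 10 ** -0.1
AMP = [0.0 if n == 15 else STEP ** n for n in range(16)]


def mixed_level(triplet):
    """Average of the three channel amplitudes, as in the table above."""
    return sum(AMP[n] for n in triplet) / 3


# Every unordered choice of three attenuation levels (816 of them).
COMBOS = list(combinations_with_replacement(range(16), 3))


def build_table(steps=64, top=(0, 1, 4)):
    """For each evenly spaced target, keep the closest combination."""
    full_scale = mixed_level(top)  # assumed top of the range: "4 1 0"
    table = []
    for i in range(steps):
        target = full_scale * i / (steps - 1)
        best = min(COMBOS, key=lambda c: abs(mixed_level(c) - target))
        error_pct = abs(mixed_level(best) - target) * 100
        table.append((best, mixed_level(best), error_pct))
    return table


TABLE = build_table()
for combo, level, err in TABLE:
    digits = " ".join(f"{n:X}" for n in sorted(combo, reverse=True))
    print(f"{digits} -> {level:.15f} => error is {err:.3f}%")
```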
 
To answer the likely questions "Why don't you use all the possible values instead of selecting only 64? Why only values from 0 to 0.73 instead of 0 to 1?": well, it's a compromise. There is a total of 816 possible outputs from 0 to 1 when combining the 3 tone channel attenuations, but 64 values can be encoded as 6-bit values. Reducing an original audio signal to digital 6-bit values is easy, and storing and decoding 6-bit values is easier and takes much less space than 10-bit values (815 = 1100101111 in binary). And to limit the error to almost nothing, I should not use the whole range from 0 to 1 for the output.

LET'S CONTINUE

We want to encode a digital sound with good quality and a small storage size. Using 6-bit values and converting them to the appropriate sound channel attenuations will give this result. But 6-bit values are not as easy to deal with as 4-bit values (nibbles), and they are bigger too. So a data compression method is needed, but nothing too complex, to keep an acceptable bitrate for the output.

COMPRESSION METHOD (SUGGESTION)

Instead of encoding all the 6-bit values, we can encode only some of them and store the rest as 4-bit values corresponding to the difference (delta) between the previous 6-bit value and the next one. To do so, and in the hope of reducing the data size, I'll use a lossy compression that should not really affect the sound quality.

I've calculated this table that could be used to accomplish the lossy data compression.

TABLE OF SELECTED DELTAs

0 : add 0
1 : add 1
2 : add 4
3 : add 7
4 : add 10
5 : add 13
6 : add 16
7 : add 19
8 : sub 19
9 : sub 16
A : sub 13
B : sub 10
C : sub 7
D : sub 4
E : sub 1
F : Read a 6-bit value.

Let's take a wave signal that uses 6-bit values from $00-$3F (0 to 63) and try to encode it with this table.

$20 $2F $36 $39 $3A $3B $3A $39 $36 $2F $20 $11 $0A $07 $06 $07 $0A $11 $20

ENCODE WITH THE SPECIAL TABLE. 

F $20 : read $20
6 : add 16 => $30 (error = +1)
3 : add 7 => $37 (error = +1)
1 : add 1 => $38 (error = -1)
1 : add 1 => $39 (error = -1)
1 : add 1 => $3A (error = -1)
0 : add 0 => $3A 
E : sub 1 => $39
D : sub 4 => $35 (error = -1)
C : sub 7 => $2E (error = -1)
A : sub 13 => $21 (error = +1)
9 : sub 16 => $11
C : sub 7 => $0A
D : sub 4 => $06 (error = -1)
0 : add 0 => $06
1 : add 1 => $07
2 : add 4 => $0B (error = +1)
3 : add 7 => $12 (error = +1)
5 : add 13 => $1F (error = -1)
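
The encoding steps above can be sketched in Python as follows. The rule for when to fall back to code F is not defined in this message, so the sketch assumes one: a raw 6-bit value is emitted for the first sample, and again whenever no delta lands within max_error of the wanted value. The names encode, max_error and SIGNAL are illustrative.

```python
# Delta table from this message: code -> signed step. Code 0xF reads a raw value.
DELTAS = {0x0: 0, 0x1: 1, 0x2: 4, 0x3: 7, 0x4: 10, 0x5: 13, 0x6: 16, 0x7: 19,
          0x8: -19, 0x9: -16, 0xA: -13, 0xB: -10, 0xC: -7, 0xD: -4, 0xE: -1}
ESCAPE = 0xF


def encode(samples, max_error=1):
    """Return (codes, output).

    codes is a list of nibbles where each ESCAPE is followed by a raw 6-bit
    value; output is the signal a decoder would rebuild from those codes.
    """
    codes, output = [], []
    current = None
    for s in samples:
        best = None
        if current is not None:
            # Closest reachable value that stays inside the 6-bit range.
            best = min((abs(current + d - s), code)
                       for code, d in DELTAS.items() if 0 <= current + d <= 63)
        if best is None or best[0] > max_error:
            # Assumed fallback: store the sample itself after an ESCAPE code.
            codes += [ESCAPE, s]
            current = s
        else:
            codes.append(best[1])
            current += DELTAS[best[1]]
        output.append(current)
    return codes, output


SIGNAL = [0x20, 0x2F, 0x36, 0x39, 0x3A, 0x3B, 0x3A, 0x39, 0x36, 0x2F,
          0x20, 0x11, 0x0A, 0x07, 0x06, 0x07, 0x0A, 0x11, 0x20]
codes, output = encode(SIGNAL)
print(" ".join(f"${c:X}" for c in codes))
print(" ".join(f"${v:02X}" for v in output))
```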

Let's compare :

ORIGINAL SIGNAL
$20 $2F $36 $39 $3A $3B $3A $39 $36 $2F $20 $11 $0A $07 $06 $07 $0A $11 $20

OUTPUT SIGNAL
$20 $30 $37 $38 $39 $3A $3A $39 $35 $2E $21 $11 $0A $06 $06 $07 $0B $12 $1F

ENCODED DATA
$F $20 $6 $3 $1 $1 $1 $0 $E $D $C $A $9 $C $D $0 $1 $2 $3 $5 
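
Going the other way is simpler. This Python sketch decodes the encoded data above back into the output signal; the real decoder would of course be written for the ColecoVision, so this only illustrates the logic. It assumes the stream starts with an F code and does not check the 0 to 63 range.

```python
# Index = 4-bit code, value = step to add. Code 0xF is handled separately.
DELTAS = [0, 1, 4, 7, 10, 13, 16, 19, -19, -16, -13, -10, -7, -4, -1]
ESCAPE = 0xF


def decode(codes):
    """Rebuild the 6-bit samples from a list of codes.

    After an ESCAPE code, the next item is taken as a raw 6-bit value.
    """
    samples = []
    current = 0
    stream = iter(codes)
    for code in stream:
        if code == ESCAPE:
            current = next(stream)  # read a 6-bit value
        else:
            current += DELTAS[code]
        samples.append(current)
    return samples


ENCODED = [0xF, 0x20, 0x6, 0x3, 0x1, 0x1, 0x1, 0x0, 0xE, 0xD,
           0xC, 0xA, 0x9, 0xC, 0xD, 0x0, 0x1, 0x2, 0x3, 0x5]
print(" ".join(f"${v:02X}" for v in decode(ENCODED)))
```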

ENCODED DATA, NIBBLES REARRANGED INTO BYTES

$F2 $06 $31 $11 $0E $DC $A9 $CD $01 $23 $5x
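
A possible Python sketch of this rearrangement: the raw value after an F code is split into two nibbles, and the unused last nibble (the "x" above) is filled with 0, which is an arbitrary choice made for this example. A real decoder would also need to know where the data ends.

```python
def pack_nibbles(codes, escape=0xF):
    """Pack a code list (raw value after each escape) into bytes, high nibble first."""
    nibbles = []
    stream = iter(codes)
    for code in stream:
        nibbles.append(code)
        if code == escape:
            raw = next(stream)
            # The 6-bit value takes two nibbles here, e.g. $20 -> 2, 0.
            nibbles += [raw >> 4, raw & 0x0F]
    if len(nibbles) % 2:
        nibbles.append(0)  # padding for the unused half of the last byte
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


ENCODED = [0xF, 0x20, 0x6, 0x3, 0x1, 0x1, 0x1, 0x0, 0xE, 0xD,
           0xC, 0xA, 0x9, 0xC, 0xD, 0x0, 0x1, 0x2, 0x3, 0x5]
print(" ".join(f"${b:02X}" for b in pack_nibbles(ENCODED)))
```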

Note: the 6-bit value $20 can be stored in a second table instead of in the same table as the nibbles... and may even be stored as a real 6-bit value, packing four (4) 6-bit values into three (3) bytes.
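
Packing four 6-bit values into three bytes could look like this in Python. The bit order (first value in the most significant bits) and the zero padding of an incomplete last group are assumptions made for this sketch; pack_6bit and unpack_6bit are illustrative names.

```python
def pack_6bit(values):
    """Pack 6-bit values, four per three bytes, first value in the top bits."""
    out = bytearray()
    for i in range(0, len(values), 4):
        group = list(values[i:i + 4])
        group += [0] * (4 - len(group))  # pad an incomplete last group
        word = 0
        for v in group:
            if not 0 <= v <= 63:
                raise ValueError("value does not fit in 6 bits")
            word = (word << 6) | v
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack_6bit(data, count):
    """Inverse of pack_6bit; count is the number of values originally packed."""
    values = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        values += [(word >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return values[:count]


packed = pack_6bit([0x20, 0x2F, 0x36, 0x39])
print(" ".join(f"${b:02X}" for b in packed))
```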

*BONUS* ENCODE WITH DITHERING EFFECT :

F $20 : read $20
6 : add 16 => $30 (error = +1), dithering $36 => next one is $35
2 : add 4 => $34 (error = -1), dithering $39 => next one is $3A
3 : add 7 => $3B (error = +1), dithering $3A => next one is $39
E : sub 1 => $3A (error = +1), dithering $3B => next one is $3A
0 : add 0 => $3A
0 : add 0 => $3A 
E : sub 1 => $39
D : sub 4 => $35 (error = -1), dithering $2F => next one is $30
D : sub 4 => $31 (error = +1), dithering $20 => next one is $1F
8 : sub 19 => $1E (error = -1), dithering $11 => next one is $12
A : sub 13 => $11 (error = -1), dithering $0A => next one is $0B
C : sub 7 => $0A (error = -1), dithering $07 => next one is $08
E : sub 1 => $09 (error = +1), dithering $06 => next one is $05
D : sub 4 => $05
1 : add 1 => $06 (error = -1), dithering $0A => next one is $0B
2 : add 4 => $0A (error = -1), dithering $11 => next one is $12
3 : add 7 => $11 (error = -1), dithering $20 => next one is $21
6 : add 16 => $21
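
The dithering variant above can be sketched in Python by carrying each step's error over to the next wanted value before choosing the delta. As before, the rule for emitting code F (first sample, or no delta within max_error) is an assumption, and so is clamping the adjusted value to 0..63; the names are illustrative.

```python
# Index = 4-bit code, value = step to add. Code 0xF reads a raw value.
DELTAS = [0, 1, 4, 7, 10, 13, 16, 19, -19, -16, -13, -10, -7, -4, -1]
ESCAPE = 0xF


def encode_dithered(samples, max_error=1):
    """Delta-encode samples, pushing each step's error onto the next sample."""
    codes, output = [], []
    current = None
    carry = 0  # error of the previous step
    for s in samples:
        # "dithering $36 => next one is $35": correct the wanted value first.
        target = min(63, max(0, s - carry))
        best = None
        if current is not None:
            best = min((abs(current + d - target), code)
                       for code, d in enumerate(DELTAS) if 0 <= current + d <= 63)
        if best is None or best[0] > max_error:
            codes += [ESCAPE, target]  # assumed fallback: raw 6-bit value
            current = target
        else:
            codes.append(best[1])
            current += DELTAS[best[1]]
        carry = current - target
        output.append(current)
    return codes, output


SIGNAL = [0x20, 0x2F, 0x36, 0x39, 0x3A, 0x3B, 0x3A, 0x39, 0x36, 0x2F,
          0x20, 0x11, 0x0A, 0x07, 0x06, 0x07, 0x0A, 0x11, 0x20]
codes, output = encode_dithered(SIGNAL)
print(" ".join(f"${c:X}" for c in codes))
print(" ".join(f"${v:02X}" for v in output))
```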

Which one gives the best output signal to the human ear, I don't know for the moment, but using a dithering effect seems logical based on what I've read on the Internet. The output signal may look more scrambled, but these adjustments by dithering may make the difference for the sound quality in the end. We use dithering for our Coleco bitmap pictures, so why not with sounds?

BUT THIS IS JUST IN THEORY, WITH MATHS AND EXAMPLES

Yes, it's just mathematics, not tested yet, but usually if a model's predictions hold up in experiments, it means I'm correct and should continue developing this idea.

SO, WHAT DO I NEED?

I suppose a sound tool may already exist to reduce any sound we want to a mono 6-bit signal, at a bitrate the decoder can handle, probably using a dithering effect to reduce the perception of errors in the signal quality.

Inside the decoder, a table of 64 entries will give what to output directly to the sound port $FF (using the OTIR opcode) to get these output "volume" values. Fast and easy. I'll try without compression first to hear the sound quality, and then with compression to see the difference in bitrate and quality. This is the decoder part.
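
As a sketch of what one table entry could contain, this Python helper turns a row of three attenuation digits into the three bytes to send to the sound port. It relies on the SN76489 latch byte layout (1, two channel bits, a volume flag, four data bits), which gives $90, $B0 and $D0 as the base values for the three tone channels. Which digit goes to which channel is an arbitrary choice here, and port_bytes is an illustrative name.

```python
# Base latch bytes for the attenuation of tone channels 0, 1 and 2.
VOLUME_LATCH = (0x90, 0xB0, 0xD0)


def port_bytes(row):
    """Turn a row such as "F E C" into the three bytes to write to the port."""
    levels = [int(digit, 16) for digit in row.replace(" ", "")]
    if len(levels) != 3:
        raise ValueError("expected three attenuation digits")
    # Assumption: digits are assigned to channels 0, 1, 2 from left to right.
    return bytes(latch | level for latch, level in zip(VOLUME_LATCH, levels))


for row in ("F F F", "F E C", "4 1 0"):
    print(row, "->", " ".join(f"${b:02X}" for b in port_bytes(row)))
```

Concatenating the 64 results would give a flat 192-byte table, three bytes per 6-bit value.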

So an interface will be necessary to convert data for a project based on this. This is the encoder part.

An encoder and a decoder... equals a codec.

There are other strategies out there, including the one I'm currently using in my projects, which adds timing information to reduce the encoded size of things like silences (like RLE compression) and avoids encoding repetitions by using index values (like a dictionary compression: LZSS). And depending on the approximation you are willing to accept for the output signal, you can reduce the size of the encoded data even more.

A lot of fun ahead!

I want to hear your reactions. But first, a simple question: do you understand what I'm trying to accomplish here?

Have a nice day!

Daniel



      

