Aspects of
Sampling, Oversampling, Quantisation, Dither, and Noise Shaping,
as Applied to Digital Audio
Copyright,
Christopher Hicks, November 1994. V1.11
0. Copyright
This document is copyright 1994,
Christopher Hicks. Permission is
hereby granted to distribute or store it in electronic or
magnetic
form, and to generate paper copies for private study, educational
use or other non-commercial purpose provided that both of the
following conditions are met
Christopher Hicks, November 1994.
1. Introduction
The aim of this article is to dispel as
many of the myths surrounding
the conversion of audio signals to the digital domain, and back
to the
analogue domain, as possible, without the aid of mathematics and
(much
more difficult) without the aid of diagrams. All of the
buzz-words in
the title are directly related to these two processes and are, to
a
large extent, analogous in the two.
The conversion of an analogue signal,
such as the signal from a
microphone, to a form in which it may be digitally stored or
manipulated requires two distinct processes - those of sampling
and
quantisation. Sampling and oversampling are concerned with the
capture
of an analogue quantity at a certain instant in time;
quantisation,
dither and noise-shaping are concerned with the representation of
this
quantity by a digital word of finite length.
2. Sampling
Sampling can be (roughly) defined as the
capture of a continuously
varying quantity at a precisely defined instant in time. Most
usually,
signals are sampled at a set of sample-points spaced regularly in
time. Note that this section says nothing about the digital word
format used to represent this sample - that is considered later
in the
section on quantisation. The Nyquist theorem states that in order
to
faithfully capture all of the information in a signal of
one-sided
bandwidth B, it must be sampled at a rate greater than 2B. A
direct
corollary of this is that if we wish to sample at a rate of 2B
then we
must pre-filter the signal to a one-sided bandwidth of B,
otherwise it
will not be possible to accurately reconstruct the original
signal
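from the samples. The frequency 2B, which the sample rate must exceed
in order to retain all of the signal information, is called the
Nyquist rate (it is often loosely called the Nyquist frequency,
although that term is more usually reserved for half the sample rate).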
The spectrum of the sampled signal is the
same as the spectrum of the
continuous signal except that copies (known as aliases) of the
original now appear centered on all integer multiples of the
sample
rate. As an example, if a signal of 20 kHz bandwidth is sampled
at 50
kHz then alias spectra appear from 30 - 70 kHz, 80 - 120 kHz, and
so
on. It is because the alias spectra must not overlap that a
sample
rate of greater than 2B is required. In digital audio we are
concerned with the base-band - that is to say the signal
components
which extend from 0 to B. Therefore, to sample at the standard
digital
audio rate of 44.1 kHz requires the input signal to be
band-limited to
the range 0 Hz to 22.05 kHz. Strictly speaking the input signal
must
be band-limited to infinitesimally less than 22.05 kHz, but this
is of
no practical significance.
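The alias arithmetic in the 50 kHz example above can be checked with a
few lines of Python. This is only a sketch: the function names
alias_bands and aliases_overlap_baseband are made up for illustration,
and an ideally band-limited signal is assumed.

```python
def alias_bands(bandwidth_hz, sample_rate_hz, count):
    """Return (low, high) edges of the first `count` alias spectra.

    Each alias is a copy of the base-band spectrum centred on an integer
    multiple of the sample rate, so it spans k*fs - B to k*fs + B.
    """
    return [
        (k * sample_rate_hz - bandwidth_hz, k * sample_rate_hz + bandwidth_hz)
        for k in range(1, count + 1)
    ]


def aliases_overlap_baseband(bandwidth_hz, sample_rate_hz):
    """True if the first alias reaches down into the base-band (0 to B)."""
    first_low, _ = alias_bands(bandwidth_hz, sample_rate_hz, 1)[0]
    return first_low < bandwidth_hz


# 20 kHz bandwidth sampled at 50 kHz, as in the text.
print(alias_bands(20_000, 50_000, 2))            # [(30000, 70000), (80000, 120000)]
print(aliases_overlap_baseband(20_000, 50_000))  # False
# Sampling too slowly: the first alias starts at 16 kHz, inside the base-band.
print(aliases_overlap_baseband(20_000, 36_000))  # True
```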
2.1 Nyquist Sampling
The obvious, old and hard way of sampling
an analogue voltage at 44.1
kHz is to do just that - feed the voltage into a conventional
track
and hold sampler running at a 44.1 kHz sample rate. As shown
above,
this requires that the input signal be band-limited to half the
sample
rate, in this case 22.05 kHz, else the aliased spectra will
overlap
and information will be lost.
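The loss of information can be seen directly in the samples. In the
sketch below (the tone frequencies are arbitrary choices, not taken
from any real system) an 18.1 kHz cosine and a 26 kHz cosine are both
sampled at 44.1 kHz without any pre-filtering. The two sets of samples
come out the same to within rounding error, so once sampled there is
no way of telling which tone was present at the input.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine at regularly spaced instants."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]


SAMPLE_RATE = 44_100
in_band = sample_cosine(18_100, SAMPLE_RATE, 64)
# 44.1 kHz - 18.1 kHz = 26 kHz, which lies above half the sample rate.
out_of_band = sample_cosine(SAMPLE_RATE - 18_100, SAMPLE_RATE, 64)

worst_difference = max(abs(a - b) for a, b in zip(in_band, out_of_band))
print(worst_difference)  # tiny: only floating-point rounding error
```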
For a practical implementation this may
require an analogue filter of
order 8 or 10 to be inserted upstream of the sampler to provide
an
audio bandwidth of 20 kHz and also the 80 dB or so of attenuation
above about 24 kHz that is required for high-fidelity sound
reproduction. It is possible to design such a filter, but it
would
require a number of closely-toleranced components and would
suffer
from all of the usual ailments associated with analogue
electronics.
2.2 Oversampling Analogue to Digital Converters
The less obvious, but easier and cheaper
way (at least with the advent
of cheap VLSI multipliers) is to sample the input at a higher
frequency, thereby relaxing the constraints on the analogue input
signal spectrum, and then low-pass filtering and decimating
(reducing
the sampling rate) in the digital domain. For example, the input
might be sampled at 4 x 44.1 kHz = 176.4 kHz. As before, this requires
the signal to contain no significant components above half the sample
rate, but because of the increased sampling rate this limit is now
88.2 kHz rather than 22.05 kHz. The analogue filter can still start to
roll off at 20 kHz or so, but since it does not need to be 80 dB down
until the first alias spectrum starts at about 156 kHz, it can be of
lower order.
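To get a rough feel for how much the filter is relaxed, the sketch
below estimates the order of filter needed to reach 80 dB of
attenuation at the edge of the first alias spectrum. It assumes a
Butterworth response with its 3 dB point at 20 kHz; neither assumption
comes from this article, and the Butterworth family is used only
because its order has a simple closed form. Sharper families (elliptic
filters, for instance) manage the narrow 20 to 24 kHz transition with
a much lower order, so only the trend should be read from the numbers.

```python
import math


def butterworth_order(pass_hz, stop_hz, stop_atten_db, pass_atten_db=3.0):
    """Smallest Butterworth order meeting the pass/stop-band attenuations."""
    ratio = (10 ** (stop_atten_db / 10) - 1) / (10 ** (pass_atten_db / 10) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(stop_hz / pass_hz)))


# Sampling directly at 44.1 kHz: 80 dB down by about 24 kHz.
print(butterworth_order(20_000, 24_000, 80))    # 51
# Four times oversampling: 80 dB down by 176.4 kHz - 20 kHz = 156.4 kHz.
print(butterworth_order(20_000, 156_400, 80))   # 5
```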
Now we have a digital data stream
representing an analogue signal with components from dc to 88.2
kHz. This data is passed through a digital filter with a sharp
cut-off at 22 kHz, which is relatively easy to
implement, and can be made to have a precisely linear phase
response.
The filter output is a digital data stream representing the
original
analogue waveform, but with all the components above 22 kHz
severely
attenuated. Now we discard three out of every four of the samples
to
get our stream of samples at a rate of 44.1 kHz.
The scheme just outlined is conceptually
fine. In a practical
implementation the digital filter would be designed so that,
rather
than discarding the unwanted output samples, they do not have to
be
calculated in the first place. This represents a significant
saving of
computational effort.
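The saving can be illustrated with a small sketch. The filter here is
a simple windowed-sinc design with an arbitrarily chosen length of 63
taps and a cut-off of 0.11 times the fast sample rate; these figures,
and all the function names, are illustrative assumptions rather than a
description of any real converter. Both routes give identical output,
but the second only ever calculates one output sample in four.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # unity gain at dc


def fir_output(samples, taps, n):
    """One FIR output sample at index n (input taken as zero before the start)."""
    return sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)


def filter_then_discard(samples, taps, factor):
    """Conceptual scheme: compute every output, then keep one in `factor`."""
    full = [fir_output(samples, taps, n) for n in range(len(samples))]
    return full[::factor]


def decimate(samples, taps, factor):
    """Practical scheme: only compute the outputs that will be kept."""
    return [fir_output(samples, taps, n) for n in range(0, len(samples), factor)]


fast_rate = 4 * 44_100
tone = [math.sin(2 * math.pi * 1_000 * n / fast_rate) for n in range(400)]
taps = lowpass_taps(63, 0.11)
print(filter_then_discard(tone, taps, 4) == decimate(tone, taps, 4))  # True
```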
2.3 Oversampling Digital to Analogue Converters
Again the aim is to reduce the complexity of an analogue filter, this
time the reconstruction filter after the DAC, whose purpose is to
remove the alias spectra from the converter output. For an audio
signal of 20 kHz bandwidth the reconstruction filter has
(ideally) to
have a gain of one from dc to 20 kHz and a gain of zero from 24
kHz
upwards. As in the case of the sampler this would require a
complicated analogue filter.
If, however, we increase the sample rate
by creating some new samples
digitally then the first few alias spectra can be removed by a
digital
filter, relaxing the performance requirements of the analogue
filter.
For example a four times oversampling audio DAC runs at 176.4 kHz, so
the first alias spectrum starts at around 156 kHz, and the analogue
reconstruction filter can be of lower order since its transition-band
(from 20 kHz to about 156 kHz) is now roughly 136 kHz wide.
The first step in this process is to insert three new samples between
each pair of the original ones. The value given to the new samples is
not critical, but zero is often used as it enables an efficient
hardware implementation. If zeros are used then the spectrum of the
sampled signal at this point is unchanged, although the sample rate
has been quadrupled.
Next, this faster sample stream is passed
through a digital filter
whose action is to make the new samples a smooth interpolation of
the
original data. The output of this filter is a sample stream at
176.4
kHz whose base-band spectrum (i.e. the music) is the same as the
original, but the first three alias spectra of the original
sampled
signal have now been removed.
Finally, the analogue signal is
reconstructed with a DAC running at
the four-times rate, and a low-order analogue filter, which
removes
the alias spectra centered around 176.4 kHz and multiples
thereof.
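The two digital steps (inserting zeros, then filtering) can be
sketched as follows. To keep the example short the digital filter is
the crudest possible one, a triangular set of taps that performs
straight-line interpolation. That choice is an assumption made purely
for illustration; a real oversampling filter would be a much longer
low-pass design. The function names are likewise hypothetical.

```python
def zero_stuff(samples, factor):
    """Insert factor-1 zeros after each original sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (factor - 1))
    return out


def triangular_taps(factor):
    """Linear-interpolation filter, e.g. [1, 2, 3, 4, 3, 2, 1] / 4 for factor 4."""
    ramp = list(range(1, factor + 1)) + list(range(factor - 1, 0, -1))
    return [r / factor for r in ramp]


def fir_filter(samples, taps):
    """Causal FIR filter (input taken as zero before the start)."""
    return [
        sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(samples))
    ]


def upsample_linear(samples, factor):
    return fir_filter(zero_stuff(samples, factor), triangular_taps(factor))


print(zero_stuff([0, 4, 8], 4))
# [0, 0, 0, 0, 4, 0, 0, 0, 8, 0, 0, 0]
print(upsample_linear([0, 4, 8], 4))
# [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

Apart from a delay of three samples introduced by the filter, the
zeros have been replaced by values lying on straight lines between the
original samples.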
2.4 Jitter
Jitter is defined as the timing error at the transitions of a digital
signal. Taking the ADC as an example, timing errors on the sample
clock cause the input signal to be sampled at slightly the wrong
instant. Although the average sample rate may be very accurately
44.1 kHz, the samples are not necessarily taken exactly every
1/44100th of a second, but perhaps a little early or a little late.
Since the input is constantly changing, a timing error on the sample
clock translates to an erroneous sample level being captured.
The effects of sample clock jitter become
more pronounced for high
amplitude and high frequency input signals. The level and nature
of
jitter required for its effects to be audible is a current topic
of
research, debate and religious wars.
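The dependence on amplitude and frequency follows from a simple
estimate: the level error is roughly the slope of the signal
multiplied by the timing error, and the steepest slope of a sinusoid
of amplitude A and frequency f is 2*pi*f*A. The sketch below uses this
approximation (valid only for timing errors much shorter than the
signal period) to find the timing error that would produce half an LSB
of error on a full-scale sinusoid. It says nothing about audibility,
and the function names are made up for the example.

```python
import math


def worst_case_jitter_error(amplitude, freq_hz, timing_error_s):
    """Largest level error for a sinusoid: peak slope times timing error."""
    return 2 * math.pi * freq_hz * amplitude * timing_error_s


def jitter_for_half_lsb(bits, freq_hz):
    """Timing error that gives a half-LSB error on a full-scale sinusoid."""
    amplitude = 1.0                           # full scale spans -1 to +1
    half_lsb = 0.5 * (2 * amplitude) / 2 ** bits
    return half_lsb / (2 * math.pi * freq_hz * amplitude)


# 16-bit converter, 20 kHz full-scale input: roughly 120 picoseconds.
print(jitter_for_half_lsb(16, 20_000))
# The same timing error matters ten times less at 2 kHz.
dt = jitter_for_half_lsb(16, 20_000)
print(worst_case_jitter_error(1.0, 2_000, dt) / worst_case_jitter_error(1.0, 20_000, dt))
```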
3. Quantisation
At some point the sampled analogue
quantity has to be converted to a
finite-length digital word; this process is called quantisation.
This
will generally be done immediately after the sampler so that the
subsequent data manipulation may be done digitally, but this is
not
necessarily the case, for example if a switched-capacitor
pre-filter
is used. Common word-lengths used for digital audio are 16, 18,
20 and
24 bits. The best ADC and DAC chips around that are fast enough
for
digital audio have a resolution of 20 bits, though 16 and 18 bit
parts
are cheaper and far more common.
In a standard ADC (the "old, hard
way" described above) the quantiser
resolution and the output resolution are the same (since the
digital
output comes directly from the quantiser). Assuming a random input
signal, the errors associated with this quantisation process are
white and uncorrelated with the signal, and yield a best-case signal
to noise ratio of around 6n dB, where n is the number of bits in the
output word.
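The 6n dB rule of thumb is easy to check numerically. The sketch below
quantises a full-scale sinusoid (chosen instead of a random signal
simply because it is easy to generate; its frequency is arbitrary)
with a plain rounding quantiser and measures the resulting signal to
noise ratio. For a sinusoid the usual textbook figure is 6.02n + 1.76
dB, which is consistent with "around 6n dB". Clipping at full scale is
ignored, and the function names are illustrative only.

```python
import math


def quantise(x, bits):
    """Round x (in the range -1 to +1) to the nearest of 2**bits steps."""
    step = 2.0 / 2 ** bits
    return step * math.floor(x / step + 0.5)


def measured_snr_db(bits, count=50_000):
    """SNR of a quantised full-scale sinusoid, in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(count):
        x = math.sin(0.1234567 * n)   # arbitrary frequency
        err = quantise(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 12, 16):
    print(bits, round(measured_snr_db(bits), 1), round(6.02 * bits + 1.76, 1))
```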
3.1 Quantisation in Oversampling Converters
Consider a 16-bit ADC clocked at 44.1
kHz. Its quantisation noise is
approximately 96 dB below a full-scale sinusoid, and is spread
evenly
from dc to 22.05 kHz. If the ADC is clocked faster then the total
quantisation noise power remains unaffected, but it is spread
over a
wider bandwidth. For example, if the converter speed is doubled,
the
quantisation noise power is spread from dc to 44.1 kHz. The
desired
signal is, of course, still in the band from dc to 22.05 kHz, and
the
quantisation noise power in this band is halved.
A digital low-pass filter with a cut-off
of 22.05 kHz cuts out half of
the quantisation noise, increasing the SNR by 3 dB, but leaving
the
audio-band signal unaffected. This filter is generally the same
one as
the digital anti-alias filter mentioned above. The process is
extendible, and for each doubling of the sampling rate, the
audio-band
quantisation noise is lowered by 3 dB. For example, the
quantisation
noise for a 4-times oversampling converter will be 6 dB lower
(after
filtering) than for the same converter operating without
oversampling.
By the same principle, if we ran a 15 bit
ADC at 4 x 44.1 kHz we would
get the same audio-band performance as with a 16-bit device
sampling
at 44.1 kHz, since the increase of quantisation noise due to the
poorer resolution is balanced by the SNR improvement brought
about by
oversampling. Again the process is extendible, and for each
factor of
four by which we increase the sample rate we can drop one bit of
resolution off the converter.
So if we oversample by a factor of 4^15
then theoretically we can drop
15 of the 16 bits, and use a one-bit converter. Alas, this
implies a
sample rate of about 50E12 Hz, which is well into the infra-red.
However, noise-shaping can be used to reduce the sample rate required
to such a level that the use of very low resolution converters is
practical.
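The arithmetic behind these figures is summarised in the sketch below.
It assumes, as the text does, that the quantisation noise is white and
that no noise-shaping is used; the function names are invented for the
example.

```python
import math

BASE_RATE_HZ = 44_100


def in_band_noise_reduction_db(oversampling_ratio):
    """Audio-band noise reduction from spreading white noise over more bandwidth."""
    return 10 * math.log10(oversampling_ratio)


def bits_saved(oversampling_ratio):
    """One bit (about 6 dB) can be dropped for each factor of four."""
    return math.log2(oversampling_ratio) / 2


def sample_rate_for_bits_dropped(bits_dropped):
    """Sample rate needed to drop the given number of bits without noise-shaping."""
    return BASE_RATE_HZ * 4 ** bits_dropped


print(in_band_noise_reduction_db(2))     # about 3 dB per doubling
print(in_band_noise_reduction_db(4))     # about 6 dB for four times
print(bits_saved(4))                     # 1.0
print(sample_rate_for_bits_dropped(15))  # 47352014438400 Hz, i.e. about 50E12
```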
Oversampling gives similar benefits in
the digital to analogue
conversion process. For each factor of four by which the sample
stream
is oversampled, one bit may be dropped from each data word
without
significantly degrading the audio-band performance. Dropping bits
introduces noise into the signal, but if the signal has been
sufficiently oversampled then the power of this new noise in the
audio
band is lower than the noise in the original recording. Hence the
original noise dominates and the wordlength truncation introduces
insignificant amounts of extra noise.
Again, to reduce the word-length to one
bit in this simplistic manner
requires the same impractical sample rate as in the A to D case,
but
once again high-quality audio performance is achieved at
practical
sample rates with noise-shaping.
4. Dither
Much of the preceding text on
quantisation and requantisation noise
assumes the signal to be random. For many signals this is not the
case, and the result is that the quantisation noise, rather than
being
white, is found to be highly correlated with the signal. This
manifests itself as very nasty-sounding level-dependent
distortions
which become more prominent as the signal is decreased in
amplitude.
To avoid this problem a small amount of
noise is added to the signal
before quantisation, this process being known as dithering. The
dither
values are often drawn from a triangular distribution, and it is
desirable that they be uncorrelated. The dither power is chosen
such
that the quantiser transfer function is just linearised, without
adding excess noise to the quantised signal. This has the effect
of
decorrelating the quantisation noise from the signal, and
avoiding
quantisation-related distortion.
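A very small experiment shows the linearising effect. Suppose the
input sits at a constant 0.3 of an LSB. Without dither the quantiser
outputs zero every time and the signal is lost completely. With
triangular dither (formed here as the sum of two independent uniform
random values, each spanning one LSB, which is one common way of
generating it and is an assumption of this sketch) the individual
outputs are noisy, but their average converges on the true input
level. The names and the seed are arbitrary.

```python
import math
import random


def quantise(x):
    """Round to the nearest whole number of LSBs."""
    return math.floor(x + 0.5)


def triangular_dither(rng):
    """Sum of two uniform values: triangular distribution from -1 to +1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def average_output(level_lsb, count, dithered, seed=1):
    """Average quantiser output for a constant input level (in LSBs)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(count):
        noise = triangular_dither(rng) if dithered else 0.0
        total += quantise(level_lsb + noise)
    return total / count


print(average_output(0.3, 100_000, dithered=False))  # 0.0: the input has vanished
print(average_output(0.3, 100_000, dithered=True))   # close to 0.3
```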
5. Noise-Shaping
Quantisation (after the addition of
white-noise dither) ideally
results in a white power spectrum - that is, the noise floor has
constant noise power spectral density (NPSD). This is a direct
result
of rounding to the nearest value when performing the
quantisation,
with a random input signal.
However, if we base our decision of
whether to round up or down not
upon which is the nearer value but upon some other criterion,
then we
can make the output quantisation noise spectrum have almost any
form
we desire, but still have (roughly) the same total power. We are
not
going to hear noise above about 20 kHz, so we force as much of
the
quantisation noise as possible into the band above 20 kHz.
The sample rate for a 256 times oversampling converter is about
11.3 MHz (compared with the 50 THz calculated for the hypothetical
1-bit converter above). If we oversample by 256 times then we can put
as
much noise as we want into the band from 20 kHz to 5.6 MHz, where
it
will not interfere with the audio. By keeping the quantisation
NPSD
low enough in the audio band we are able to achieve an audio-band
SNR
of 90 dB or more; the NPSD above 20 kHz will be very much higher,
but
since there is no desired signal at those frequencies it really
does
not matter.
This is accomplished in practice by
placing the quantiser in a
feedback loop with a digital filter, such that the filtered
quantisation error is subtracted from a subsequent input sample.
A
detailed explanation of this is unfortunately beyond the scope of
this
article, being a virtual impossibility without pictures and a few
equations.
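The flavour of the idea can nevertheless be conveyed with the simplest
possible case. In the sketch below the "digital filter" is nothing
more than a one-sample delay, so each sample has the previous sample's
quantisation error subtracted from it before being quantised to one
bit. This first-order loop is an assumption chosen for brevity; real
converters use higher-order filters. The error in the output then
becomes the difference between successive quantisation errors, which
is small at low frequencies and large at high ones.

```python
def one_bit(v):
    """One-bit quantiser: output is +1 or -1."""
    return 1.0 if v >= 0 else -1.0


def plain_quantiser(samples):
    """One-bit quantisation with no feedback."""
    return [one_bit(x) for x in samples]


def error_feedback_quantiser(samples):
    """First-order noise shaper: subtract the previous error before quantising."""
    out = []
    error = 0.0
    for x in samples:
        v = x - error      # input with the previous quantisation error removed
        y = one_bit(v)
        error = y - v      # error made on this sample, fed back next time
        out.append(y)
    return out


def mean(values):
    return sum(values) / len(values)


# A heavily oversampled, slowly varying signal looks almost constant.
signal = [0.3] * 1000
print(mean(plain_quantiser(signal)))           # 1.0: low-frequency content is wrong
print(mean(error_feedback_quantiser(signal)))  # very close to 0.3
```

Averaging is used here as a crude stand-in for the low-pass filter
that follows the converter: the shaped one-bit stream carries the
correct low-frequency information even though every individual sample
is either +1 or -1.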
5.1 Noise Shaping in Mastering
Noise shaping is used in a very similar manner in mastering processes
such as Deutsche Grammophon's ABI and Sony's Super Bit Mapping. The
master tape is recorded with a 20-bit ADC and then requantised to
16 bits with a suitable (proprietary) noise-shaping algorithm. Rather
than a white noise floor, this gives a low NPSD where the ear is most
sensitive, at around 3 kHz, and a much higher NPSD at high
frequencies. The shaped noise floor is found to be subjectively
quieter, though the total quantisation noise power is actually
slightly higher.
6. Conclusion
Various aspects of analogue to digital,
and digital to analogue
conversion have been discussed, with particular reference to
digital
audio applications. It was seen that A to D conversion is
performed as
two distinct processes - sampling and quantisation.
Criteria were stated for adequate
performance of both processes, and
various schemes to circumvent associated problems were put
forward.
Emphasis was placed on the technique of oversampling and the
application of dither and noise-shaping was also mentioned.