Aspects of Sampling, Oversampling, Quantisation, Dither, and Noise Shaping, as Applied to Digital Audio
Copyright, Christopher Hicks, November 1994. V1.11


0. Copyright

This document is copyright 1994, Christopher Hicks. Permission is
hereby granted to distribute or store it in electronic or magnetic
form, and to generate paper copies for private study, educational
use or other non-commercial purpose provided that both of the
following conditions are met:

  1. The document must remain complete and unaltered.
    No charge may be levied for copying it, or for access to it.
  2. I reserve all other rights to this document. I disclaim responsibility
    for damage or loss that may arise from inaccuracies herein.

Christopher Hicks, November 1994.

1. Introduction

The aim of this article is to dispel as many of the myths surrounding
the conversion of audio signals to the digital domain, and back to the
analogue domain, as possible, without the aid of mathematics and (much
more difficult) without the aid of diagrams. All of the buzz-words in
the title are directly related to these two processes and are, to a
large extent, analogous in the two.

The conversion of an analogue signal, such as the signal from a
microphone, to a form in which it may be digitally stored or
manipulated requires two distinct processes - those of sampling and
quantisation. Sampling and oversampling are concerned with the capture
of an analogue quantity at a certain instant in time; quantisation,
dither and noise-shaping are concerned with the representation of this
quantity by a digital word of finite length.

2. Sampling

Sampling can be (roughly) defined as the capture of a continuously
varying quantity at a precisely defined instant in time. Most usually,
signals are sampled at a set of sample-points spaced regularly in
time. Note that this section says nothing about the digital word
format used to represent this sample - that is considered later in the
section on quantisation. The Nyquist theorem states that in order to
faithfully capture all of the information in a signal of one-sided
bandwidth B, it must be sampled at a rate greater than 2B. A direct
corollary of this is that if we wish to sample at a rate fs then we
must pre-filter the signal to a one-sided bandwidth of less than fs/2,
otherwise it will not be possible to accurately reconstruct the
original signal from the samples. The frequency 2B, the minimum sample
rate that retains all of the signal information, is called the Nyquist
rate.

The spectrum of the sampled signal is the same as the spectrum of the
continuous signal except that copies (known as aliases) of the
original now appear centred on all integer multiples of the sample
rate. As an example, if a signal of 20 kHz bandwidth is sampled at 50
kHz then alias spectra appear from 30 - 70 kHz, 80 - 120 kHz, and so
on. It is because the alias spectra must not overlap that a sample
rate of greater than 2B is required. In digital audio we are
concerned with the base-band - that is to say the signal components
which extend from 0 to B. Therefore, to sample at the standard digital
audio rate of 44.1 kHz requires the input signal to be band-limited to
the range 0 Hz to 22.05 kHz. Strictly speaking the input signal must
be band-limited to infinitesimally less than 22.05 kHz, but this is of
no practical significance.
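The two rules above can be sketched in a few lines of Python (a minimal illustration; the function names are hypothetical). `alias_bands` lists where the alias spectra of a signal of bandwidth B fall, `aliases_overlap` checks the condition that forces a sample rate above 2B, and `apparent_frequency` shows where a tone that was not filtered out ends up after sampling.

```python
def alias_bands(bandwidth_hz, fs_hz, count=3):
    """Edges (low, high) of the first `count` alias spectra, centred on k * fs."""
    return [(k * fs_hz - bandwidth_hz, k * fs_hz + bandwidth_hz)
            for k in range(1, count + 1)]


def aliases_overlap(bandwidth_hz, fs_hz):
    """True if the base-band (0 to B) runs into the first alias (fs - B to fs + B)."""
    return fs_hz - bandwidth_hz < bandwidth_hz   # i.e. fs < 2B


def apparent_frequency(f_hz, fs_hz):
    """Base-band frequency (0 to fs/2) at which a tone of f_hz appears after sampling."""
    f = f_hz % fs_hz          # the samples cannot distinguish f from f - k * fs
    return min(f, fs_hz - f)  # a real tone also appears at fs - f; keep the one below fs/2


print(alias_bands(20e3, 50e3, count=2))   # [(30000.0, 70000.0), (80000.0, 120000.0)]
print(aliases_overlap(20e3, 50e3))        # False
print(aliases_overlap(25e3, 44.1e3))      # True
print(apparent_frequency(30e3, 44.1e3))   # 14100.0
```

The last line is the practical danger: a 30 kHz component left in the input of a 44.1 kHz sampler is indistinguishable from a genuine 14.1 kHz tone.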

2.1 Nyquist Sampling

The obvious, old and hard way of sampling an analogue voltage at 44.1
kHz is to do just that - feed the voltage into a conventional track
and hold sampler running at a 44.1 kHz sample rate. As shown above,
this requires that the input signal be band-limited to half the sample
rate, in this case 22.05 kHz, else the aliased spectra will overlap
and information will be lost.

For a practical implementation this may require an analogue filter of
order 8 or 10 to be inserted upstream of the sampler to provide an
audio bandwidth of 20 kHz and also the 80 dB or so of attenuation
above about 24 kHz that is required for high-fidelity sound
reproduction. It is possible to design such a filter, but it would
require a number of closely-toleranced components and would suffer
from all of the usual ailments associated with analogue electronics.

2.2 Oversampling Analogue to Digital Converters

The less obvious, but easier and cheaper way (at least with the advent
of cheap VLSI multipliers) is to sample the input at a higher
frequency, thereby relaxing the constraints on the analogue input
signal spectrum, and then to low-pass filter and decimate (reduce the
sampling rate) in the digital domain. To do this the input is sampled
at a higher frequency such as 4 x 44.1 kHz. As before, the signal
should contain no significant components above half the sample rate,
which is now 88.2 kHz, but the requirement on the analogue filter is
much gentler: it can still start to roll off at 20 kHz or so, and need
not be 80 dB down until the first alias spectrum starts at about 156
kHz, so it can be of lower order. Components between 88.2 kHz and 156
kHz alias to frequencies above 20 kHz, where the digital filter
described next removes them.
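The saving in filter complexity can be estimated with the standard closed-form order formula for a Butterworth low-pass filter. This is only a sketch under stated assumptions: the Butterworth response, the 3 dB pass-band edge at 20 kHz and the 80 dB stop-band target are choices made here for illustration. A practical design would use a more efficient response (an elliptic filter, for example) and need far fewer poles than these figures, so only the comparison between the two cases is meaningful.

```python
import math


def butterworth_order(f_pass_hz, f_stop_hz, pass_loss_db=3.0, stop_loss_db=80.0):
    """Minimum Butterworth order meeting both band edges (closed-form estimate)."""
    ratio = (10 ** (stop_loss_db / 10) - 1) / (10 ** (pass_loss_db / 10) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(f_stop_hz / f_pass_hz)))


# Sampling at 44.1 kHz: the 80 dB of attenuation is needed by about 24 kHz.
print(butterworth_order(20e3, 24e3))      # 51
# Sampling at 4 x 44.1 kHz: 80 dB is needed only by 176.4 - 20 = 156.4 kHz.
print(butterworth_order(20e3, 156.4e3))   # 5
```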

Now we have a digital data stream representing an analogue signal with
components from dc to 88.2 kHz. This data is passed through a digital
filter with a sharp cut-off at 22 kHz, which is relatively easy to
implement, and can be made to have a precisely linear phase response.
The filter output is a digital data stream representing the original
analogue waveform, but with all the components above 22 kHz severely
attenuated. Now we discard three out of every four of the samples to
get our stream of samples at a rate of 44.1 kHz.

The scheme just outlined is conceptually fine. In a practical
implementation the digital filter would be designed so that, rather
than computing every output sample and then discarding three out of
every four, it calculates only the samples that are kept. This cuts
the filter's computational effort by a factor of four.
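A minimal Python sketch of the decimation stage, assuming a 255-tap Blackman-windowed-sinc FIR low-pass filter with its cut-off at 22.05 kHz (the filter design is an illustrative choice, not the one any particular converter uses). `filter_then_discard` follows the conceptual scheme; `decimate` computes only the outputs that are kept, and produces identical samples with a quarter of the arithmetic.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Blackman-windowed sinc low-pass filter; cutoff in cycles/sample, unity dc gain."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def fir_output(x, taps, n):
    """One output sample of the FIR filter at time index n (input is zero before 0)."""
    return sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)


def filter_then_discard(x, taps, factor):
    """Conceptual scheme: compute every output, then keep one in `factor`."""
    return [fir_output(x, taps, n) for n in range(len(x))][::factor]


def decimate(x, taps, factor):
    """Practical scheme: only the outputs that survive are ever calculated."""
    return [fir_output(x, taps, n) for n in range(0, len(x), factor)]


FS = 176400.0                          # 4 x 44.1 kHz input rate
TAPS = lowpass_fir(255, 22050.0 / FS)  # cut-off at 22.05 kHz


def tone(f_hz, n=2000):
    return [math.sin(2 * math.pi * f_hz * i / FS) for i in range(n)]


settled = slice(64, None)  # skip outputs still inside the filter's start-up transient
print(max(abs(v) for v in decimate(tone(1000.0), TAPS, 4)[settled]))   # close to 1
print(max(abs(v) for v in decimate(tone(60000.0), TAPS, 4)[settled]))  # tiny: tone removed
```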

2.3 Oversampling Digital to Analogue Converters

Again the aim is to reduce the complexity of an analogue filter, this
time the reconstruction filter after the DAC, whose purpose is to
remove the alias spectra from the converter output. For an audio
signal of 20 kHz bandwidth the reconstruction filter has (ideally) to
have a gain of one from dc to 20 kHz and a gain of zero from 24 kHz
upwards. As in the case of the sampler this would require a
complicated analogue filter.

If, however, we increase the sample rate by creating some new samples
digitally then the first few alias spectra can be removed by a digital
filter, relaxing the performance requirements of the analogue filter.
For example a four times oversampling audio DAC runs at 176.4 kHz, so
the first alias spectrum starts at around 156 kHz, and the analogue
reconstruction filter can be of lower order since its transition-band
now runs from 20 kHz to about 156 kHz, roughly 136 kHz wide instead of
about 4 kHz.

The first step in this process is to insert three new samples in
between each of the original ones. Zero is the usual value, as it
enables an efficient hardware implementation: the filter that follows
never needs to multiply by the inserted samples. If zeros are used
then the spectrum of the sampled signal at this point is unchanged,
although the sample rate has been quadrupled.

Next, this faster sample stream is passed through a digital filter
whose action is to make the new samples a smooth interpolation of the
original data. The output of this filter is a sample stream at 176.4
kHz whose base-band spectrum (i.e. the music) is the same as the
original, but the first three alias spectra of the original sampled
signal have now been removed.
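The two digital steps can be sketched in Python as follows, again under illustrative assumptions: a 127-tap Blackman-windowed-sinc filter with its cut-off at 22.05 kHz (half the original sample rate) and a pass-band gain of four, which makes up for the level lost by inserting zeros. Because the ideal response passes through zero at every fourth tap, the original samples come out unchanged (delayed by the filter's 63-sample latency) while the new samples are interpolated between them.

```python
import math


def interpolation_fir(num_taps, factor):
    """Blackman-windowed sinc, cut-off at half the original rate, pass-band gain `factor`."""
    cutoff = 0.5 / factor                     # cycles/sample at the new, faster rate
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(factor * ideal * window)  # gain `factor` makes up for the zeros
    return taps


def zero_stuff(x, factor):
    """Insert factor - 1 zeros after each sample: rate rises, spectrum is unchanged."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (factor - 1))
    return out


def interpolate(x, taps, factor):
    """Zero-stuff, then low-pass filter to remove the first factor - 1 alias spectra."""
    u = zero_stuff(x, factor)
    return [sum(h * u[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(u))]


x = [math.sin(2 * math.pi * 1000.0 * i / 44100.0) for i in range(200)]  # 1 kHz at 44.1 kHz
y = interpolate(x, interpolation_fir(127, 4), 4)                         # 800 samples at 176.4 kHz
delay = 63                                                               # (127 - 1) / 2 samples
print(max(abs(y[4 * k + delay] - x[k]) for k in range(180)))            # tiny: originals kept
```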

Finally, the analogue signal is reconstructed with a DAC running at
the four-times rate, and a low-order analogue filter, which removes
the alias spectra centred around 176.4 kHz and multiples thereof.

2.4 Jitter

Jitter is defined as the timing error at the transitions of a digital
signal. Taking the ADC as an example, the effect of timing errors on
the sample clock is to sample the input signal at slightly the wrong
time instant, so although the average sample rate may be very
accurately 44.1 kHz the samples are not necessarily taken exactly
every 1/44100th of a second, but perhaps a little early or a little
late. Since the input is constantly changing, a timing error on the
sample clock translates to an erroneous sample level being captured.

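The effects of sample clock jitter become more pronounced for high
amplitude and high frequency input signals, because the captured error
is roughly the timing error multiplied by the rate at which the signal
is changing, and that rate grows with both amplitude and frequency.
The level and nature of jitter required for its effects to be audible
is a current topic of research, debate and religious wars.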
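Because the captured error is roughly the signal's slope multiplied by the timing error, a simple worst-case estimate follows: for a full-scale sine of frequency f sampled with random clock jitter of rms value t_j, the signal to noise ratio is limited to about -20 log10(2 pi f t_j) dB. The Python sketch below checks that estimate by simulation; the Gaussian jitter model and the 1 ns figure are assumptions chosen for illustration, not a statement about what is audible.

```python
import math
import random


def jitter_limited_snr_db(f_hz, rms_jitter_s):
    """Best-case SNR of a full-scale sine sampled with random clock jitter."""
    return -20 * math.log10(2 * math.pi * f_hz * rms_jitter_s)


def simulated_snr_db(f_hz, rms_jitter_s, fs_hz=44100.0, n=20000, seed=1):
    """Sample a unit sine at slightly wrong instants and measure the resulting error."""
    rng = random.Random(seed)
    signal_power = error_power = 0.0
    for i in range(n):
        t = i / fs_hz                                       # the ideal sample instant
        ideal = math.sin(2 * math.pi * f_hz * t)
        actual = math.sin(2 * math.pi * f_hz * (t + rng.gauss(0.0, rms_jitter_s)))
        signal_power += ideal ** 2
        error_power += (actual - ideal) ** 2
    return 10 * math.log10(signal_power / error_power)


print(jitter_limited_snr_db(20e3, 1e-9))  # about 78
print(simulated_snr_db(20e3, 1e-9))       # close to the estimate above
print(jitter_limited_snr_db(1e3, 1e-9))   # about 104: the same jitter hurts a 1 kHz tone far less
```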

3. Quantisation

At some point the sampled analogue quantity has to be converted to a
finite-length digital word; this process is called quantisation. This
will generally be done immediately after the sampler so that the
subsequent data manipulation may be done digitally, but this is not
necessarily the case, for example if a switched-capacitor pre-filter
is used. Common word-lengths used for digital audio are 16, 18, 20 and
24 bits. The best ADC and DAC chips around that are fast enough for
digital audio have a resolution of 20 bits, though 16 and 18 bit parts
are cheaper and far more common.

In a standard ADC (the "old, hard way" described above) the quantiser
resolution and the output resolution are the same (since the digital
output comes directly from the quantiser). Assuming a random input
signal the errors associated with this quantisation process are white
and uncorrelated, and yield a best-case signal to noise ratio of
around 6n dB, where n is the number of bits in the output word (more
precisely 6.02n + 1.76 dB for a full-scale sinusoid).
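The "around 6n dB" figure can be made more precise by comparing the power of a full-scale sinusoid with that of an error spread uniformly over one quantisation step (one step squared divided by 12), which gives 6.02n + 1.76 dB. The short Python sketch below, an illustration rather than a model of any real converter, rounds a full-scale 997 Hz sine sampled at 44.1 kHz and measures the result; the tone's frequency is chosen so that the samples land on many different points of the quantiser's steps, making the error behave like the random-input case assumed above.

```python
import math


def theoretical_snr_db(bits):
    """Full-scale sine against uniform quantisation noise of power (1 step)^2 / 12."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits, n=44100, cycles=997):
    """Round a full-scale sine to `bits` bits and measure signal and error power."""
    amplitude = 2 ** (bits - 1) - 1            # largest positive code, in steps
    signal_power = noise_power = 0.0
    for i in range(n):
        x = amplitude * math.sin(2 * math.pi * cycles * i / n)
        q = math.floor(x + 0.5)                # round to the nearest code
        signal_power += x * x
        noise_power += (q - x) ** 2
    return 10 * math.log10(signal_power / noise_power)


for bits in (12, 16):
    print(bits, round(theoretical_snr_db(bits), 2), round(measured_snr_db(bits), 2))
```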

3.1 Quantisation in Oversampling Converters

Consider a 16-bit ADC clocked at 44.1 kHz. Its quantisation noise is
approximately 96 dB below a full-scale sinusoid, and is spread evenly
from dc to 22.05 kHz. If the ADC is clocked faster then the total
quantisation noise power remains unaffected, but it is spread over a
wider bandwidth. For example, if the converter speed is doubled, the
quantisation noise power is spread from dc to 44.1 kHz. The desired
signal is, of course, still in the band from dc to 22.05 kHz, and the
quantisation noise power in this band is halved.

A digital low-pass filter with a cut-off of 22.05 kHz cuts out half of
the quantisation noise, increasing the SNR by 3 dB, but leaving the
audio-band signal unaffected. This filter is generally the same one as
the digital anti-alias filter mentioned above. The process is
extendible, and for each doubling of the sampling rate, the audio-band
quantisation noise is lowered by 3 dB. For example, the quantisation
noise for a 4-times oversampling converter will be 6 dB lower (after
filtering) than for the same converter operating without oversampling.
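A quick numerical check of the 3 dB per doubling rule, as a Python sketch under simplifying assumptions: the quantisation error is modelled as white noise spread uniformly over one step (as the text assumes for a random input), and the sharp digital low-pass filter is replaced by averaging each group of samples before decimating. That crude filter would not leave the audio signal untouched, but for white noise it removes, on average, the same share of the noise power as an ideal filter with the same cut-off, which is all this check needs.

```python
import math
import random


def in_band_noise_reduction_db(oversampling_ratio):
    """Audio-band quantisation noise reduction from oversampling alone (no shaping)."""
    return 10 * math.log10(oversampling_ratio)


def simulated_reduction_db(oversampling_ratio, n_out=50000, seed=2):
    """White quantisation-like noise at the fast rate, averaged and decimated."""
    rng = random.Random(seed)
    fast = [rng.uniform(-0.5, 0.5) for _ in range(n_out * oversampling_ratio)]
    # Averaging each block is a crude low-pass filter and decimator in one step.
    slow = [sum(fast[i:i + oversampling_ratio]) / oversampling_ratio
            for i in range(0, len(fast), oversampling_ratio)]
    before = sum(v * v for v in fast) / len(fast)
    after = sum(v * v for v in slow) / len(slow)
    return 10 * math.log10(before / after)


for osr in (2, 4, 16):
    print(osr, round(in_band_noise_reduction_db(osr), 2), round(simulated_reduction_db(osr), 2))
```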

By the same principle, if we ran a 15-bit ADC at 4 x 44.1 kHz we would
get the same audio-band performance as with a 16-bit device sampling
at 44.1 kHz, since the increase in quantisation noise due to the
poorer resolution is balanced by the SNR improvement brought about by
oversampling. Again the process is extendible, and for each factor of
four by which we increase the sample rate we can drop one bit of
resolution off the converter.

So if we oversample by a factor of 4^15 then theoretically we can drop
15 of the 16 bits, and use a one-bit converter. Alas, this implies a
sample rate of about 50E12 Hz, which is well into the infra-red.
However, noise-shaping can be used to reduce the sample rate required
to such a level that the use of very low resolution converters is
practical.
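The arithmetic behind that figure, as a Python sketch: each factor of four in sample rate lowers the audio-band noise by 6.02 dB, which is exactly one bit, so dropping k bits by plain oversampling needs 44.1 kHz multiplied by 4 to the power k. The constant and function names are illustrative.

```python
import math

BASE_RATE_HZ = 44100.0


def bits_saved(oversampling_ratio):
    """Bits of resolution traded for oversampling, with no noise shaping."""
    return 10 * math.log10(oversampling_ratio) / (20 * math.log10(2))  # 6.02 dB per bit


def rate_to_drop_bits(bits):
    """Sample rate needed to drop `bits` bits by plain oversampling (factor 4 per bit)."""
    return BASE_RATE_HZ * 4 ** bits


print(round(bits_saved(4), 6))           # 1.0
print(rate_to_drop_bits(1))              # 176400.0
print(f"{rate_to_drop_bits(15):.3g}")    # 4.74e+13, i.e. some tens of THz
```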

Oversampling gives similar benefits in the digital to analogue
conversion process. For each factor of four by which the sample stream
is oversampled, one bit may be dropped from each data word without
significantly degrading the audio-band performance. Dropping bits
introduces noise into the signal, but if the signal has been
sufficiently oversampled then the power of this new noise in the audio
band is lower than the noise in the original recording. Hence the
original noise dominates and the word-length truncation introduces
insignificant amounts of extra noise.

Again, to reduce the word-length to one bit in this simplistic manner
requires the same impractical sample rate as in the A to D case, but
once again high-quality audio performance is achieved at practical
sample rates with noise-shaping.

4. Dither

Much of the preceding text on quantisation and requantisation noise
assumes the signal to be random. For many signals this is not the
case, and the result is that the quantisation noise, rather than being
white, is found to be highly correlated with the signal. This
manifests itself as very nasty-sounding level-dependent distortions
which become more prominent as the signal is decreased in amplitude.

To avoid this problem a small amount of noise is added to the signal
before quantisation, this process being known as dithering. The dither
values are often drawn from a triangular distribution, and it is
desirable that they be uncorrelated with one another and with the
signal. The dither power is chosen such
that the quantiser transfer function is just linearised, without
adding excess noise to the quantised signal. This has the effect of
decorrelating the quantisation noise from the signal, and avoiding
quantisation-related distortion.
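The linearising effect can be seen in a small Python experiment (a sketch, not a description of any particular product): a constant input is quantised with and without triangular (TPDF) dither spanning plus or minus one step. Without dither the error is a fixed function of the input level, which is the source of the distortion described above. With TPDF dither its mean is zero and its power is the same whatever the level, at the cost of raising that power from one twelfth to one quarter of a step squared.

```python
import math
import random


def quantise(v):
    """Round to the nearest step (1 step = 1.0)."""
    return math.floor(v + 0.5)


def tpdf(rng):
    """Triangular-pdf dither: the sum of two independent uniform values, +/-1 step overall."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def error_stats(level, dither, n=100000, seed=3):
    """Mean and variance of the total error (output minus input) for a constant input."""
    rng = random.Random(seed)
    errors = [quantise(level + (tpdf(rng) if dither else 0.0)) - level for _ in range(n)]
    mean = sum(errors) / n
    return mean, sum((e - mean) ** 2 for e in errors) / n


for level in (0.0, 0.25, 1.3):
    plain = error_stats(level, dither=False)
    dithered = error_stats(level, dither=True)
    print(level, [round(v, 3) for v in plain], [round(v, 3) for v in dithered])
```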

5. Noise-Shaping

Quantisation (after the addition of white-noise dither) ideally
results in a white power spectrum - that is, the noise floor has
constant noise power spectral density (NPSD). This is a direct result
of rounding to the nearest value when performing the quantisation,
with a random input signal.

However, if we base our decision of whether to round up or down not
upon which is the nearer value but upon some other criterion, then we
can make the output quantisation noise spectrum have almost any form
we desire, at the cost of a somewhat higher total power. We are not
going to hear noise above about 20 kHz, so in an oversampling
converter we force as much of the quantisation noise as possible into
the band above 20 kHz.

The sample rate for a 256 times oversampling converter is about
11.3 MHz (compared with the 50 THz calculated for the hypothetical
1-bit converter above). If we oversample by 256 times then we can put
as much noise as we want into the band from 20 kHz to 5.6 MHz (half
the sample rate), where it
will not interfere with the audio. By keeping the quantisation NPSD
low enough in the audio band we are able to achieve an audio-band SNR
of 90 dB or more; the NPSD above 20 kHz will be very much higher, but
since there is no desired signal at those frequencies it really does
not matter.

This is accomplished in practice by placing the quantiser in a
feedback loop with a digital filter, such that the filtered
quantisation error is subtracted from a subsequent input sample. A
detailed explanation of this is unfortunately beyond the scope of this
article, being a virtual impossibility without pictures and a few
equations.
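The simplest instance of that feedback loop is first-order error feedback, in which the digital filter is just a one-sample delay: each sample's quantisation error is subtracted from the next input. The output error then becomes the difference between successive quantisation errors, which is small at low frequencies and large near half the sample rate. The Python sketch below illustrates this first-order case only (the random input, the integer quantiser and the block-averaging stand-in for the audio band are all assumptions made here); the shaping filters used in real converters and mastering systems are of higher order. It shows the total noise power rising by about 3 dB while the low-frequency noise falls by well over 10 dB.

```python
import math
import random


def requantise(x, shape):
    """Round each sample to an integer, optionally with first-order error feedback."""
    out, prev_error = [], 0.0
    for v in x:
        target = v - prev_error if shape else v   # subtract the last error from this input
        q = math.floor(target + 0.5)
        prev_error = q - target                   # this sample's quantisation error
        out.append(q)
    return out


def power(samples):
    return sum(s * s for s in samples) / len(samples)


def low_band(samples, width=64):
    """Crude low-pass stand-in for the audio band: averages of `width`-sample blocks."""
    return [sum(samples[i:i + width]) / width
            for i in range(0, len(samples) - width + 1, width)]


rng = random.Random(4)
x = [rng.uniform(-100.0, 100.0) for _ in range(64 * 2000)]
flat = [q - v for q, v in zip(requantise(x, shape=False), x)]
shaped = [q - v for q, v in zip(requantise(x, shape=True), x)]
for name, err in (("flat", flat), ("shaped", shaped)):
    print(name, "total %.1f dB" % (10 * math.log10(power(err))),
          "low band %.1f dB" % (10 * math.log10(power(low_band(err)))))
```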

5.1 Noise Shaping in Mastering

Noise shaping is used in a very similar manner in mastering processes
such as Deutsche Grammophon's ABI and Sony's Super Bit Mapping. The
master tape is recorded with a 20-bit ADC and then requantised to 16
bits with a suitable (proprietary) noise-shaping algorithm. Rather than
a white noise floor, this gives low NPSD where the ear is most
sensitive, at around 3 kHz, and a much higher NPSD at high
frequencies. The shaped noise floor is found to be subjectively
quieter, though the total quantisation noise power is actually
slightly higher.

6. Conclusion

Various aspects of analogue to digital, and digital to analogue
conversion have been discussed, with particular reference to digital
audio applications. It was seen that A to D conversion is performed as
two distinct processes - sampling and quantisation.

Criteria were stated for adequate performance of both processes, and
various schemes to circumvent associated problems were put forward.
Emphasis was placed on the technique of oversampling and the
application of dither and noise-shaping was also mentioned.