Frequency Domain Coding of Speech
The speech signal is divided into a set of frequency components, which are quantized and encoded separately. These coders take advantage of speech perception and generation models without making the algorithm totally dependent on the models used. Because each component is coded separately, the quantization noise can be contained within its band and prevented from creating harmonic distortion outside the band. These schemes also have the advantage that the number of bits used to encode each frequency component can be varied dynamically and shared among the different bands.
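The idea of sharing a bit budget among bands can be illustrated with a minimal sketch. The text does not specify an allocation rule, so everything here is an assumption made for illustration: the function name `allocate_bits`, the per-band energies, and the rule itself (a greedy scheme that assumes uniform quantizers, whose noise power drops by roughly a factor of four, about 6 dB, per added bit). A practical coder would also weight each band by its perceptual significance, which this sketch leaves out.

```python
def allocate_bits(band_energies, total_bits):
    """Greedy allocation: each extra bit goes to the band whose
    estimated quantization noise is currently the largest.

    band_energies: hypothetical per-band signal energies for one frame.
    total_bits:    bit budget shared among all bands for that frame.
    """
    bits = [0] * len(band_energies)
    for _ in range(total_bits):
        # Assumed model: noise power falls by about 4x (6 dB) per added bit.
        noise = [e / (4 ** b) for e, b in zip(band_energies, bits)]
        worst = max(range(len(noise)), key=noise.__getitem__)
        bits[worst] += 1
    return bits


# Two hypothetical frames with the same 8-bit budget but different spectra.
frame_a = allocate_bits([100.0, 20.0, 6.0, 1.0], 8)
frame_b = allocate_bits([1.0, 6.0, 20.0, 100.0], 8)
print(frame_a, frame_b)
```

With the same budget, the allocation follows the energy from the low bands in the first frame to the high bands in the second, which is the dynamic sharing described above.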
Frequency domain coding algorithms:
- Sub-band Coding (SBC): Divides the speech signal into many smaller sub-bands and encodes each sub-band separately according to some perceptual criterion.
- Block Transform Coding (BTC): Codes the short-time transform of a windowed sequence of samples, encoding each transform coefficient with a number of bits proportional to its perceptual significance.
- Sub-band coding can be thought of as a method of controlling and distributing quantization noise across the signal spectrum
- In a sub-band coder, speech is typically divided into four or eight sub-bands by a bank of filters. Each sub-band is sampled at its band-pass Nyquist rate (which is lower than the original sampling rate) and encoded with a different accuracy in accordance with a perceptual criterion.
- Band-splitting can be done in many ways. One approach could be to divide the entire speech band into unequal sub-bands that contribute equally to the articulation index as shown below
- Another way to split the speech band is to divide it into equal-width sub-bands and assign each sub-band a number of bits proportional to its perceptual significance when encoding it.
- There are various methods for processing the sub-band signals
- One way is to make a low-pass translation of the sub-band signal to zero frequency by a modulation process equivalent to single-sideband modulation. This kind of translation facilitates sampling rate reduction and brings the other benefits that accrue from coding low-pass signals.
- The input signal is filtered with a band-pass filter of width $w_n$ for the $n$th band, where $w_{1n}$ is the lower edge of the band and $w_{2n}$ is the upper edge. The resulting signal $s_n(t)$ is modulated by a cosine wave $\cos(w_{1n} t)$ and filtered using a low-pass filter $h_0(t)$ with a passband from 0 to $w_n$. The resulting signal $r_n(t)$ corresponds to the low-pass translated version of $s_n(t)$ and can be expressed as $r_n(t) = [s_n(t)\cos(w_{1n} t)] \otimes h_0(t)$, where $\otimes$ denotes convolution.
- This signal is then digitally encoded and multiplexed with the encoded signals from the other channels. At the receiver, the data is demultiplexed into separate channels, decoded, and band-pass translated to give the estimate of $r_n(t)$ for the $n$th channel.
- The low-pass translation technique is straightforward and takes advantage of a bank of non-overlapping band-pass filters.
- Sub-band coding can be used for coding speech at bit rates in the range 9.6 kbps to 32 kbps. In this range, speech quality is roughly equivalent to that of ADPCM at an equivalent bit rate
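The low-pass translation steps described above (cosine modulation followed by low-pass filtering, then sampling rate reduction) can be sketched for a single band as follows. This is a rough illustration under stated assumptions, not a reference implementation: the input is taken to be an already band-limited sub-band signal, the cosine is placed at the lower band edge, the low-pass filter is a Hamming-windowed sinc FIR, and every name and number (`lowpass_translate`, the 8 kHz sampling rate, the 1000 to 1500 Hz band, the 1200 Hz test tone) is hypothetical.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass taps, normalised to unit gain at DC."""
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        x = k - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def convolve_same(signal, taps):
    """Direct FIR convolution, output aligned with the input (zero-padded edges)."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += h * signal[j]
        out.append(acc)
    return out


def lowpass_translate(s_n, f_low_hz, band_width_hz, fs_hz):
    """Multiply the sub-band signal by a cosine at the lower band edge
    (assumed), then low-pass filter to keep only the translated copy."""
    mixed = [x * math.cos(2 * math.pi * f_low_hz * i / fs_hz)
             for i, x in enumerate(s_n)]
    return convolve_same(mixed, lowpass_fir(band_width_hz, fs_hz))


def tone_amplitude(signal, freq_hz, fs_hz):
    """Amplitude of one frequency component, measured by correlation."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# Hypothetical band from 1000 Hz to 1500 Hz containing a single 1200 Hz tone.
fs = 8000.0
s_n = [math.cos(2 * math.pi * 1200.0 * i / fs) for i in range(1600)]
r_n = lowpass_translate(s_n, f_low_hz=1000.0, band_width_hz=500.0, fs_hz=fs)

# The translated signal occupies 0 to 500 Hz, so 1000 samples/s is enough:
# keep every 8th sample of the 8000 samples/s stream.
decimated = r_n[::8]

print(round(tone_amplitude(r_n, 200.0, fs), 2))
```

Multiplying by the cosine produces copies of the tone at the difference (200 Hz) and sum (2200 Hz) frequencies, each at roughly half the original amplitude. The low-pass filter keeps only the 200 Hz copy, which is why the translated band can be sampled at the much lower band-pass Nyquist rate.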