Abstract
In speech and audio coding, quantization noise, also known as coding noise, is often modeled as additive white noise. This noise can cause noticeable distortions, particularly in frequency regions where human hearing is most sensitive. Noise shaping, a key technique in speech and audio coding, addresses this issue by redistributing the noise spectrum to less perceptible frequency regions, leveraging auditory masking effects to enhance perceived audio quality. This paper introduces a novel approach to perceptually shape the spectrum of coding noise in speech and audio codecs. The proposed approach is based on the well-known principle of pre- and post-processing with adaptive linear filters, but features a new method for estimating filter coefficients. Typically, these coefficients are estimated either from the current input to the pre-processor (forward adaptation) or from the past synthesized signal (backward adaptation). In contrast, the proposed approach estimates the pre-processing filter coefficients directly from the current input to the pre-processor, while the post-processing filter coefficients are estimated from the current input to the post-processor. This estimation approach eliminates the need to transmit filter coefficients and introduces no lag between the frame used for estimation and the frame being filtered. Because the adaptation is achieved without any increase in bit rate, the method is referred to as "zero-bit." For the scheme to function correctly, constraints must be imposed on the calculation of the pre-processor filter coefficients. An implementation of this new approach, termed "constrained adaptation," is described. Subjective evaluation results demonstrate that constrained adaptation performs at least as well as forward or backward adaptation in shaping coding noise, with no cost in bit rate or temporal lag.
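The premise that makes zero-bit adaptation plausible is that filter coefficients estimated from a coded (noisy) signal remain close to those estimated from the clean input, so the post-processor can re-derive a matching filter from its own input rather than receive transmitted coefficients. The sketch below illustrates only this premise, not the paper's constrained-adaptation algorithm; all function names, the synthetic signal, and the quantizer step are illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): LPC coefficients estimated
# from a clean signal vs. from a coarsely quantized copy of it stay close,
# which is the enabling observation behind "zero-bit" filter adaptation.
import random

def lpc(x, order):
    """LPC coefficients via autocorrelation and the Levinson-Durbin
    recursion, using the convention x[n] ~ sum_k a[k] * x[n-k]."""
    n = len(x)
    r = [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0] + 1e-9
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        refl = acc / err
        prev = a[:]
        a[m] = refl
        for k in range(1, m):
            a[k] = prev[k] - refl * prev[m - k]
        err *= (1.0 - refl * refl)
    return a[1:]

random.seed(0)
# Synthetic "speech-like" frame: white noise through a stable resonant
# AR(2) filter with true coefficients (1.6, -0.7).
exc = [random.gauss(0.0, 1.0) for _ in range(800)]
x = [0.0, 0.0]
for e in exc:
    x.append(e + 1.6 * x[-1] - 0.7 * x[-2])
x = x[2:]

# Coarse uniform quantizer standing in for the codec's additive coding noise.
step = 0.25
x_dec = [step * round(s / step) for s in x]

a_enc = lpc(x, 2)      # pre-processor side: estimated from the clean input
a_dec = lpc(x_dec, 2)  # post-processor side: estimated from its own input
mismatch = max(abs(p - q) for p, q in zip(a_enc, a_dec))
print("encoder:", a_enc, "decoder:", a_dec, "mismatch:", mismatch)
```

Both estimates recover coefficients near the true (1.6, -0.7), and the encoder/decoder mismatch is small relative to the coefficients themselves, which is what lets the two ends adapt independently without side information.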