Abstract
This article presents a novel audio zero-watermarking technique that does not embed any watermark data in the original audio. The approach relies on robust features of the audio to generate an audio fingerprint. First, the audio is divided into non-overlapping frames, and time-based, frequency-based, and structural features are extracted from each frame. The features are then weighted according to their stability to construct a robust composite feature representation. An adaptive thresholding scheme is applied to the composite representation to generate a binary sequence, and the watermark data is then combined with this binary sequence to produce the final audio fingerprint. Finally, the extracted features, together with the corresponding binary sequence that serves as the label for each frame, are used to train a Random Forest (RF) classifier. During extraction, the feature vectors derived from the attacked audio are fed to the trained RF classifier to reconstruct the original binary sequence. This sequence is then combined with the stored fingerprint to reconstruct the watermark. Simulation results indicate that the technique preserves the original audio quality while providing good resistance to many types of attacks, including additive noise, filtering, MP3 compression, re-sampling, re-quantization, and time and pitch scaling.
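The fingerprint generation and recovery steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the three features (log energy, spectral centroid, zero-crossing rate), the fixed `weights`, the median threshold, and the XOR combination are all illustrative choices, and every function name is hypothetical. The Random Forest step is omitted; here the binary sequence is simply recomputed from the attacked audio, whereas the described scheme would predict it with the trained classifier.

```python
import numpy as np

def frame_features(audio, frame_len):
    """Split audio into non-overlapping frames; return one feature row per frame."""
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Time-based feature (assumed): log energy of the frame.
    energy = np.log1p(np.sum(frames ** 2, axis=1))
    # Frequency-based feature (assumed): spectral centroid in FFT-bin units.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    bins = np.arange(mag.shape[1])
    centroid = (mag * bins).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    # Stand-in for a structural feature (assumed): zero-crossing rate.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([energy, centroid, zcr])

def composite(features, weights):
    """Z-score each feature column, then take a weighted sum per frame."""
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
    return z @ weights

def binarize(comp):
    """Adaptive threshold (assumed): one bit per frame, relative to the median."""
    return (comp > np.median(comp)).astype(np.uint8)

def make_fingerprint(bits, watermark):
    """Combine the binary sequence with the watermark (XOR is an assumption)."""
    return np.bitwise_xor(bits, watermark)

def recover_watermark(bits, fingerprint):
    """XOR is its own inverse, so the same operation recovers the watermark."""
    return np.bitwise_xor(bits, fingerprint)

rng = np.random.default_rng(0)
frame_len = 512
audio = rng.standard_normal(16000)      # synthetic stand-in for a host signal
weights = np.array([0.5, 0.3, 0.2])     # placeholder stability weights

feats = frame_features(audio, frame_len)
n_frames = feats.shape[0]
bits = binarize(composite(feats, weights))
watermark = rng.integers(0, 2, size=n_frames, dtype=np.uint8)
fingerprint = make_fingerprint(bits, watermark)   # stored; the audio is untouched

# Simulated additive-noise attack, then recovery from the attacked audio.
noisy = audio + 0.01 * rng.standard_normal(audio.shape)
bits_noisy = binarize(composite(frame_features(noisy, frame_len), weights))
recovered = recover_watermark(bits_noisy, fingerprint)
ber = float(np.mean(recovered != watermark))
print("bit error rate under additive noise:", ber)
```

Because the host signal is never modified, imperceptibility holds by construction in this sketch; robustness depends entirely on how stable the per-frame bits remain under attack, which is the role the weighted features and the classifier play in the described method.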