Abstract
Differentiable frontends, such as the LEArnable Frontend (LEAF), have drawn increasing interest from the computer audition (CA) community, as they combine the rigour of traditional signal processing techniques with the flexibility and potential of end-to-end deep learning approaches. Concretely, they promise the ability to automatically learn task-specific features, resulting in both higher performance and better interpretability of CA applications. With the adaptability of LEAF’s parameters being questioned in recent literature, we further investigate why LEAF does not adjust its parameters. To this end, we perform a detailed analysis of the effects of filterbank initialisation for LEAF across a wide, previously unmatched range of computer audition tasks, namely speech recognition, speech emotion recognition, acoustic scene classification, and bird activity detection. In line with the literature, we report that performance stays consistently high irrespective of filterbank initialisation, so long as the initialisation covers the entire frequency spectrum, in which case adaptation is minimal. Crucially, however, a filterbank initialised with all frequency bands equal does change its centre frequencies and bandwidths, yet its performance remains lower. This effect is seemingly independent of how information is spread across frequencies, as we confirm in an additional set of experiments with controlled frequency distributions. These findings point towards the critical role of initialisation and of the inductive bias of LEAF, and they substantiate concerns about the adaptability and interpretability of LEAF across many settings. The code for our experiments is publicly available at https://github.com/millinma/LEAFFrequencyAnalysis.