Abstract
OBJECTIVE: Cough is a key symptom in respiratory diseases, yet its clinical assessment remains challenging, often relying on subjective questionnaires or inefficient manual cough counting. Existing automated cough detection algorithms have limited generalisability due to a lack of validation on large, diverse datasets. This study aimed to develop and evaluate a fully automated cough detection system using spectro-temporal analysis and a Vision Transformer (ViT) model.

METHODS: A total of 231 annotated 24-hour cough recordings across 9 diagnostic categories from the RaDAR database were analysed. Recordings were segmented and converted into spectrograms using different short-time Fourier transform settings. Data were split subject-wise into training, validation, and test sets to prevent data leakage. The ViT model was fine-tuned in two stages: a pilot stage using 10% of the data to determine optimal spectrogram parameters, followed by full-scale training on the remaining data.

RESULTS: Spectrogram configuration significantly affected performance, with 750 ms segment duration, 128-point frame size, and 32-point hop identified as optimal. With these parameters, the model achieved an F1 score of 85.02%, sensitivity of 83.64%, precision of 86.44%, and specificity of 99.67% on the test set. Diagnostic category-wise analysis showed high F1 scores in interstitial lung disease (90.83%), chronic obstructive pulmonary disease (89.60%), asthma (88.13%) and chronic cough (85.32%).

CONCLUSIONS: Vision Transformers with optimised spectrogram preprocessing enable accurate, scalable cough detection across diverse populations, performing comparably to popular convolutional neural networks on a larger, more diverse, and more challenging dataset. These findings support the use of ViT-based systems for objective, automated cough monitoring in clinical practice.
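
As an illustration of the subject-wise split described in METHODS, the following is a minimal sketch, not the study's actual pipeline. The function name `split_by_subject`, the `(subject_id, segment_id)` record layout, and the 70/15/15 ratios are all assumptions made for the example; the abstract does not state the split proportions. The point it shows is that subjects, rather than individual segments, are assigned to a set, so no subject contributes data to more than one set.

```python
import random


def split_by_subject(records, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign whole subjects to train/val/test (hypothetical helper).

    records: iterable of (subject_id, segment_id) pairs.
    ratios:  assumed train/val/test proportions, not taken from the paper.
    """
    # Shuffle the unique subjects, not the segments, so that every
    # segment from a given subject ends up in exactly one set.
    subjects = sorted({subject_id for subject_id, _ in records})
    random.Random(seed).shuffle(subjects)

    n_train = int(len(subjects) * ratios[0])
    n_val = int(len(subjects) * ratios[1])

    # Map each subject to its set; the remainder goes to the test set.
    assignment = {}
    for index, subject_id in enumerate(subjects):
        if index < n_train:
            assignment[subject_id] = "train"
        elif index < n_train + n_val:
            assignment[subject_id] = "val"
        else:
            assignment[subject_id] = "test"

    splits = {"train": [], "val": [], "test": []}
    for subject_id, segment_id in records:
        splits[assignment[subject_id]].append((subject_id, segment_id))
    return splits
```

Calling `split_by_subject(records)` on a list of segment records returns a dictionary with `train`, `val`, and `test` lists. Splitting at the segment level instead would let near-identical coughs from one person appear in both training and test data, which tends to inflate the reported scores.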