Abstract
This paper proposes Bag-of-Words classifiers for the reliable detection of tuberculosis infection from cough recordings. Several feature extraction procedures and encoding strategies are evaluated, both individually and in combination, in terms of standard performance metrics such as the Area Under the Curve (AUC), accuracy, sensitivity, and F1-score. Experiments were conducted on two distinct large datasets, using both the original recordings and extended versions obtained through augmentation techniques. Performance was assessed by repeated k-fold cross-validation and on external datasets. An extensive ablation study showed that the proposed approach achieves an accuracy of up to 0.77 and an AUC of up to 0.84, comparing favorably with existing solutions and remaining robust across various combinations of the setup parameters.