Abstract
Deep learning–based Automated Fault Detection and Diagnosis (AFDD) for Air Handling Units (AHUs) has often been reported to perform well. However, most prior studies relied on single-building datasets and fixed feature schemas, which limits their applicability to new sites. This study evaluated TabTransformer and TabNet on a unified multi-building dataset that pools auditorium, hospital, and office data, in order to examine coverage-aligned combined training across buildings. Extensive hyperparameter optimization covering 3240 models was performed, together with systematic analyses that included checks for underfitting and overfitting, assessment of validation–test variation, best-model selection, attention heatmaps, and comparisons with non-attention baselines. The optimized TabNet trained on the unified data achieved an F1 score of 97.43% and an accuracy of 97.91% for the auditorium, an F1 score of 92.01% and an accuracy of 92.50% for the hospital, and an F1 score of 92.25% and an accuracy of 92.46% for the office. By comparison, single-building TabNet baselines reached 96.82% F1 and 97.37% accuracy for the auditorium, 95.40% F1 and 97.21% accuracy for the hospital, and 96.27% F1 and 97.29% accuracy for the office. Thus the unified model exceeded its single-building baseline for the auditorium but not for the hospital or office. Across all three buildings, gains from combined training arose primarily under strong coverage alignment of fault classes between the training and target sets; when alignment was weak, the gains diminished.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-025-24959-9.