Abstract
OBJECTIVES: To identify and characterise distinct subgroups of patients with asthma who experience severe acute exacerbations (AEs), using a multistep clustering methodology that combines supervised and unsupervised machine learning.
METHODS: This cohort study used anonymised, all-payer medical and prescription US claims data from October 2015 to May 2022. First, gradient-boosted decision trees were trained to predict AE in 4 132 973 patients with asthma, of whom 86 735 experienced AE. The trained model was then applied to a holdout set of 86 434 patients with asthma who experienced AE to derive SHapley Additive exPlanations (SHAP) values. The SHAP values were subjected to non-linear dimensionality reduction and density-based clustering to identify distinct subgroups among these patients. The subgroups were described using key clinical and demographic characteristics.
RESULTS: Clustering identified five distinct subgroups of patients with asthma who experienced AE, broadly differentiated by histories of acute care encounters, healthcare utilisation, AE treatments, coded asthma severity, specialist encounters, first-hand tobacco exposure, mood disorders and patient demographics. Notably, the predicted likelihood of AE varied considerably between clusters: some subgroups consisted of patients whom the predictive model assigned a low likelihood of AE and who would therefore have been missed by predictive modelling alone.
DISCUSSION: By identifying distinct subgroups among patients with asthma experiencing AE, this study highlights the heterogeneity within this population and emphasises the need for more personalised management of AE.
CONCLUSION: Applying predictive modelling and clustering to real-world data can help identify discrete patient phenotypes and can inform the development of risk assessment and mitigation efforts.
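The core idea in METHODS, clustering patients by their per-feature attributions rather than by their raw features, can be illustrated with a minimal, standard-library-only sketch. It is not the study's pipeline: the `risk` function is a hypothetical toy model standing in for the gradient-boosted trees, the three features and all patient values are invented, Shapley values are computed exactly by enumeration rather than with a SHAP library, the dimensionality reduction step is omitted, and a bare-bones DBSCAN stands in for whichever density-based algorithm the authors used.

```python
from itertools import combinations
from math import dist, factorial

# Hypothetical toy risk model (assumption, not the study's model).
# Invented features: prior ED visits, oral steroid fills, smoker flag.
def risk(x):
    ed, ocs, smoker = x
    return 0.1 * ed + 0.05 * ocs + 0.2 * smoker + 0.1 * ed * smoker

BASELINE = (0, 0, 0)  # reference patient with no risk factors

def shapley(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                with_i = tuple(x[j] if (j in coalition or j == i) else baseline[j]
                               for j in range(n))
                without_i = tuple(x[j] if j in coalition else baseline[j]
                                  for j in range(n))
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is itself a core point, keep expanding
                queue.extend(reachable)
    return labels

# Invented patients: (ED visits, steroid fills, smoker flag).
patients = [
    (4, 2, 0), (5, 2, 0), (4, 3, 0),  # frequent acute care, non-smokers
    (0, 1, 1), (0, 0, 1), (0, 1, 1),  # smokers with little utilisation
    (2, 0, 1),                        # both risk factors together
]

# Cluster in attribution space, not in raw feature space.
attributions = [shapley(risk, p, BASELINE) for p in patients]
labels = dbscan(attributions, eps=0.15, min_pts=2)
print(labels)
```

On this toy data the acute-care patients and the low-utilisation smokers fall into separate clusters, while the one patient with both risk factors sits far from either group and is left as noise. The `eps` and `min_pts` values are arbitrary choices for the example; with real claims data a tree-specific SHAP explainer and a tuned clustering step would be needed.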