Abstract
BACKGROUND: Machine learning provides a powerful framework for modeling the complex patterns underlying migraine attack onset in real-world, high-dimensional datasets. In this study, we used machine learning to forecast headache days using mobile health (mHealth) data from a migraine biofeedback treatment app.
METHOD: This was a machine learning analysis of data from the BioCer clinical trial (NCT05616741), which evaluated app-based biofeedback for the preventive treatment of episodic migraine. Participants completed three months of daily biofeedback sessions with wearables measuring trapezius muscle tension, heart rate variability, and peripheral skin temperature. Input data for the models included summary metrics from the biofeedback sessions and daily headache diary entries. The outcomes of interest were the presence of a moderate-to-severe headache (defined as an intensity of 4 or higher on an 11-point scale of 0-10) on the next calendar day and during the next three calendar days. The dataset was randomly split into training, validation, and test sets. Multiple standard machine learning architectures, foundation models, and time-series models were trained and optimized using the area under the receiver operating characteristic curve (AUC) as the primary scoring metric. For each of these three model classes, the best optimized model identified during training was applied to the unseen test set. Permutation feature importance (PFI) was computed for model explainability.
RESULTS: In total, 146 individuals with 21,550 headache days were included in the forecasting models. For the next-calendar-day predictions, the top-performing standard machine learning approach (decision tree) and foundation model achieved test-set AUCs of 0.59 (95% CI 0.56 to 0.61) and 0.55 (95% CI 0.55 to 0.56), respectively. The best time-series model achieved a test-set AUC of 0.84 (95% CI 0.82 to 0.85). For the three-calendar-day forecasting window, the corresponding test-set AUCs were 0.55 (95% CI 0.53 to 0.56), 0.55 (95% CI 0.54 to 0.57), and 0.76 (95% CI 0.74 to 0.77), respectively. The most important features were headache intensity, duration of the headache, and heart rate scores.
CONCLUSION: Time-series machine learning models trained on a relatively large dataset forecast moderate-to-severe headaches with good discrimination (test-set AUC of 0.84 for the next calendar day and 0.76 for the three-calendar-day window) in patients with episodic migraine.
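As an illustration of the outcome definition and scoring metric described above, the following minimal Python sketch derives forecasting labels from one participant's daily diary intensities and computes a rank-based AUC. It is not the study's pipeline. The function names (`forecast_labels`, `auc`), the reading of the three-day outcome as "any moderate-to-severe headache within the window", and the handling of missing or incomplete windows are assumptions made for this example only.

```python
from typing import List, Optional, Sequence

# Moderate-to-severe headache: intensity of 4 or higher on the 0-10 scale
# (threshold taken from the abstract).
THRESHOLD = 4


def forecast_labels(
    intensities: Sequence[Optional[int]], horizon: int
) -> List[Optional[int]]:
    """Label day t with 1 if any of days t+1..t+horizon reaches THRESHOLD.

    Assumption for this sketch: a label is None when the window runs past
    the end of the diary or contains a missing (None) entry.
    """
    labels: List[Optional[int]] = []
    for t in range(len(intensities)):
        window = intensities[t + 1 : t + 1 + horizon]
        if len(window) < horizon or any(v is None for v in window):
            labels.append(None)
        else:
            labels.append(int(any(v >= THRESHOLD for v in window)))
    return labels


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive day receives a
    higher score than a randomly chosen negative day (ties count as 0.5).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical one-week diary (not study data).
diary = [0, 5, 2, 0, 4, 7, 1]
next_day = forecast_labels(diary, horizon=1)
next_three_days = forecast_labels(diary, horizon=3)
```

In this sketch, days whose forecasting window is incomplete receive no label and would be excluded before training or scoring; whether the study handled such days this way is not stated in the abstract.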