Abstract
Background: Suicide remains a leading cause of death among youth, yet effective tools to predict suicide attempts (SA) in individuals under 18 are scarce. This study aims to develop machine learning (ML) models to predict SA in paediatric populations using Google Trends data. Methods: Relative Search Volumes (RSVs) from Google Trends were analysed for terms linked to suicide risk factors. Pearson correlation coefficients (PCC) were used to identify terms strongly associated with SA rates. Based on these terms, several ML models were developed and evaluated, including Random Forest Regression, Support Vector Regression (SVR), XGBoost, and Linear Regression. Model performance was assessed using PCC, mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). Results: Terms related to suicide prevention and symptoms, including "psychiatrist" and "anxiety disorder", showed the strongest correlations with SA rates (PCC ≥ 0.90). Random Forest Regression was the top-performing ML model (PCC = 0.953, MAPE = 20.12%, RMSE = 17.21) and highlighted "burnout", "anxiety disorder", "antidepressants", and "psychiatrist" as key predictors of SA. The other models scored as follows: XGBoost (PCC = 0.446, MAPE = 22.57%, RMSE = 18.03), SVR (PCC = 0.833, MAPE = 42.23%, RMSE = 47.32), and Linear Regression (PCC = 0.947, MAPE = 23.64%, RMSE = 17.66). Conclusions: Google Trends-based ML models show potential for short-term prediction of youth SA. These preliminary findings support the use of search data for identifying suicide risk in paediatric populations in real time.
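
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how PCC, MAE, MSE, RMSE, and MAPE can be computed from observed and predicted values. It uses the standard textbook definitions, not the study's own code; the function names (`pearson`, `evaluate`) and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def evaluate(y_true, y_pred):
    """Return PCC, MAE, MSE, RMSE and MAPE (in percent) as a dict.

    Assumes y_true contains no zeros, since MAPE divides by the observed value.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {
        "PCC": pearson(y_true, y_pred),
        "MAE": mae,
        "MSE": mse,
        "RMSE": rmse,
        "MAPE": mape,
    }


# Hypothetical observed and predicted counts, for illustration only.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
for name, value in evaluate(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

Note that PCC measures only how well predictions track the observed trend, while MAE, RMSE, and MAPE measure the size of the errors, so a model can score well on one and poorly on the others.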