Uncovering predictors of myopia in youth: a secondary data analysis using a machine learning approach

Abstract

INTRODUCTION: Myopia is a multifactorial condition driven by an interplay of genetic predisposition and environmental triggers. This study aims to harmonize and analyze risk predictors from two distinct datasets (one historical and clinical, the other contemporary and behavioral) to develop an integrated framework for myopia risk prediction.
METHODS: We analyzed two datasets: the Orinda Longitudinal Study of Myopia (OLSM), a 1995 US cohort (n≈500) with detailed ocular biometrics (e.g., spherical equivalent refraction, axial length) and lifestyle factors, and a 2022-2023 Chinese cross-sectional study (n=100,000) highlighting modern behaviors (e.g., screen time, posture). We employed multiple machine learning models, including logistic regression, Explainable Boosting Machine (EBM), and gradient boosting decision trees (GBDT) on OLSM, and deep neural networks (DNN) and XGBoost on the Chinese dataset, to identify key predictors. Model interpretability was assessed using SHapley Additive exPlanations (SHAP). We also tested three ensemble strategies (sequential, averaging, transfer learning) to merge insights across the structurally divergent datasets.
RESULTS: Both datasets confirmed parental myopia as a universal risk factor and time spent outdoors as a protective factor. In the OLSM dataset, spherical equivalent refraction and parental myopia were the top predictors, with models achieving an AUC of up to 0.92. In the Chinese dataset, the DNN model achieved 71% accuracy, identifying screen time, posture, and parental history as major risk factors. Cross-dataset integration via transfer learning proved most effective, successfully amplifying features like outdoor activity and posture while retaining core behavioral predictors like screen time. This approach bridged the clinical depth of OLSM with the granular, modern lifestyle insights from the Chinese dataset.
DISCUSSION: Our analysis confirms the multifactorial nature of myopia, blending historical biological mechanisms with contemporary behavioral drivers. The study demonstrates a scalable strategy for global myopia risk prediction by adaptively integrating diverse datasets. While not yet a turnkey clinical tool, this work lays the groundwork for future multimodal risk-prediction frameworks that can bridge era-specific biases and harness machine learning to capture the evolving profile of myopia risk.
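
As a rough illustration of two ideas named in the abstract, the averaging ensemble strategy and the AUC metric, the following minimal Python sketch averages the predicted probabilities of two models and scores each with a pairwise AUC. It is not the authors' implementation: the abstract does not say how averaging was performed, and the function names (`average_ensemble`, `roc_auc`) and the toy labels and probabilities are hypothetical, chosen only to make the example runnable.

```python
from typing import List, Sequence


def average_ensemble(*model_probs: Sequence[float]) -> List[float]:
    """Average per-sample predicted probabilities across models (assumed unweighted)."""
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same samples")
    return [sum(col) / len(model_probs) for col in zip(*model_probs)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = myopic, 0 = not myopic (not taken from either dataset).
labels = [1, 0, 1, 0, 1, 0]
probs_a = [0.9, 0.4, 0.3, 0.2, 0.8, 0.6]  # e.g. scores from one model
probs_b = [0.6, 0.2, 0.7, 0.5, 0.4, 0.3]  # e.g. scores from a second model

ensemble = average_ensemble(probs_a, probs_b)
for name, scores in [("model A", probs_a), ("model B", probs_b), ("average", ensemble)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

Averaging in this form only applies when both models score the same samples. The study's two datasets are structurally divergent, so its actual procedure presumably involved additional harmonization that this sketch does not attempt.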
