S12. A MACHINE LEARNING FRAMEWORK FOR ROBUST AND RELIABLE PREDICTION OF SHORT- AND LONG-TERM CLINICAL RESPONSE IN INITIALLY ANTIPSYCHOTIC-NAÏVE SCHIZOPHRENIA PATIENTS BASED ON MULTIMODAL NEUROPSYCHIATRIC DATA

Abstract

BACKGROUND: The treatment response of patients with schizophrenia is heterogeneous, and markers of clinical response are lacking. Studies using machine learning have provided encouraging results for outcome prediction, but replicability has been challenging. In the present study, we present a novel methodological framework for applying machine learning to clinical data, in which algorithm selection and other methodological choices were based on model performance on a simulated dataset, to minimize bias and avoid overfitting. We subsequently applied the best-performing machine learning algorithm to a rich, multimodal neuropsychiatric dataset. We aimed to 1) distinguish patients from healthy controls, 2) predict short- and long-term clinical response in a sample of initially antipsychotic-naïve first-episode schizophrenia patients, and 3) validate our methodological framework.

METHODS: We included data from 138 antipsychotic-naïve, first-episode schizophrenia patients who had undergone assessments of psychopathology, cognition, electrophysiology, and structural magnetic resonance imaging (MRI). Perinatal data and long-term outcome measures were obtained from Danish registers. Baseline diagnostic classification algorithms also included data from 151 matched healthy controls. Short-term treatment response was defined as the change in psychopathology after the initial antipsychotic treatment period. Long-term treatment response (4–16 years) was based on data from Danish registers. The simulated dataset was generated to resemble the real data with respect to dimensionality, multimodality, and pattern of missing data. Noise levels were tunable so that the signal-to-noise ratio of the real data could be approximated. To ensure robust results, we ran two parallel, fundamentally different machine learning pipelines: a ‘single algorithm approach’ and an ‘ensemble approach’.
Both pipelines included nested cross-validation, missing data imputation, and late integration.

RESULTS: Both approaches classified patients versus controls significantly above chance, with a balanced accuracy of 64.2% (95% CI = [51.7, 76.7]) for the single algorithm approach and 63.1% (95% CI = [50.4, 75.8]) for the ensemble approach. Post hoc analyses showed that the classification was driven primarily by the cognitive data. Neither approach predicted short- or long-term clinical response. To validate our methodological framework based on simulated data, we selected the best-performing, a medium-performing, and the worst-performing algorithm on the simulated data and applied them to the real data. The ranking of the algorithms was preserved in the real data.

DISCUSSION: Our rigorous modelling framework, incorporating simulated data and parallel pipelines, discriminated patients from controls, but our extensive, multimodal neuropsychiatric data from antipsychotic-naïve schizophrenia patients were not predictive of clinical outcome. Nevertheless, our novel approach holds promise as an important step towards obtaining reliable, unbiased results with modest sample sizes when independent replication samples are not available.
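The pipeline elements named in METHODS (a simulated dataset with tunable noise and block-wise missing modalities, nested cross-validation, imputation fitted only on training folds, late integration across modalities, and balanced accuracy as the metric) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the abstract does not name the algorithms used, so a nearest-centroid classifier with a tunable number of features `k` stands in for them, and `simulate`, `fit_modality`, `fit_late`, `folds`, `nested_cv`, and all parameter values are hypothetical. Late integration is shown as averaging per-modality decision scores, which is only one of several possible schemes (stacking is another).

```python
# Illustrative sketch only: every name and parameter value here is hypothetical.
import random
import statistics

def simulate(n=120, dims=(5, 5, 5), n_informative=2, noise=1.0,
             p_missing=0.2, seed=0):
    # Assumed generator: balanced binary labels, a class shift of +/-0.5 on the
    # first n_informative features of each modality, Gaussian noise with SD
    # `noise`, and whole modalities missing at random with prob. p_missing.
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    rng.shuffle(labels)
    data = []  # data[i][m]: feature list, or None if modality m is missing
    for y in labels:
        subject = []
        for d in dims:
            if rng.random() < p_missing:
                subject.append(None)
                continue
            feats = [rng.gauss(0.0, noise) for _ in range(d)]
            for j in range(n_informative):
                feats[j] += 0.5 if y == 1 else -0.5
            subject.append(feats)
        data.append(subject)
    return data, labels

def balanced_accuracy(y_true, y_pred):
    # Mean per-class recall; for two classes, (sensitivity + specificity) / 2.
    recalls = []
    for c in sorted(set(y_true)):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(hits) / len(hits))
    return sum(recalls) / len(recalls)

def folds(idx, labels, n_folds, seed):
    # Stratified split for labels 0/1: shuffle each class, deal round-robin.
    rng = random.Random(seed)
    out = [[] for _ in range(n_folds)]
    for c in (0, 1):
        members = [i for i in idx if labels[i] == c]
        rng.shuffle(members)
        for r, i in enumerate(members):
            out[r % n_folds].append(i)
    return out

def fit_modality(data, labels, train, m, k):
    # Mean imputation with values computed from training subjects only.
    rows = [data[i][m] for i in train if data[i][m] is not None]
    d = len(rows[0])
    fill = [statistics.fmean(r[j] for r in rows) for j in range(d)]
    def get(i):
        return data[i][m] if data[i][m] is not None else fill
    cent = {}
    for c in (0, 1):
        xs = [get(i) for i in train if labels[i] == c]
        cent[c] = [statistics.fmean(x[j] for x in xs) for j in range(d)]
    # Stand-in classifier: nearest centroid on the k most separated features.
    keep = sorted(range(d), key=lambda j: -abs(cent[1][j] - cent[0][j]))[:k]
    def score(i):
        x = get(i)
        d0 = sum((x[j] - cent[0][j]) ** 2 for j in keep)
        d1 = sum((x[j] - cent[1][j]) ** 2 for j in keep)
        return d0 - d1  # positive when closer to the class-1 centroid
    return score

def fit_late(data, labels, train, k):
    # Late integration: one model per modality, decision scores averaged.
    models = [fit_modality(data, labels, train, m, k) for m in range(len(data[0]))]
    return lambda i: 1 if statistics.fmean(s(i) for s in models) > 0 else 0

def nested_cv(data, labels, grid=(1, 2, 3, 5), outer=5, inner=3, seed=0):
    # Outer folds estimate performance; inner folds, drawn only from the outer
    # training set, choose k, so test subjects never influence tuning.
    idx = list(range(len(labels)))
    y_true, y_pred = [], []
    for f, test in enumerate(folds(idx, labels, outer, seed)):
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        def inner_score(k):
            t, p = [], []
            for val in folds(train, labels, inner, seed + f + 1):
                val_set = set(val)
                clf = fit_late(data, labels, [i for i in train if i not in val_set], k)
                t += [labels[i] for i in val]
                p += [clf(i) for i in val]
            return balanced_accuracy(t, p)
        clf = fit_late(data, labels, train, max(grid, key=inner_score))
        y_true += [labels[i] for i in test]
        y_pred += [clf(i) for i in test]
    return balanced_accuracy(y_true, y_pred)

if __name__ == "__main__":
    for noise in (0.5, 2.0, 8.0):
        data, labels = simulate(noise=noise, seed=1)
        print(f"noise={noise}: balanced accuracy = {nested_cv(data, labels):.3f}")
```

In this sketch, lowering `noise` should raise the nested cross-validated balanced accuracy, and raising it should push the estimate toward the 50% chance level; a tunable simulation lets one check that kind of behaviour before touching real data. Because the outer test fold never contributes to the imputation means, the feature ranking, or the choice of `k`, the outer estimate is not inflated by those steps.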
