Abstract
BACKGROUND: Osteoarthritis (OA) affects over 500 million patients worldwide, significantly impacting their quality of life and independence. The incidence of OA is expected to rise due to aging populations and increasing obesity rates. Traditionally considered a degenerative, age-related disease, OA is now recognized as a heterogeneous condition with distinct phenotypes. This study aimed to evaluate the use of real-world evidence (RWE) data for phenotyping OA and for developing a predictive model for total knee replacement (TKR) using insurance data from Israel. METHODS: The study used anonymized electronic health record data from Maccabi Healthcare Services, which covers over 26% of the Israeli population. Patients diagnosed with knee OA between 2000 and 2023 were included. Data on medical history, socioeconomic status, and lifestyle factors were extracted from the database. The outcome was the time to first TKR, analyzed using a Lasso-regularized Cox regression model. Model performance was assessed using the area under the receiver operating characteristic curve and the C-index. RESULTS: A total of 135,691 patients met the inclusion criteria. The population was divided into a training set (80%) and a test set (20%) for model development and performance assessment. Baseline data were comparable between the training and test sets. In total, 3230 (2.4%) TKR events were observed, with an estimated rate of 2% (95% CI: 1.9–2.1) at 2 years. Of the 62 initial variables, the most predictive for TKR were age, presence of allergy, and BMI. The final model categorized patients into low-risk (75%) and high-risk (25%) groups based on a predicted risk score, with 4% (95% CI: 3.8–4.2) of patients in the high-risk group experiencing TKR by 2 years vs. 1.4% (95% CI: 1.2–1.6) in the low-risk group. CONCLUSION: The study demonstrated the feasibility of using RWE data for risk phenotyping in OA and for predicting TKR. More granular phenotyping reflecting OA heterogeneity was not possible. The predictive model, however, is based on data typically available in clinical practice and could support shared decision-making and simplify feasibility assessments and enrichment strategies for large trials. The use of RWE data may better reflect the epidemiologic reality and reduce the biases associated with the clinical trials and registries that otherwise form the basis of trial planning. CLINICAL TRIAL NUMBER: Not applicable. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-026-03435-y.
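ILLUSTRATION: The C-index named in the methods can be sketched as below. This is a minimal, assumed implementation of Harrell's concordance index for right-censored time-to-event data, not the study's code. The function name `harrell_c_index` and the toy inputs are hypothetical, and pairs with tied event times are skipped for simplicity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times       -- follow-up time per patient
    events      -- 1 if the event (e.g. TKR) was observed, 0 if censored
    risk_scores -- model output; higher means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: tied times are ignored
        # 'a' is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.2, 0.4]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to a ranking no better than chance, and 1.0 to a ranking in which every comparable pair is ordered correctly.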