Abstract
BACKGROUND: Accurate estimation of low-density lipoprotein cholesterol (LDL-C) is crucial for atherosclerotic cardiovascular disease (ASCVD) risk management. Extensive international validation studies have demonstrated that traditional formulas (Friedewald, Martin/Hopkins, Sampson) often yield substantial errors in extreme hypertriglyceridemia. This study aimed to assess the performance of these conventional formulas in Chinese populations and to develop a novel neural network-based LDL-C estimation model [LDL-C(NN)].
METHODS: In this retrospective study, we analyzed 188,887 lipid profiles (each comprising total cholesterol, triglycerides (TG), high-density lipoprotein cholesterol, and directly measured LDL-C) from Peking University Shenzhen Hospital, measured on Mindray (outpatients, n = 83,731) and Beckman (inpatients, n = 105,156) systems. The profiles measured on the two systems did not overlap. We used stratified random sampling based on TG levels to select 30,000 profiles from each system as the training dataset (60,000 profiles in total). Within this training dataset, 70 % of profiles were used for parameter learning, 15 % for early-stopping validation, and 15 % for post-training testing. The remaining profiles constituted the independent test set for the final performance evaluation (Mindray: n = 53,731; Beckman: n = 75,156). We then compared the performance of LDL-C(NN) with that of the Friedewald, Martin/Hopkins, and Sampson formulas using the correlation coefficient (r), root mean square error (RMSE), concordance correlation coefficient (CCC), and clinical risk stratification accuracy.
RESULTS: Against directly measured LDL-C, LDL-C(NN) demonstrated higher correlation and lower RMSE than the traditional LDL-C equations in the Mindray system (LDL-C(NN): r = 0.9778, RMSE = 0.1762 mmol/L; Friedewald equation: r = 0.8894, RMSE = 0.4783 mmol/L; Martin/Hopkins equation: r = 0.9658, RMSE = 0.2463 mmol/L; Sampson equation: r = 0.9548, RMSE = 0.2934 mmol/L). The advantage was most pronounced in patients with high triglycerides (TG 9.03-13.56 mmol/L; LDL-C(NN): CCC = 0.8750; Friedewald equation: CCC = 0.3320; Martin/Hopkins equation: CCC = 0.7278; Sampson equation: CCC = 0.4176). Results in the Beckman dataset showed the same pattern. The clinical classification accuracy of LDL-C(NN) reached 87.5 % (Mindray) and 83.4 % (Beckman), surpassing that of the traditional LDL-C equations (66.6-78.7 %).
CONCLUSIONS: By overcoming the linear assumptions of conventional equations, the neural network-based model significantly improves LDL-C estimation in hypertriglyceridemia (especially TG ≥ 9.03 mmol/L) and in complex lipid profiles, extending the applicable range beyond that of traditional formulas while demonstrating robust performance across both analytical systems.
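To make the comparison metrics concrete, the following is a minimal sketch of how a formula-based estimate could be scored against directly measured LDL-C. It assumes the commonly cited mmol/L form of the Friedewald equation (LDL-C = TC − HDL-C − TG/2.2) and Lin's definition of the CCC with population (1/n) moments; the abstract does not state the exact implementations used in the study. All function names and the sample values are hypothetical and for illustration only.

```python
import math


def friedewald_ldl_mmol(tc, hdl, tg):
    """Friedewald estimate in mmol/L (assumed form): TC - HDL-C - TG/2.2."""
    return tc - hdl - tg / 2.2


def _mean(values):
    return sum(values) / len(values)


def rmse(measured, estimated):
    """Root mean square error between measured and estimated values."""
    return math.sqrt(_mean([(m - e) ** 2 for m, e in zip(measured, estimated)]))


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient using 1/n moments.

    Unlike Pearson's r, the CCC also penalizes systematic offsets
    between the two methods via the (mean_x - mean_y)^2 term.
    """
    n = len(x)
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


if __name__ == "__main__":
    # Made-up profiles (TC, HDL-C, TG, directly measured LDL-C), all in mmol/L.
    profiles = [
        (5.2, 1.3, 1.5, 3.2),
        (6.1, 1.0, 4.0, 3.6),
        (4.4, 1.6, 0.9, 2.4),
        (7.0, 0.9, 9.5, 3.1),
    ]
    measured = [p[3] for p in profiles]
    estimated = [friedewald_ldl_mmol(tc, hdl, tg) for tc, hdl, tg, _ in profiles]
    print("r    =", round(pearson_r(measured, estimated), 4))
    print("RMSE =", round(rmse(measured, estimated), 4))
    print("CCC  =", round(lin_ccc(measured, estimated), 4))
```

Because the CCC accounts for bias as well as correlation, an estimator that tracks the direct measurement but is consistently offset scores a high r and a lower CCC, which is why the CCC is informative in the high-TG stratum.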