Abstract
OBJECTIVE: To evaluate the clinical utility and implementation considerations of artificial intelligence (AI)-based fetal health classification systems, using the Kaggle Fetal Health Classification dataset and focusing on the perspective of obstetric physicians. METHODS: We analyzed the Kaggle Fetal Health Classification dataset (n=2,126 records, each with 21 cardiotocography parameters). Five machine-learning algorithms were evaluated: logistic regression, random forest, gradient boosting, support vector machine, and decision tree. Class weighting was applied to address class imbalance in the dataset. Model performance was assessed using standard classification metrics. We also developed an expert opinion-based framework to assess clinical utility in terms of interpretability, workflow integration, and safety. RESULTS: With class weighting applied, gradient boosting achieved the highest accuracy (89.67%), followed by random forest (88.50%) and logistic regression (82.16%). The most important predictive features were abnormal short-term variability (16.23% importance) and the percentage of time with abnormal long-term variability (13.21% importance). An analysis of all 21 features showed that contraction-related parameters, including uterine_contractions, contributed minimally to classification performance. The false negative rate of 35.3% for pathological cases represents a significant safety concern and necessitates physician oversight. CONCLUSION: AI-based fetal health classification systems show potential for future clinical application, provided they are properly validated. However, the substantial false negative rate for pathological cases indicates that these systems cannot function autonomously. External validation with multicenter clinical data and prospective outcome studies is essential before clinical implementation.
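The abstract does not state exactly how class weights or the pathological false negative rate were computed. The sketch below is illustrative only. It assumes the common inverse-frequency weighting n_samples / (n_classes * count_c), which is the formula scikit-learn applies for class_weight="balanced". It also defines the per-class false negative rate as FN / (FN + TP), i.e. 1 minus recall. The helper names and the label coding (1 = Normal, 2 = Suspect, 3 = Pathological) are assumptions, and the labels and predictions are hypothetical toy values, not study data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Assumed weighting scheme (same formula as scikit-learn's
    class_weight="balanced"); rare classes receive larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def false_negative_rate(y_true, y_pred, positive):
    """Per-class FNR = FN / (FN + TP), i.e. 1 - recall for `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return fn / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels; assumed coding 1=Normal, 2=Suspect, 3=Pathological.
y_true = [1, 1, 1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 1, 1, 2, 2, 1, 3, 1]

weights = balanced_class_weights(y_true)
fnr_pathological = false_negative_rate(y_true, y_pred, positive=3)
print(weights)
print(fnr_pathological)
```

In this toy case one of the two pathological cases is misclassified as Normal, giving a pathological FNR of 0.5. A high overall accuracy can therefore coexist with a poor miss rate on the rare class. This is why a per-class metric, rather than accuracy alone, is the relevant safety figure.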