Abstract
OBJECTIVE: To develop an electronic medical record (EMR) data processing tool that confers clinical context to machine learning (ML) algorithms for error handling, bias mitigation, and interpretability.
MATERIALS AND METHODS: We present Trust-MAPS, an algorithm that translates clinical domain knowledge into high-dimensional, mixed-integer programming models that capture physiological and biological constraints on clinical measurements. EMR data are projected onto this constrained space, effectively bringing outliers within a physiologically feasible range. We then compute the distance of each data point from the constrained space modeling healthy physiology to quantify its deviation from the norm. These distances, termed "trust-scores," are integrated into the feature space for downstream ML applications. We demonstrate the utility of Trust-MAPS by training a binary classifier for early sepsis prediction on data from the 2019 PhysioNet Computing in Cardiology Challenge, using the XGBoost algorithm and applying SMOTE to overcome class imbalance.
RESULTS: The Trust-MAPS framework shows desirable behavior in handling potential errors and boosting predictive performance. We achieve an area under the receiver operating characteristic curve of 0.91 (95% CI, 0.89-0.92) for predicting sepsis 6 hours before onset, a marked 15% improvement over a baseline model trained without Trust-MAPS.
DISCUSSION: Downstream classification performance improves after Trust-MAPS preprocessing, highlighting the bias-reducing capabilities of the error-handling projections. Trust-scores emerge as clinically meaningful features that not only boost predictive performance for clinical decision support tasks but also lend interpretability to ML models.
CONCLUSION: This work is the first to translate clinical domain knowledge into mathematical constraints, model cross-vital dependencies, and identify aberrations in high-dimensional medical data. Our method enables error handling in EMR data and confers interpretability and superior predictive power to models trained for clinical decision support.
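To make the projection and trust-score steps concrete, here is a minimal Python sketch under strong simplifying assumptions. It gives each vital independent box bounds, so the Euclidean projection reduces to clipping and the distance to the healthy region has a closed form. The vital names and ranges are hypothetical, chosen only for illustration. Trust-MAPS itself uses mixed-integer programming models that capture cross-vital dependencies, and this sketch does not reproduce them.

```python
from math import sqrt

# Hypothetical bounds for illustration only (not values from the paper).
# FEASIBLE: physiologically possible ranges used to correct likely errors.
# HEALTHY: normal ranges used to measure deviation from healthy physiology.
FEASIBLE = {"HR": (20.0, 250.0), "SBP": (40.0, 260.0), "Temp": (30.0, 43.0)}
HEALTHY = {"HR": (60.0, 100.0), "SBP": (90.0, 140.0), "Temp": (36.1, 37.8)}


def project_onto_box(x, bounds):
    """Euclidean projection onto an axis-aligned box: clip each coordinate."""
    return {k: min(max(v, bounds[k][0]), bounds[k][1]) for k, v in x.items()}


def distance_to_box(x, bounds):
    """Euclidean distance from x to the box, i.e. ||x - proj(x)||_2.

    Note: raw units are mixed here with no per-vital scaling.
    """
    p = project_onto_box(x, bounds)
    return sqrt(sum((x[k] - p[k]) ** 2 for k in x))


def trust_map(record):
    """Project a record into the feasible box, then score its distance from
    the healthy box. Returns (cleaned record, trust-score)."""
    cleaned = project_onto_box(record, FEASIBLE)
    score = distance_to_box(cleaned, HEALTHY)
    return cleaned, score


if __name__ == "__main__":
    # An implausible heart rate of 300 is clipped to the feasible maximum (250),
    # then scored by its distance from the healthy range.
    cleaned, score = trust_map({"HR": 300.0, "SBP": 120.0, "Temp": 36.5})
    features = {**cleaned, "trust_score": score}
    print(features)
```

Appending `trust_score` to the cleaned features loosely mirrors how trust-scores enter the downstream feature space. In practice, vitals would likely need normalization before their deviations are combined, because raw units differ across measurements.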