Abstract
INTRODUCTION: Traditional methods for handling missing data rely on assumptions about the pattern of missingness. Locally estimated scatterplot smoothing (LOESS) regression models were explored as a data-driven option to reduce missing weight data in a longitudinal cohort of breast cancer patients.

METHODS: Outpatient weights recorded from 2 years before breast cancer diagnosis to 10 years after diagnosis were extracted from electronic health records (EHRs) for 10,778 women with invasive breast cancer diagnosed from 2005 to 2013 at Kaiser Permanente. LOESS regression models estimated weights at baseline (breast cancer diagnosis) and at 6 follow-up time points (6, 12, 24, 48, 72, and 96 months post-baseline). The weights identified by the LOESS models were compared with those identified by the closest-available method, in which the weight measurement closest to each time point within a specified time window was selected.

RESULTS: Compared with the closest-available method, LOESS models identified fewer weights at baseline and at 6 months post-baseline, but significantly more weights at later follow-up time points. At every time point, more than 80% of the weights identified by both approaches differed by 2.50 kg or less.

CONCLUSIONS: LOESS regression makes effective use of available longitudinal data and may be a useful tool for reducing missing longitudinal data in future EHR-based studies.
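The two approaches compared in the methods can be illustrated with a minimal Python sketch for a single patient. It is not the study's implementation: the function names, the smoothing span `frac`, the window width, the `min_points` threshold, and the rule of returning no estimate outside the observed time range are all assumptions made for illustration, and the weights shown are made-up values. The LOESS step is written as a local linear fit with tricube kernel weights, the standard formulation of the technique.

```python
import math
from typing import Optional, Sequence


def closest_available(times: Sequence[float], weights: Sequence[float],
                      target: float, window: float) -> Optional[float]:
    """Return the weight measured closest to `target` (months from baseline)
    within +/- `window` months, or None if no measurement falls in the window."""
    best = None  # (distance, weight) of the best candidate so far
    for t, w in zip(times, weights):
        d = abs(t - target)
        if d <= window and (best is None or d < best[0]):
            best = (d, w)
    return None if best is None else best[1]


def loess_estimate(times: Sequence[float], weights: Sequence[float],
                   target: float, frac: float = 0.75,
                   min_points: int = 3) -> Optional[float]:
    """Estimate the weight at `target` with a local linear fit using tricube
    kernel weights. `frac` (span) and `min_points` are illustrative choices."""
    n = len(times)
    if n < min_points:
        return None
    # Assumption: do not extrapolate beyond the patient's observed time range.
    if target < min(times) or target > max(times):
        return None
    k = min(n, max(min_points, math.ceil(frac * n)))
    h = sorted(abs(t - target) for t in times)[k - 1]  # local bandwidth
    if h == 0:
        at_target = [w for t, w in zip(times, weights) if t == target]
        return sum(at_target) / len(at_target)
    sw = swx = swy = swxx = swxy = 0.0
    for t, w in zip(times, weights):
        u = abs(t - target) / h
        if u >= 1:
            continue  # outside the local neighbourhood
        kw = (1 - u ** 3) ** 3  # tricube kernel weight
        x = t - target          # centre so the intercept is the estimate
        sw += kw
        swx += kw * x
        swy += kw * w
        swxx += kw * x * x
        swxy += kw * x * w
    if sw == 0:
        return None
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate design: fall back to a weighted mean
    return (swxx * swy - swx * swxy) / denom  # intercept of the weighted fit


# Hypothetical measurements: months from diagnosis and weights in kg.
months = [-12, -3, 2, 9, 20, 30]
kg = [68.8, 69.7, 70.2, 70.9, 72.0, 73.0]
print(closest_available(months, kg, target=6, window=2))  # no weight in window
print(loess_estimate(months, kg, target=6))               # smoothed estimate
```

In this example no measurement falls within 2 months of the 6-month time point, so the closest-available method yields a missing value, whereas the local fit draws on the surrounding measurements to produce an estimate. This is the mechanism by which a smoothing approach can recover weights at sparsely observed time points.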