Abstract
BACKGROUND: Chronic back pain is a severe health condition whose underlying biopsychosocial factors make diagnosis difficult, and pain chronicity has been shown to be an important variable for studying patient outcomes. Pain chronicity is not typically recorded in electronic health records (EHRs), and because there are no standardized criteria, it must be manually annotated by clinicians. This process is time-consuming and can introduce variability in analysis and interpretation among practitioners. OBJECTIVE: Using a dataset of 386 patients from an interdisciplinary spine clinic, manually annotated for pain chronicity by clinical experts, this study has two objectives: (1) to examine the relationship between expert-annotated chronicity and social determinant variables present in EHRs and (2) to evaluate the feasibility of extracting pain chronicity from the EHR without expert annotation. METHODS: We used univariate regression, a supervised learning method, to examine associations between clinician-annotated pain chronicity values and the structured variables present in EHRs. Next, we trained random forest models to predict pain chronicity from structured data alone and from structured data combined with unstructured data extracted by the clinical Text Analysis and Knowledge Extraction System (cTAKES), a natural language processing (NLP) tool. The extracted features included clinical keywords, reported pain duration, and International Classification of Diseases, Tenth Revision (ICD-10) codes. Model performance was assessed using the Pearson correlation coefficient and mean absolute error (MAE) between predicted and observed chronicity. RESULTS: The study analyzed 386 patients from the San Francisco Bay Area (mean age 60.2, SD 16.1 years; median age 62.0, IQR 48.8-72.0 years), of whom 62.7% (n=242) identified as women.
Our univariate regression analysis identified structured EHR variables associated with pain chronicity, including pain severity before the last visit (P=.006), number of imaging orders (P=.006), number of visits to the neurology department (P=.01), and Medi-Cal insurance coverage (P=.03). The random forest model using structured data alone showed a strong correlation between predicted and observed chronicity (r=0.887, P<.001; MAE 18.45), whereas the model that combined structured data with information extracted from unstructured clinical notes by the NLP tool showed a higher correlation (r=0.968, P<.001) and a lower MAE (10.87). CONCLUSIONS: Our study indicates that automating the prediction of pain chronicity from EHR data with NLP tools is feasible, which could allow future studies to address more topics on larger datasets without the need for manual annotation.
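As a minimal sketch of the evaluation step, the two metrics named in METHODS (Pearson correlation coefficient and MAE between predicted and observed chronicity) can be computed in plain Python as below. The function names and the five observed/predicted values are hypothetical illustrations, not study data, and the abstract does not state the units in which chronicity is measured. In practice, scipy.stats.pearsonr (which also returns a P value) and sklearn.metrics.mean_absolute_error provide the same quantities.

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances, all without the 1/n factor (it cancels in r).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy values (NOT from the study): expert-annotated chronicity
# versus a model's predictions for five patients.
observed = [12, 30, 45, 60, 90]
predicted = [15, 28, 50, 55, 80]

print(f"Pearson r = {pearson_r(observed, predicted):.3f}")
print(f"MAE = {mean_absolute_error(observed, predicted):.2f}")
```

The two metrics are complementary: r measures how well predictions track the ranking and spread of observed values, whereas MAE reports the typical error in the outcome's own units. This is why the abstract reports both for each model.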