Abstract
STUDY OBJECTIVES: Demographic data are critical for identifying and addressing disparities but are hampered by classification problems, particularly for Hispanic or Latino patients. Ethnicity data are typically collected through (1) a binary Hispanic or Latino response, (2) a country-of-origin checklist, or both; however, there is no consensus on which populations the term Hispanic or Latino represents. Our objective was to examine the agreement between the commonly collected binary ethnicity variable and country-of-origin-based definitions.

METHODS: We conducted a cross-sectional study among patients in a regional health care system (January 1, 2021 to November 16, 2023). The primary outcome was agreement between the binary Hispanic or Latino ethnicity variable and country of origin. Given the variation in the countries represented by the term Hispanic or Latino, we applied multiple definitions, including that of the US Office of Management and Budget.

RESULTS: Among the 2,919,810 patients identified, 83.1% had completed responses to the binary Hispanic or Latino ethnicity question and 75.1% had completed responses to the country-of-origin ethnicity variable. Using the binary variable, 241,391 patients were documented as Hispanic or Latino; of these, 169,731 (70%) had a country of origin included in the Office of Management and Budget definition. An expanded definition that additionally included Brazil, Haiti, Belize, and Guyana increased agreement (n=176,048; 73%).

CONCLUSION: Our findings highlight the limitations of using only the binary Hispanic or Latino ethnicity variable, in particular that it may lead to underestimation. Efforts to improve data quality and nuance, particularly in the emergency department, are critical because inaccurate assessment of disparities may misdirect interventions and, ultimately, lead to missed opportunities to decrease disease burden.
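The agreement percentages reported in the results can be reproduced from the stated counts. The following is a minimal sketch, assuming that agreement is the share of patients documented as Hispanic or Latino on the binary variable whose country of origin falls within a given definition. The function name `agreement_pct` and the constant names are hypothetical and are not taken from the study.

```python
def agreement_pct(n_matching: int, n_binary_hispanic: int) -> float:
    """Percentage of binary-variable Hispanic or Latino patients whose
    country of origin falls within a given definition (assumed formula)."""
    if n_binary_hispanic <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * n_matching / n_binary_hispanic


# Counts as reported in the abstract.
N_BINARY = 241_391    # documented as Hispanic or Latino (binary variable)
N_OMB = 169_731       # country of origin within the OMB definition
N_EXPANDED = 176_048  # OMB definition plus Brazil, Haiti, Belize, Guyana

omb = agreement_pct(N_OMB, N_BINARY)
expanded = agreement_pct(N_EXPANDED, N_BINARY)

print(f"OMB definition: {omb:.1f}%")            # 70.3%
print(f"Expanded definition: {expanded:.1f}%")  # 72.9%
print(f"Additional patients matched: {N_EXPANDED - N_OMB:,}")  # 6,317
```

Rounded to whole numbers, these values match the 70% and 73% reported above.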