Abstract
INTRODUCTION: Many clinical data networks focus on a single use case or disease. By contrast, the TriNetX Dataworks-USA Network contains real-world clinical information that can be applied to multiple research questions and use cases. The purpose of this study is to describe the Network's characteristics and its generalizability to the US population, particularly the healthcare-seeking population. METHODS: The Dataworks-USA Network is a large, regularly updated data network containing de-identified patient electronic health record (EHR) information from across the United States. To examine the generalizability of the Network, basic patient demographics were summarized and compared with the US Census Bureau International Database (IDB) 2022 data and the National Cancer Institute's version of the Census Bureau's U.S. County Population Data for 2022. RESULTS: Patients in the Dataworks-USA Network are approximately 5 years older than the Census population, and the Network has a larger proportion of female patients. The Network has a lower proportion of patients identified as Asian or White, and a higher proportion identified as Other, relative to the Census; proportions for the remaining race categories are similar between the two data sources (< 1% difference). Regionally, Dataworks-USA has a smaller proportion of patients in all race categories compared with the Census because of its larger proportion of patients of Unknown or Other race. CONCLUSIONS: TriNetX's Dataworks-USA Network provides a robust data source for many use cases and is broadly generalizable to the US population, particularly the healthcare-seeking population, with differences attributable to the underlying nature of the two data sources.