Abstract
Large-scale electronic health record (EHR) data platforms that pool data across multiple healthcare organizations have proliferated; examples include the National Institutes of Health's All of Us in the federal space and TriNetX and Epic Cosmos in the commercial space. Aggregating EHR data across disparate healthcare systems raises unique issues beyond the general (and better known) concerns about secondary analysis of EHR data from a single entity. In this article, we define aggregated EHR data and contrast them with other real-world data sources, highlight the benefits and challenges of working with aggregated EHR data, offer several "good practices" to address these challenges, and conclude by discussing whether it is appropriate to pool these data.