Abstract
Congressional data are essential for analyzing the U.S. law-making process, political networks, and policy outcomes. However, accessing up-to-date official data for scientific research often requires a framework to ingest large-scale, unstructured data from government sources, along with an automated pipeline to validate, curate, and synthesize these data. We introduce the Bulk Ingestion of Congressional Actions & Materials (BICAM) dataset, which comprises eleven components of congressional activities, actors, and materials: Bills, Amendments, Members, Committees, Committee Reports, Prints, Meetings, Nominations, Hearings, Treaties, and Congresses. Together, these components cover all electronically available official records from 1789 to the present. To support integration with external datasets and facilitate quantitative and qualitative research on the U.S. Congress, BICAM also provides standardized identifiers for each component. By linking BICAM to filings made under the Lobbying Disclosure Act of 1995, we demonstrate its applicability and potential to advance empirical research on legislative processes.