Abstract
We present chromosome-level, phased diploid genome assemblies of two widely used human fibroblast cell lines: BJ (46,XY) and IMR-90 (46,XX). Using Oxford Nanopore, PacBio HiFi, and Hi-C sequencing data, we generated assemblies spanning 5.9 and 6.0 Gbp with diploid quality values (QV) exceeding 60. To validate structural integrity, we developed KaryoScope, an alignment-free tool that generates computational karyograms from k-mer feature databases. We identify >50 000 structural variants relative to T2T-CHM13v2.0, most of which are heterozygous and cell-line-specific. By combining reference-based and de novo gene annotation, we uncover a previously unreported 1 Mbp homozygous duplication at the 16p11.2 locus in BJ, demonstrating that even karyotypically normal cell lines can harbor clinically relevant submicroscopic rearrangements. We show that mapping publicly available short-read, RNA-seq, and ChIP-seq data to sample-matched diploid assemblies substantially improves read alignment and enables haplotype phasing of 23%–28% of short reads. The BJ and IMR-90 assemblies and associated variant calls are publicly available as a resource for the research community.