Abstract
Ewing sarcoma refers to a family of bone tumors primarily characterized by fusion genes of various types, with EWSR1-FLI1 being the most common. These fusion genes drive tumor development and growth. While most fusion gene detection tools rely on RNA-based data, few utilize genomic data, and existing methods face challenges in detection sensitivity and accuracy. To improve genomic understanding, we developed DenovoFusion, a novel de novo assembly-based analysis tool designed to detect DNA-level chimeric events and to address current limitations in accurate breakpoint identification. Its fusion detection pipeline comprises contig assembly, alignment, and realignment steps, each of which can be customized to the user's needs, and focuses on identifying potential fusion breakpoints. DenovoFusion was validated on simulated datasets and on 100 Ewing sarcoma DNA sequencing datasets from a study at the Curie Institute in Paris, and was compared with related tools such as HMFtools, GeneFuse, and FACTERA. In the analysis of the 100 samples, DenovoFusion was more accurate than FACTERA, with a false-positive rate of zero, and performed comparably to methods that rely on predefined fusion lists. Examining ETS family-related fusions in all samples validated known fusions and revealed rare breakpoint variants, including novel ones within the intronic region between EWSR1 exon 11 and FLI1 exon 6. This assembly-based approach to fusion gene detection improves accuracy in short-read sequencing data, offers a customizable research platform, and shows promise for future applications with long-read sequencing technologies.