Abstract
Missense mutations in tumor suppressors such as TP53 have been studied extensively to understand oncogenesis in malignant epithelial cells. Missense mutations identified by Whole Exome Sequencing (WXS) can be translated into mutant protein sequences, which reveals the most common variants in tumor samples. Because most mutations arise randomly, those that produce dysfunctional proteins must be isolated within large cohorts. Structure-prediction tools such as AlphaFold and ColabFold can convert WXS-derived sequences from large cohorts into three-dimensional structures suitable for computational analysis. By evaluating both high- and low-confidence regions of these structures, the mutant proteins can be studied en masse through pipelines that generate inputs for quantum-chemical analysis. We created a pipeline that processed WXS data and selected 28 representative TP53 missense mutants from the TCGA-BRCA cohort for a quantum-chemical feasibility analysis. These structures were systematically cleaned with tools such as Open Babel and AmberTools, and each was prepared as input for Natural Population Analysis (NPA), Electrostatic Potential (ESP), and Highest Occupied and Lowest Unoccupied Molecular Orbital (HOMO/LUMO) calculations in Q-Chem. This pipeline integrates population genomics with chemoinformatics to analyze electron density distributions and to produce hypothesis-generating electronic descriptors associated with protein dysfunction. By modifying the generated inputs, additional analyses such as Fukui functions, chemical shifts, and Raman shifts can also be performed. Together, this provides a computational means to probe electronic properties that are not readily accessible at scale with experimental techniques.
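The first step described above, turning reported missense calls into mutant protein sequences and ranking the most common variants, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes variants arrive as single-letter protein changes such as `p.R175H`, the style used in the `HGVSp_Short` column of GDC MAF files. The function names `parse_missense`, `apply_missense`, `top_variants`, and `to_fasta` are hypothetical, and the demo uses a made-up 10-residue toy sequence rather than the real TP53 reference.

```python
import re
from collections import Counter

# The 20 standard single-letter amino-acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Simple missense notation: optional "p.", wild-type residue, 1-based position, mutant residue.
# Nonsense ("p.R196*") and frameshift ("p.P72fs") entries do not match this pattern.
MISSENSE = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z])$")


def parse_missense(change: str) -> tuple[str, int, str]:
    """Split a protein change like 'p.R175H' into ('R', 175, 'H')."""
    match = MISSENSE.match(change.strip())
    if match is None:
        raise ValueError(f"not a simple missense change: {change!r}")
    wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
    if wt not in AMINO_ACIDS or mut not in AMINO_ACIDS or wt == mut:
        raise ValueError(f"not a missense substitution: {change!r}")
    return wt, pos, mut


def apply_missense(reference: str, change: str) -> str:
    """Return the mutant sequence after checking that the wild-type residue matches."""
    wt, pos, mut = parse_missense(change)
    if not 1 <= pos <= len(reference):
        raise ValueError(f"position {pos} outside sequence of length {len(reference)}")
    if reference[pos - 1] != wt:
        raise ValueError(f"reference has {reference[pos - 1]!r} at {pos}, expected {wt!r}")
    return reference[: pos - 1] + mut + reference[pos:]


def top_variants(changes: list[str], n: int) -> list[tuple[str, int]]:
    """Count recurrent missense changes across samples and keep the n most common."""
    counts: Counter[str] = Counter()
    for change in changes:
        try:
            parse_missense(change)
        except ValueError:
            continue  # skip nonsense, frameshift, and other non-missense entries
        counts[change.strip().removeprefix("p.")] += 1
    return counts.most_common(n)


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i : i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy sequence for illustration only; substitute the real reference protein.
    toy_reference = "ACDEFGHIKL"
    print(top_variants(["p.E4K", "E4K", "p.G6S", "p.K9*"], 2))
    print(to_fasta("toy_E4K", apply_missense(toy_reference, "p.E4K")))
```

Checking the wild-type residue before substituting is a cheap guard against calls made against a different transcript or isoform than the reference sequence in use. The resulting FASTA records are the kind of plain-sequence input that structure-prediction tools such as ColabFold accept.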