Abstract
Objective: High-throughput biological data, with its vast complexity and high dimensionality, continues to require innovative analytic methods for meaningful exploration. Most dimensionality reduction methods overlook the shape and topology of data, even though these are vital components of its structure and complexity. This study applies topological data analysis (TDA) and, using breast cancer (BC) gene expression data as an illustrative example, demonstrates the value of incorporating the shape of data into the analysis.
Results: In addition to delineating the known BC subtypes, TDA identifies a new subtype within luminal B cancer, along with the features that define it. The final results are visualized in three-dimensional (3D) scatter plots that show how the underlying patterns identified through TDA map to 3D space.
Conclusions: The new subtype, identified in an unsupervised manner and validated against prior knowledge, demonstrates the power of embedding the topology and shape of data in the analysis.