Abstract
Mid-infrared (MIR) imaging is an emerging label-free modality for classifying tissue types, including viable tumor in highly heterogeneous cancers, by assessing spatial differences in chemical composition. However, common data analysis neglects spatial vicinity and relies on time-consuming pathological insight both for predicting hotspots of viable tumor and for computational tissue type annotation. Here, we present a method for computational tissue type annotation that applies spatial autocorrelation to MIR projection images computed from wavenumbers selected by random forest ranking. Interdependent data processing enabled highly accurate annotations, while referencing new samples, added sequentially, to a hyperspectral tissue database ensured computational efficiency and scalability to larger cohorts. Applied to clinical samples of colorectal cancer liver metastases, the method matched manual pathology assessment in a double-blind study. Optionally, MIR-based hotspots can be correlated with mass spectrometry imaging. This multimodal approach identified sphingomyelin isoforms as candidate lipidomic tumor markers by imaging parallel reaction monitoring-parallel accumulation serial fragmentation (iprm-PASEF) directly on tissue. Taken together, spatial autocorrelation analysis of MIR imaging data could improve automated, accurate annotation of tissue morphologies in heterogeneous cancer specimens and support the discovery of spatial cancer biomarkers.
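To make the idea of spatial autocorrelation on a projection image concrete, the following is a minimal sketch under stated assumptions. It does not reproduce the authors' pipeline. It assumes that a projection image is the per-pixel mean absorbance over a set of already selected wavenumber channels (the abstract selects them by random forest ranking; here they are simply passed in as indices). It also assumes that spatial autocorrelation is measured with global Moran's I using binary rook (4-neighbour) weights. The function names `projection_image` and `morans_i` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical layout: rows x cols x spectral channels (absorbance values).
Cube = List[List[List[float]]]
Image = List[List[float]]


def projection_image(cube: Cube, band_indices: Sequence[int]) -> Image:
    """Average the absorbance over the selected spectral channels for every pixel."""
    if not band_indices:
        raise ValueError("at least one spectral channel must be selected")
    return [
        [sum(pixel[k] for k in band_indices) / len(band_indices) for pixel in row]
        for row in cube
    ]


def morans_i(image: Image) -> float:
    """Global Moran's I with binary rook (4-neighbour) weights.

    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights (each neighbour pair is counted in both directions).
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    mean = sum(v for row in image for v in row) / n
    dev = [[image[r][c] - mean for c in range(cols)] for r in range(rows)]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant image")

    num = 0.0
    w_sum = 0
    for r in range(rows):
        for c in range(cols):
            # Visit the up/down/left/right neighbours that lie inside the image.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += dev[r][c] * dev[rr][cc]
                    w_sum += 1
    return (n / w_sum) * (num / denom)


if __name__ == "__main__":
    # Checkerboard: neighbouring pixels always differ, so the score is negative.
    checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
    # Two homogeneous blocks: like values cluster, so the score is positive.
    halves = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    print(morans_i(checker), morans_i(halves))
```

Values near +1 indicate spatially coherent regions, such as contiguous tissue compartments. Values near -1 indicate alternating patterns. A checkerboard evaluates to -1 and the two-block 4×4 image to 2/3. Global Moran's I is only one common choice. Local indicators or other neighbourhood weights would be equally plausible readings of "spatial autocorrelation" here.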