Abstract
This study explores the use of unsupervised deep learning to find a low-dimensional representation of infrared molecular fingerprints of human blood. We developed a fully convolutional denoising autoencoder to process Fourier transform infrared (FTIR) spectroscopy data, with the aim of condensing each spectrum into a small set of latent variables. Using the autoencoder's bottleneck architecture and a custom loss function, we reduced noise while retaining essential molecular information. In a lung cancer case-control study, this method improved detection accuracy by 2.6 percentage points. The resulting latent space not only provides a compact representation of the spectral data but also highlights variables linked to the presence of disease, which could help improve diagnostics. Trial Registration: German Clinical Trials Register (DRKS): DRKS00013217.
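To make the idea of a fully convolutional denoising autoencoder with a bottleneck more concrete, the following is a minimal NumPy sketch of a single forward pass. It is not the authors' model, and several parts are assumptions made only for illustration:

- the layer widths, kernel size, stride-2 downsampling, and nearest-neighbour upsampling are all hypothetical;
- the weights are random and untrained;
- the synthetic two-band "spectrum" stands in for real FTIR data;
- plain mean squared error substitutes for the paper's custom loss, which is not described here.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b, stride=1):
    """'Same'-padded 1D convolution (cross-correlation).
    x: (C_in, L), w: (C_out, C_in, K) with odd K, b: (C_out,)."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    n_out = (xp.shape[1] - k) // stride + 1
    # Start index of every receptive-field window, expanded to shape (n_out, K)
    idx = np.arange(n_out)[:, None] * stride + np.arange(k)[None, :]
    windows = xp[:, idx]                                   # (C_in, n_out, K)
    return np.einsum("oik,ink->on", w, windows) + b[:, None]

def relu(x):
    return np.maximum(x, 0.0)

def init(c_out, c_in, k=3):
    # Random, untrained weights (illustration only)
    return rng.normal(0.0, 0.1, (c_out, c_in, k)), np.zeros(c_out)

# Hypothetical layer widths; the published architecture is not reproduced here.
ENC = [init(8, 1), init(16, 8), init(2, 16)]   # each layer halves the length
DEC = [init(16, 2), init(8, 16), init(1, 8)]   # each layer follows a 2x upsample

def encode(x):
    for i, (w, b) in enumerate(ENC):
        x = conv1d(x, w, b, stride=2)
        if i < len(ENC) - 1:        # keep the bottleneck layer linear
            x = relu(x)
    return x                        # latent variables, shape (2, L/8)

def decode(z):
    for i, (w, b) in enumerate(DEC):
        z = np.repeat(z, 2, axis=1)  # nearest-neighbour upsampling
        z = conv1d(z, w, b)
        if i < len(DEC) - 1:        # linear output layer
            z = relu(z)
    return z

# Synthetic stand-in "spectrum": two Gaussian bands sampled at 128 points.
grid = np.linspace(0.0, 1.0, 128)
clean = (np.exp(-((grid - 0.3) / 0.03) ** 2)
         + 0.5 * np.exp(-((grid - 0.7) / 0.05) ** 2))[None, :]
noisy = clean + rng.normal(0.0, 0.05, clean.shape)

z = encode(noisy)
recon = decode(z)
# Denoising objective: reconstruct the CLEAN signal from the NOISY input.
loss = np.mean((recon - clean) ** 2)   # plain MSE, a stand-in for the custom loss
print(noisy.shape, z.shape, recon.shape)
```

In this sketch, 128 input points are compressed into 2 × 16 = 32 latent values and then expanded back to 128. Because every layer is convolutional, the latent code keeps a coarse spectral axis rather than being flattened into a dense vector. Training would minimise the loss over many noisy/clean pairs, for example with an automatic-differentiation framework.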