Abstract
To address the limited accuracy of speech-based Alzheimer's Disease (AD) screening and the scarcity of paired multimodal data, this paper proposes a detection framework based on feature alignment and Rectified Flow-driven latent representation generation. The EEG dataset comprises recordings from 36 AD patients and 29 Healthy Controls (HC). The speech dataset contains 399 samples: 114 AD, 132 Mild Cognitive Impairment (MCI), and 153 HC. We extracted multidimensional EEG features, including time-domain and frequency-domain characteristics, together with behavioral representations of speech. A heterogeneous alignment network mapped these features into a common semantic subspace, where an adaptive interpolation strategy reconstructed the missing MCI pathological trajectories in the latent space. On this basis, a conditional Rectified Flow model was introduced to learn the optimal transport mapping from speech to EEG, generating latent representations rich in physiological information to compensate for the semantic gap. Experimental results showed that fusing speech features with the generated latent representations achieved a three-class classification accuracy of 89.08%, a precision of 88.77%, and a recall of 88.71%, an accuracy improvement of 9.28% over the speech-only baseline. Our method combines the convenience of speech screening with the high reliability of neurophysiological signals, providing a new approach to low-cost early detection of AD.
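To illustrate the Rectified Flow idea referenced above, the following is a minimal, self-contained sketch of the flow-matching objective: points are linearly interpolated between a source latent (speech) and a target latent (EEG), and a velocity model is regressed onto the constant displacement between them. This is not the paper's implementation; the array names (`speech_latent`, `eeg_latent`), dimensions, the synthetic pairing, and the linear velocity model standing in for a neural network are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8      # latent dimensionality (assumed for illustration)
n = 256    # number of paired samples (assumed)

# Synthetic paired latents: x0 from the speech side, x1 a correlated EEG-side target.
speech_latent = rng.normal(size=(n, d))                                  # x0
eeg_latent = 0.5 * speech_latent + 0.1 * rng.normal(size=(n, d))         # x1

# Toy linear velocity model v(x_t, t) = [x_t, t, 1] @ W, a stand-in for a network.
W = np.zeros((d + 2, d))

def loss_and_grad(W):
    t = rng.uniform(size=(n, 1))
    x_t = (1 - t) * speech_latent + t * eeg_latent   # straight-line interpolation
    target = eeg_latent - speech_latent              # Rectified Flow velocity target
    phi = np.hstack([x_t, t, np.ones_like(t)])       # model features
    err = phi @ W - target
    loss = np.mean(err ** 2)                         # flow-matching MSE
    grad = 2.0 * phi.T @ err / err.size
    return loss, grad

losses = []
for _ in range(300):
    loss, grad = loss_and_grad(W)
    W -= 0.05 * grad                                 # plain gradient descent
    losses.append(loss)
# After training, losses[-1] < losses[0]: the model has fit the velocity field.
```

At inference time, one would integrate the learned velocity field from a speech latent toward an EEG-like latent (e.g., with a few Euler steps), which is the mechanism the abstract describes for generating physiologically informed representations.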