Abstract
Brain age is widely regarded as a powerful marker of general brain health. Brain age models are typically trained on large datasets to predict chronological age, and this large-scale pretraining may offer advantages in predicting specific health outcomes, much as finetuning large language models has succeeded in specific applications. However, it is also well accepted that machine learning models trained to directly predict specific outcomes (i.e., direct models) often outperform those trained on surrogate objectives. Therefore, it is unclear whether brain age models, despite their much larger training data, outperform direct models in predicting specific brain health outcomes. Here, we compare large-scale brain age models (pretrained on 53,542 participants) and direct models for predicting specific health outcomes related to Alzheimer's disease (AD) dementia. Using anatomical T1 scans from three continents (N = 1,848), we find that summarizing brain age with a single scalar (i.e., brain age gap) leads to poor prediction performance. Without finetuning, higher-dimensional intermediate representations of brain age models yield better prediction than the brain age gap but still perform worse than direct models. Only intermediate representations of finetuned brain age models achieve performance similar to that of direct models. Overall, our results do not discount brain age as a useful marker of general brain health but suggest that using chronological age as a pretraining target might be suboptimal for predicting specific health outcomes.