Uncertainty quantification in epigenetic clocks via conformalized quantile regression

Abstract

DNA methylation (DNAm) is a chemical modification of DNA that can be influenced by various factors, including age, environment, and lifestyle. An epigenetic clock is a predictive tool that estimates biological age from DNAm levels, and this estimate may differ from an individual's chronological age. The difference, known as epigenetic age acceleration, may reflect health status and the risk of age-related diseases. Epigenetic clocks are also used in aging studies to assess the effectiveness of anti-aging interventions and to understand the underlying mechanisms of aging and disease. Various epigenetic clocks have been developed using samples from different populations, tissues, and cell types, typically by training high-dimensional linear regression models with an elastic net penalty. While these models can predict mean biological age from DNAm with high precision, they lack uncertainty quantification, which is important for interpreting the precision of age estimates and for clinical decision-making. To understand the distribution of a biological age clock beyond its mean, we propose a general pipeline for training epigenetic clocks that integrates high-dimensional quantile regression with conformal prediction to reveal population heterogeneity and construct prediction intervals. Our approach produces adaptive prediction intervals that not only achieve nominal coverage but also account for the inherent variability across individuals. Using data from 728 blood samples in 11 DNAm datasets from children, we find that our quantile regression-based prediction intervals are narrower than those derived from conventional mean regression-based epigenetic clocks. This observation demonstrates improved statistical efficiency over the existing pipeline for training epigenetic clocks.
In addition, the widths of the resulting intervals vary in step with age acceleration, revealing cellular evolutionary heterogeneity in age patterns across developmental stages in the childhood and adolescent cohorts. Our findings suggest that conformalized high-dimensional quantile regression can produce valid prediction intervals and uncover underlying population heterogeneity. Although our methodology focuses on the distribution of measures of biological aging in children, it is applicable to a broader range of age groups and can improve understanding of epigenetic age beyond the mean. This inference-based toolbox could provide valuable insights for future applications of epigenetic interventions for age-related diseases.
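
The abstract does not spell out the procedure, so the following is only a minimal sketch of split conformalized quantile regression in its standard form: fit lower and upper quantile models on a training split, score a held-out calibration split, and widen (or shrink) the band by a single conformal offset. Everything specific here is an assumption made for illustration and is not the authors' pipeline: the data are synthetic stand-ins for DNAm features and ages, the quantile models are unpenalized linear fits by subgradient descent rather than high-dimensional penalized quantile regression, and the names `fit_linear_quantile`, `predict_linear`, and `cqr_offset` are hypothetical.

```python
import numpy as np

def fit_linear_quantile(X, y, tau, n_iter=2000, lr=0.05):
    """Fit a linear tau-quantile model by subgradient descent on the pinball loss.

    Illustrative stand-in for a penalized high-dimensional quantile regression.
    """
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for t in range(n_iter):
        resid = y - Xb @ w
        # Subgradient of the mean pinball loss with respect to w.
        grad = -Xb.T @ (tau - (resid < 0)) / len(y)
        w -= lr / np.sqrt(t + 1) * grad  # decaying step size
    return w

def predict_linear(w, X):
    return w[0] + X @ w[1:]

def cqr_offset(lo_cal, hi_cal, y_cal, alpha):
    """Conformal offset for a quantile band, computed on a calibration split."""
    # Score is positive when y falls outside [lo, hi] and negative when inside.
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    # Finite-sample rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]

# Synthetic data (assumption): 5 features, noise that grows with feature 0.
rng = np.random.default_rng(0)
n, p, alpha = 600, 5, 0.1
X = rng.uniform(size=(n, p))
noise_sd = 0.3 + X[:, 0]
y = 10 + X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + noise_sd * rng.normal(size=n)

# Disjoint training, calibration, and test splits.
X_tr, y_tr = X[:200], y[:200]
X_cal, y_cal = X[200:400], y[200:400]
X_te, y_te = X[400:], y[400:]

# Step 1: lower and upper conditional quantile models on the training split.
w_lo = fit_linear_quantile(X_tr, y_tr, alpha / 2)
w_hi = fit_linear_quantile(X_tr, y_tr, 1 - alpha / 2)

# Step 2: conformal offset from the calibration split.
lo_cal, hi_cal = predict_linear(w_lo, X_cal), predict_linear(w_hi, X_cal)
q = cqr_offset(lo_cal, hi_cal, y_cal, alpha)

# Step 3: conformalized prediction intervals for new samples.
lo_te = predict_linear(w_lo, X_te) - q
hi_te = predict_linear(w_hi, X_te) + q

coverage = np.mean((y_te >= lo_te) & (y_te <= hi_te))
print(f"offset={q:.3f}, test coverage={coverage:.3f}, "
      f"mean width={np.mean(hi_te - lo_te):.3f}")
```

Because the offset is added to quantile estimates that already depend on the features, the interval width can differ from sample to sample, which is the sense in which such intervals are adaptive. The coverage guarantee comes from the calibration step alone and holds on average over exchangeable samples, regardless of how well the quantile models fit.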
