Abstract
OBJECTIVES: To develop a machine learning method that estimates future values of liver biomarkers from longitudinal lifestyle (diet, activity) data for early detection of nonalcoholic steatohepatitis (NASH).

MATERIALS AND METHODS: The method was developed using the nonalcoholic fatty liver disease (NAFLD) adult dataset from the National Institute of Diabetes and Digestive and Kidney Diseases, a real-world dataset representative of common electronic health records in the United States. We developed time-series machine learning, deep learning, and tree-based models to forecast future values of liver biomarkers; identified the minimum number of initial data points required for optimal forecasting performance; and developed time-series classifiers that detect NASH from longitudinal lifestyle data and initial biomarker values.

RESULTS: Our experiments show that lifestyle-informed forecasting models, such as attention-based long short-term memory (Attention-LSTM) and TimeSeriesForestRegressor, accurately predict future biomarker trajectories with as few as 2 observed timepoints (prediction error as low as 0.62). NASH classifiers trained on biomarkers estimated by the Forecasting liver Biomarkers (FoBi) approach achieve performance (accuracy 86%) comparable to or exceeding that of existing biopsy-aligned methods.

DISCUSSION: The proposed approach, FoBi, is the first method to forecast liver biomarker trajectories from lifestyle data and to demonstrate that both observed and model-estimated biomarkers can support effective NASH detection in real-world clinical settings.

CONCLUSION: Lifestyle-driven biomarker forecasting offers a promising, minimally invasive foundation for early NASH detection and long-term disease management, reducing dependence on frequent laboratory testing and biopsy-aligned measurements.