Abstract
Biomass is composed mainly of cellulose, hemicellulose, and lignin, with lignin accounting for almost one-third of the total. Lignin is removed from the biomass matrix because its complex, recalcitrant structure acts as a physical and chemical barrier that limits the access of reagents and enzymes to cellulose and hemicellulose. Lignin sterically hinders hydrolysis and can also bind enzymes non-productively, which reduces the overall efficiency of biofuel production. Of the several techniques used for delignification, ozonation is a novel, emerging, and eco-friendly option. This study investigates the use of machine learning (ML) to predict lignin removal efficiency during ozonation pretreatment. Experimental data from ozonation-based lignin removal were used to train and evaluate 19 regression models with the PyCaret framework. Among these, the Extra Trees Regressor achieved the highest predictive accuracy for delignification outcomes. SHapley Additive exPlanations (SHAP) were then used to interpret feature importance by quantifying the contribution of each process variable. The results show that ML models can effectively capture the complex relationships governing ozonation-based delignification and can guide the optimization of operating parameters. This work highlights the potential of ML as a predictive and interpretive tool in chemical engineering, paving the way for more efficient, data-driven approaches to biomass valorization and sustainable biofuel production.
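To illustrate what "quantifying the contribution of each process variable" means, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It is only an illustration under stated assumptions. The three variable names (`ozone_conc`, `moisture`, `time`), the `toy_model` function, and its coefficients are all hypothetical and are not the study's data or its trained Extra Trees model. Features absent from a coalition are replaced by a single baseline value, whereas the SHAP library typically averages over a background dataset and uses faster algorithms (such as TreeSHAP for tree ensembles).

```python
from itertools import combinations
from math import factorial

# Hypothetical process variables; names are illustrative, not taken from the study.
FEATURES = ["ozone_conc", "moisture", "time"]


def toy_model(x):
    """Hypothetical stand-in for a trained regressor (percent lignin removed).

    Coefficients are made up; the ozone*time term adds a feature interaction.
    """
    ozone, moisture, time = x
    return 10.0 + 4.0 * ozone + 2.0 * moisture + 1.5 * time + 0.5 * ozone * time


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value
    (a single-reference, interventional value function).
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                # Marginal contribution of feature i when added to coalition S
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:.3f}")
```

For this toy model the attributions are 9.5, 2.0, and 6.0: each linear term goes entirely to its own variable, and the ozone-time interaction (3.0) is split evenly between `ozone_conc` and `time`. The values sum to `toy_model(x) - toy_model(baseline)` (17.5), the additivity property that lets SHAP values be read as per-variable contributions. Exhaustive enumeration scales as 2^n model calls, which is why practical SHAP tooling relies on model-specific or sampling-based approximations.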