Abstract
This study presents a comparative analysis of nine Machine Learning (ML) models for temperature and humidity prediction in Photovoltaic (PV) environments. Using a dataset of 5,000 samples (80% for training, 20% for testing), the following models were evaluated on Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²): Support Vector Regression (SVR), Lasso Regression, Ridge Regression (RR), Linear Regression (LR), AdaBoost, Gradient Boosting (GB), Decision Tree (DT), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). For temperature prediction, XGBoost performed best, with the lowest MAE (1.544), the lowest RMSE (1.242), and the highest R² (0.947). SVR performed worst, with an MAE of 4.558 and an R² of 0.674. For humidity prediction, XGBoost again outperformed the other models, with an MAE of 3.550, an RMSE of 1.884, and an R² of 0.744, while SVR had the lowest predictive power, with an R² of 0.253. The study serves as a benchmark for applying ML models to environmental prediction in PV systems, an important research area. The results underscore the performance advantage of ensemble-based approaches, especially XGBoost and RF, over simpler, linear-based methods such as LR and SVR when modeling complex environmental interactions. The proposed ML-based power generation analysis approach offers improved predictive analytics for renewable energy systems and supports real-time monitoring and maintenance practices that improve PV performance and reliability.
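The evaluation protocol summarized above (an 80/20 split scored with MAE, RMSE, and R²) can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's code: the function names, the random seed, and the four toy temperature values are assumptions made for the example and do not come from the dataset.

```python
import math
import random


def split_80_20(samples, seed=0):
    """Shuffle the samples and return (train, test) with an 80/20 split."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Splitting 5,000 sample indices gives 4,000 for training and 1,000 for testing.
train, test = split_80_20(range(5000))

# Hypothetical measured and predicted temperatures (toy values, not study data).
y_true = [20.0, 22.0, 24.0, 26.0]
y_pred = [21.0, 22.0, 23.0, 28.0]

print(len(train), len(test))
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Each of the nine models would be fitted on the training portion and scored on the held-out portion with these three metrics, once for temperature and once for humidity as the target.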