Abstract
This study investigates the use of machine learning for regression-based prediction of the particle size and zeta potential of poly(lactic-co-glycolic acid) (PLGA) nanoparticles. Both categorical and numeric inputs were used to build models that predict the optimum conditions for preparing nanosized PLGA particles for drug delivery applications. The proposed methodology employs Leave-One-Out (LOO) for categorical feature transformation, Local Outlier Factor (LOF) for outlier detection, and the Bat Optimization Algorithm (BA) for hyperparameter optimization. A comparative analysis covers K-Nearest Neighbors (KNN), ensemble methods such as Bagging and Adaptive Boosting (AdaBoost), and the novel Small-Size Bat-Optimized KNN Regression (SBNNR) model, which uses generative adversarial networks and deep feature extraction to improve performance on sparse datasets. The results show that the AdaBoost-based KNN model (ADA-KNN) outperforms the other models for particle size prediction, with a test R² of 0.94385, while SBNNR achieves the highest accuracy for zeta potential prediction, with a test R² of 0.97674. These findings underscore the efficacy of combining advanced preprocessing, optimization, and ensemble techniques for robust regression modeling. The contributions of this work are the development of the SBNNR model, the validation of BA's optimization capabilities, and a comprehensive evaluation of ensemble methods. The approach provides a reliable framework for applying machine learning in materials science, particularly nanoparticle characterization.
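As an illustration of the categorical transformation step, the following is a minimal sketch of leave-one-out target encoding, assuming that this is what LOO refers to here; the abstract does not specify the exact variant. Each row's category is replaced by the mean target of the other rows in the same category, so a row never sees its own target value. The function name, the fallback to the global mean for categories that occur only once, and the sample data are all assumptions made for illustration and are not taken from the study.

```python
from collections import defaultdict


def leave_one_out_encode(categories, targets):
    """Encode each row's category as the mean target of the OTHER rows
    that share the same category (hypothetical helper, not from the study)."""
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        return []

    # Per-category running sum and count of the target.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    global_mean = sum(targets) / len(targets)

    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            # Exclude the current row's own target from its category mean.
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # Assumption: a category seen only once falls back to the global mean.
            encoded.append(global_mean)
    return encoded


# Made-up example values: a categorical input and a particle size in nm.
polymer_type = ["A", "A", "B", "A", "B", "C"]
size_nm = [100.0, 120.0, 200.0, 140.0, 220.0, 180.0]

encoded = leave_one_out_encode(polymer_type, size_nm)
print(encoded)
```

In this sketch the encoding is computed from training targets only; for unseen test rows one would typically use the full per-category training mean instead, which is a design choice the abstract does not describe.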