Abstract
Optimizing organic photovoltaic (OPV) performance requires navigating the high-dimensional, interdependent processing parameters that govern bulk heterojunction morphology. To address this, we constructed a standardized database that integrates donor/acceptor pairs, nine key fabrication parameters, and device efficiencies, consolidating over a decade of experimental results. Leveraging this resource, we developed a three-tiered machine learning framework based on gradient boosting regression trees. The strategy progresses from single-parameter baseline models, through stage-combined models that capture intraprocess synergies, to a global nine-parameter optimization model. This final model achieves a Pearson correlation coefficient above 0.9 and a success rate above 80% in identifying optimal multiparameter configurations. Validation on 78 external systems, each containing a previously unseen donor or acceptor, demonstrates robust generalization, with above 75% accuracy in predicting the optimal or secondary condition for individual parameters. This work establishes a practical, data-driven framework for accelerating the rational optimization of OPV photoactive layers.
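The evaluation metrics quoted above (a Pearson correlation coefficient, a success rate for identifying the optimal configuration, and an accuracy for the optimal or secondary condition) can be made concrete with a minimal sketch. The definitions here are assumptions for illustration, not necessarily the exact ones used in this work: a "hit" is counted when the condition the model ranks highest is among the k best measured conditions for that system. The function names, the data layout, and the toy efficiency values are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def top_k_hit_rate(systems, k=1):
    """Fraction of systems whose top-predicted condition is among the
    k best measured conditions (assumed definition: k=1 for 'optimal',
    k=2 for 'optimal or secondary')."""
    if not systems:
        raise ValueError("no systems to evaluate")
    hits = 0
    for system in systems:
        predicted = system["predicted"]
        measured = system["measured"]
        # Condition the model would recommend for this system.
        best_predicted = max(predicted, key=predicted.get)
        # Conditions ordered from highest to lowest measured efficiency.
        ranked_measured = sorted(measured, key=measured.get, reverse=True)
        if best_predicted in ranked_measured[:k]:
            hits += 1
    return hits / len(systems)


# Made-up toy data: keys are hypothetical annealing temperatures (deg C),
# values are hypothetical power conversion efficiencies (%).
TOY_SYSTEMS = [
    {"predicted": {80: 10.1, 100: 11.0, 120: 10.5},
     "measured": {80: 9.8, 100: 11.2, 120: 10.9}},
    {"predicted": {80: 12.0, 100: 11.5, 120: 11.0},
     "measured": {80: 11.9, 100: 12.3, 120: 10.8}},
    {"predicted": {80: 8.0, 100: 8.5, 120: 9.0},
     "measured": {80: 9.1, 100: 8.7, 120: 8.2}},
]

if __name__ == "__main__":
    # Flatten all (predicted, measured) pairs for the correlation.
    pred = [v for s in TOY_SYSTEMS for v in s["predicted"].values()]
    meas = [s["measured"][c] for s in TOY_SYSTEMS for c in s["predicted"]]
    print("Pearson r:", pearson_r(pred, meas))
    print("top-1 hit rate:", top_k_hit_rate(TOY_SYSTEMS, k=1))
    print("top-2 hit rate:", top_k_hit_rate(TOY_SYSTEMS, k=2))
```

Ranking-based hit rates complement the correlation coefficient: a model can track efficiencies closely overall yet still misorder the few best conditions, which is the case that matters most when choosing what to fabricate next.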