Abstract
A positive-unlabeled (PU) machine learning model was trained to predict substrate reactivity in the oxidative homocoupling of phenols under different reaction conditions. We validated the model using two descriptor sets: a 28-dimensional set of descriptors considered to influence reactivity, and extended-connectivity fingerprints. We tuned the model parameters using our experimental data and found that the optimized parameters gave excellent prediction accuracy for the existing experimental data, regardless of the reaction conditions. Furthermore, predictions for 30 unlabeled substrates matched the experimental results for 83.3-86.7% of the substrates (25-26 of 30), and the PU learning model outperformed a model trained with both positive and negative reactivity data. Because PU learning does not require negative data for training, it can be applied to reactions reported in many previous studies, and its predictions can guide the cost-effective synthesis of molecules.
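
To illustrate how a classifier can be trained without negative data, the following is a minimal sketch of one common PU learning scheme, the Elkan-Noto rescaling approach. The abstract does not state which PU algorithm was used, so this choice is an assumption, as are the one-dimensional toy descriptor, the data values, and all function names. For brevity the label frequency `c` is estimated on the training positives rather than on a held-out set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ss, lr=0.1, epochs=5000):
    """Fit g(x) = sigmoid(w*x + b) to labels s by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - s for x, s in zip(xs, ss)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: one descriptor value per substrate.
# Labeled positives are substrates known to react.
labeled_pos = [2.0, 2.2, 2.5, 2.8, 3.0, 3.2]
# Unlabeled substrates: a mixture of reactive and unreactive ones,
# with the true reactivity unknown to the model.
unlabeled = [2.1, 2.6, 2.9, -3.0, -2.5, -2.0, -1.5, -1.0, -0.5]

# Step 1: train a classifier to separate labeled (s=1) from unlabeled (s=0).
xs = labeled_pos + unlabeled
ss = [1] * len(labeled_pos) + [0] * len(unlabeled)
w, b = fit_logistic(xs, ss)

def g(x):
    """Estimated probability that a substrate is labeled, p(s=1 | x)."""
    return sigmoid(w * x + b)

# Step 2: estimate c = p(s=1 | y=1), the fraction of true positives
# that carry a label, as the mean of g over the known positives.
c = sum(g(x) for x in labeled_pos) / len(labeled_pos)

def predict_reactive(x):
    """Estimated probability of reactivity, p(y=1 | x) = g(x) / c, capped at 1."""
    return min(1.0, g(x) / c)

if __name__ == "__main__":
    for x in (2.7, -2.0):
        print(f"descriptor={x:+.1f}  p(reactive)={predict_reactive(x):.2f}")
```

The rescaling by `c` corrects for the fact that unlabeled substrates treated as class 0 include some reactive ones, which would otherwise bias the predicted probabilities downward.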