Abstract
Accurate prediction of blood-brain barrier (BBB) permeability is essential for central nervous system (CNS) drug development. This study evaluates several supervised machine learning models on a public dataset of BBB-permeable and non-permeable compounds. Random Forest models achieve the best balance between accuracy and generalizability, outperforming more complex gradient boosting methods, which proved prone to overfitting. Feature analysis identifies NH/OH and NO group counts as key determinants of passive diffusion, with lower hydrogen bond donor and heteroatom counts favoring permeability. Model performance deteriorates at an NH/OH count of 3, marking a decision boundary at which hydrogen-bonding complexity undermines reliable prediction. These results reveal non-linear structure-permeability relationships that challenge traditional descriptor-based approaches, and they demonstrate that machine learning can provide both accurate predictions and actionable insights for drug discovery.