Abstract
Accurate prediction of pharmacokinetic parameters, particularly drug clearance (CL) and volume of distribution (VD), is critical in early drug development, yet existing approaches largely treat these physiologically correlated endpoints as independent tasks and rely on a single molecular data modality. To address these gaps, we propose a Multi-Task Gradient Boosting Machine (MTGBM) framework that simultaneously predicts CL and VD by integrating three complementary molecular modalities (CNN-based structural embeddings, MLP-derived descriptor embeddings, and physicochemical and preclinical PK parameters) into shared decision trees. MTGBM outperformed single-task LightGBM baselines in MSE and R² for both targets. The geometric mean fold error (GMFE) also improved for CL; the exception was VD GMFE in the low-VD range. These findings were further supported by 10 repeated random splits of the entire dataset. SHAP analysis revealed complementary contributions from all three modalities and suggested cross-target information sharing between CL and VD embeddings. These results establish MTGBM as a proof of concept for multi-modal, multi-task pharmacokinetic modeling, offering a foundation for future integration with larger datasets or foundation-model-derived representations.