Abstract
Human carbonic anhydrase (hCA) isoforms IX and XII are promising anticancer targets, yet their selective inhibition remains elusive because of their close similarity to the abundant hCA II, whose off-target inhibition causes harmful side effects. To address this issue, we introduce an interpretable machine learning framework to predict inhibition across hCA II, IX, and XII. Our approach combines rigorous data curation, systematic benchmarking of classical and deep learning models, and integration of conformal prediction for uncertainty quantification with counterfactual explanations for molecular interpretability. After extensive benchmarking, we find that Support Vector Machines with extended-connectivity fingerprints consistently outperform more complex models, underscoring the importance of data quality and validation over algorithmic complexity. Conformal prediction provides rigorous activity estimates, while counterfactual analysis rationalizes the structural features governing isoform selectivity; together, they enable interpretable guidance for inhibitor design. To further test the model, we apply it to the selective inhibitor SLC-0111 and obtain predictions consistent with experiment. The model also reproduces the experimental finding that modifications in the tail region strongly affect selectivity, emphasizing the tail group as a key structural determinant for differentiating inhibitor activity among hCA isoforms II, IX, and XII. To facilitate adoption, we also release CAInsight, user-friendly software with a graphical interface for virtual screening and generative design of selective hCA inhibitors.
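As an illustration of how conformal prediction can attach uncertainty to an activity classifier, the following is a minimal sketch of split (inductive) conformal prediction. It is not the implementation used in this work: the binary active/inactive labels, the calibration probabilities, the error level `alpha`, and the function names are all hypothetical and chosen only for the example.

```python
from math import ceil


def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = ceil((n + 1) * (1 - alpha))  # rank of the threshold score
    if k > n:
        return float("inf")  # too few calibration points: keep every label
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep each label whose nonconformity (1 - probability) is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration set: probability the classifier gave to the TRUE label.
cal_true_probs = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.60, 0.50, 0.40, 0.20]
cal_scores = [1.0 - p for p in cal_true_probs]  # nonconformity scores

# Target roughly 80% coverage (alpha is an assumed error level).
q = conformal_quantile(cal_scores, alpha=0.2)

# Hypothetical class probabilities for two new compounds.
confident = prediction_set({"active": 0.92, "inactive": 0.08}, q)
ambiguous = prediction_set({"active": 0.55, "inactive": 0.45}, q)

print(q, confident, ambiguous)
```

A single-label set signals a confident call, whereas a set containing both labels flags a compound for which the classifier cannot commit at the chosen error level.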