Abstract
Achieving optimal target activity while maintaining synthetic accessibility and drug-likeness represents a major challenge in computational drug discovery. Existing de novo generative models often yield chemically invalid or synthetically intractable structures and struggle to optimize multiple objectives simultaneously. Here, we introduce ALCHIMIA, an interpretable hybrid framework combining reinforcement learning (RL) and a genetic algorithm (GA), built on a vocabulary of 33 medicinal chemistry-inspired molecular transformations. The RL component trains a policy network to prioritize transformation sequences that improve synthetic accessibility (SA) and quantitative estimate of drug-likeness (QED) scores, embedding these constraints directly into molecular generation. The GA component applies the learned policy as a mutational operator within population-based optimization guided by molecular docking, enabling the exploration of diverse chemical lineages while converging toward high-affinity ligands. ALCHIMIA was applied to two pharmacologically relevant targets: human Cannabinoid Receptor 2 (CB2R) and human Sigma nonopioid intracellular Receptor 1 (S1R). We considered three scenarios: (i) unconstrained hit identification; (ii) scaffold-constrained lead optimization; and (iii) design of dual modulators. The framework generated chemically valid molecules with QED and SA scores comparable to or better than those obtained with random baselines and selected de novo design methods. By codifying typical medicinal chemistry actions as learnable transformations and coupling multiobjective optimization with GA-based diversity maintenance, ALCHIMIA, freely available as a GitHub repository (https://github.com/alberdom88/ALCHIMIA), provides a practical, interpretable, and scalable framework for de novo molecular design.
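As a rough illustration of the hybrid loop summarized above, and not the authors' implementation, the following Python sketch shows a policy used as the mutation operator inside a simple elitist genetic algorithm. Everything here is an assumption made for illustration: molecules are plain strings rather than chemical structures, the three entries in `TRANSFORMS` are toy stand-ins for the 33-transformation vocabulary, `policy_weights` is a fixed vector standing in for the RL-trained policy network, and `toy_score` replaces the docking score.

```python
import random

# Hypothetical stand-ins for molecular transformations: each maps a
# string "molecule" to a modified one. These are toys, not the real vocabulary.
TRANSFORMS = [
    lambda m: m + "C",                        # append one fragment
    lambda m: m + "N",                        # append a different fragment
    lambda m: m[:-1] if len(m) > 1 else m,    # remove the last fragment
]

def toy_score(mol):
    """Toy fitness standing in for a docking score (higher is better)."""
    return mol.count("N") - 0.1 * len(mol)

def mutate(mol, policy_weights, rng):
    """Sample one transformation according to the policy weights and apply it."""
    transform = rng.choices(TRANSFORMS, weights=policy_weights, k=1)[0]
    return transform(mol)

def evolve(seed, policy_weights, generations=20, pop_size=10, rng_seed=0):
    """Elitist GA: keep the best half, refill with policy-guided mutants."""
    rng = random.Random(rng_seed)
    population = [seed] * pop_size
    for _ in range(generations):
        population.sort(key=toy_score, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(rng.choice(parents), policy_weights, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=toy_score)

# Assumed weights favouring the second transformation.
best = evolve("C", policy_weights=[0.2, 0.6, 0.2])
```

Because the best half of the population survives each generation, the top score in this sketch never decreases. In the framework described, the sampling weights would instead come from the policy network trained on SA and QED, and the fitness from molecular docking.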