Abstract
Activity cliffs are an important challenge in cheminformatics and drug design. One of the most common indicators used to quantify them is the Structure-Activity Landscape Index (SALI). Here we expose mathematical limitations of SALI's formulation, the most evident being that it is undefined when the similarity between two molecules is one. We show that a simple Taylor series can remedy this problem, yielding a well-defined expression that preserves the ranking information of the original SALI. The second issue is the quadratic complexity of using SALI to describe the roughness of the activity landscape of a set. We propose iCliff, an indicator that quantifies this roughness in linear complexity. To do so, we leverage the iSIM framework to obtain the average similarity of the set, and an algebraic rearrangement to obtain the average of the squared property differences. Calculations on 30 activity-cliff-focused databases suggest a strong correlation between iCliff and the average of the pairwise Taylor-series SALI values. To further explore the individual effect of removing each molecule from the activity landscape, we propose complementary iCliff. With this tool, we were able to identify the molecules that form a high number of activity cliffs with the rest of the molecules in the set.
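The two mathematical points above can be illustrated with a minimal sketch. It assumes the usual SALI definition, |P_i - P_j| / (1 - s_ij), and replaces 1 / (1 - s) with a truncated series 1 + s + s^2 + ... The truncation order, the function names, and the use of squared differences as a stand-alone quantity are illustrative assumptions here, not necessarily the exact formulation of iCliff. The last two functions show one rearrangement that gives the average squared property difference over all pairs in linear time, checked against explicit pairwise enumeration.

```python
def sali(prop_i, prop_j, sim):
    """Classic SALI: |P_i - P_j| / (1 - sim). Undefined when sim == 1."""
    return abs(prop_i - prop_j) / (1.0 - sim)


def sali_taylor(prop_i, prop_j, sim, order=3):
    """Truncated-series surrogate: |P_i - P_j| * (1 + s + ... + s**order).

    The truncation order is an assumption chosen for illustration.
    """
    series = sum(sim ** k for k in range(order + 1))
    return abs(prop_i - prop_j) * series


def mean_sq_diff_linear(props):
    """Mean of (p_i - p_j)**2 over all pairs i < j, computed in O(N).

    Uses sum_{i<j} (p_i - p_j)**2 = N * sum(p**2) - (sum(p))**2.
    """
    n = len(props)
    total = n * sum(p * p for p in props) - sum(props) ** 2
    return total / (n * (n - 1) / 2)


def mean_sq_diff_pairwise(props):
    """Same quantity by explicit O(N^2) enumeration, used only as a check."""
    n = len(props)
    diffs = [(props[i] - props[j]) ** 2
             for i in range(n) for j in range(i + 1, n)]
    return sum(diffs) / len(diffs)
```

For two molecules with identical fingerprints (similarity of one), `sali` raises a division-by-zero error, whereas `sali_taylor` returns a finite value that still grows with both the property difference and the similarity.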