Abstract
Clustering complex data structures remains a central challenge in unsupervised learning, particularly when the optimal number of clusters must be determined for highly non-linear datasets. In this paper, we introduce RISM (Relative Density and Inter-Cluster Connectivity Degree-based Split-and-Merge), a novel clustering algorithm that combines density-based and connectivity-driven principles to infer the cluster configuration automatically. RISM operates in two phases: splitting and merging. In the splitting phase, we define a new relative density metric that evaluates the local density of each data point relative to its k-nearest neighbors, which allows potential cluster centers to be identified. Subclusters are then formed from both relative density and relative distance, capturing the dense regions of the data. In the merging phase, we propose a connectivity-aware inter-cluster distance measure that combines inter-cluster distance factors with inter-cluster connectivity degrees, so that clusters are merged iteratively according to both their proximity and their structural relationships. The final number of clusters is chosen at the point in the merging process where the difference between successive inter-cluster distances is largest. Extensive empirical evaluations on both synthetic and real-world datasets show that RISM outperforms nine state-of-the-art clustering algorithms in clustering accuracy, robustness to noise, and scalability.
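
As a rough illustration of two ideas named in the abstract, the Python sketch below computes a kNN-based relative density and picks a cluster count from the largest gap in a sequence of merge distances. The abstract does not state the formulas, so everything specific here is an assumption rather than RISM's actual definition: local density is taken as the inverse of the mean distance to the k nearest neighbors, relative density as that value divided by the mean density of those neighbors, and the function names `relative_density` and `n_clusters_from_merge_gaps` are hypothetical.

```python
import numpy as np


def relative_density(X, k):
    """Assumed relative density: a point's kNN density divided by the
    mean kNN density of its k nearest neighbors (not RISM's exact metric)."""
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small example).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(dist, np.inf)
    # Indices and distances of the k nearest neighbors of every point.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    # Assumed local density: inverse of the mean kNN distance.
    density = 1.0 / (nn_dist.mean(axis=1) + 1e-12)
    # Values above 1 mean the point is denser than its neighborhood.
    return density / density[nn_idx].mean(axis=1)


def n_clusters_from_merge_gaps(merge_distances, n_subclusters):
    """Assumed stopping rule: stop merging just before the largest jump
    between successive merge distances."""
    d = np.asarray(merge_distances, dtype=float)
    gaps = np.diff(d)                 # gaps[t] = d[t + 1] - d[t]
    merges_kept = int(np.argmax(gaps)) + 1
    # Each merge reduces the number of clusters by one.
    return n_subclusters - merges_kept
```

Under these assumptions, points whose relative density exceeds 1 are natural candidates for cluster centers, and a sharp jump in the merge distance suggests that the next merge would join two genuinely separate clusters.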