Abstract
OBJECTIVES: Higher education students, particularly those from underrepresented backgrounds, experience heightened levels of anxiety, depression, and burnout. Clinical informatics approaches leveraging K-means clustering can aid mental health risk stratification, yet they often exacerbate disparities. We present a socially fair clustering framework that ensures equitable clustering costs across demographic groups while minimizing within-cluster variability.

MATERIALS AND METHODS: Our framework compares standard and socially fair K-means clustering to assess the impact of demographic disparities. It identifies factors affecting clustering across demographics using omnibus and post hoc statistical tests. Subsequently, it quantifies the influence of statistically significant factors on cluster development. We illustrate our approach by identifying racially equitable clusters of mental health among students surveyed by the Healthy Minds Network.

RESULTS: The socially fair clustering approach reduces disparities in clustering costs by as much as 30% across racial groups while maintaining consistency with standard K-means solutions in socioeconomically homogeneous populations. Discrimination experiences were the strongest indicator of poorer mental health, whereas stable financial conditions and robust social engagement promoted resilience.

DISCUSSION: Integrating fairness constraints into clustering algorithms reduces disparities in risk stratification and provides insights into socioeconomic drivers of student well-being. Our findings suggest that standard models may overpathologize middle-risk cohorts, whereas fairness-aware clustering yields partitions that better capture disparities.

CONCLUSION: Our work demonstrates how integrating fairness-aware objectives into clustering algorithms can enhance equity in partitioning systems. The framework we present is broadly applicable to clustering problems across various biomedical informatics domains.
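
To make the notion of "clustering cost across demographic groups" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a group's clustering cost is the mean squared distance from that group's points to their nearest cluster center, and that a socially fair objective is the largest of these per-group costs; the abstract does not state the exact formulation. The function names (`group_clustering_costs`, `fair_objective`) and the toy data are hypothetical and are not drawn from the Healthy Minds Network survey.

```python
import numpy as np


def group_clustering_costs(X, centers, groups):
    """Mean squared distance to the nearest center, computed per group.

    Assumption: this per-group average is the "clustering cost" being
    equalized; the source abstract does not spell out the definition.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    groups = np.asarray(groups)
    # Squared Euclidean distance from every point to every center: shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Each point is charged the cost of its nearest center.
    point_cost = d2.min(axis=1)
    return {g.item(): float(point_cost[groups == g].mean())
            for g in np.unique(groups)}


def fair_objective(costs_by_group):
    """Hypothetical socially fair objective: the worst-off group's cost."""
    return max(costs_by_group.values())


# Toy data: two groups, two centers (illustrative values only).
X = [[0, 0], [0, 2], [10, 0], [10, 4]]
centers = [[0, 1], [10, 2]]
groups = ["A", "A", "B", "B"]

costs = group_clustering_costs(X, centers, groups)
print(costs)                  # {'A': 1.0, 'B': 4.0}
print(fair_objective(costs))  # 4.0
```

In this toy case the pooled average cost is 2.5, which hides the fact that group B's cost is four times that of group A. A fairness-aware method would choose centers to reduce the larger value rather than the pooled average.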