Abstract
BACKGROUND: Primary Sjögren's syndrome (pSS) exhibits significant clinical heterogeneity, and traditional organ-based classification systems fail to capture the underlying disease mechanisms. This study aims to identify distinct clinical immune phenotypes of pSS through a data-driven approach and to explore biomarkers that predict them.
METHODS: This cross-sectional study included 1087 patients who met the 2016 ACR/EULAR classification criteria for pSS between 2014 and 2024. Unsupervised K-means clustering was applied to 10 organ involvement variables to identify natural patient subgroups. Network analysis was used to explore the associations between organ involvement and laboratory biomarkers. Multivariable logistic regression was used to identify independent predictors of subgroup assignment, and restricted cubic spline analysis was used to assess nonlinear relationships between key biomarkers and subtype risk.
RESULTS: Clustering identified two distinct phenotypes. Phenotype 1 (multi-system inflammatory subtype, n = 594) was characterized by universal musculoskeletal involvement (100 %) and significantly elevated rheumatoid factor (RF: 246.41 ± 1177.49 vs 32.75 ± 126.74 IU/mL, P < 0.001). Phenotype 2 (glandular-limited high immunoglobulin subtype, n = 493) was characterized primarily by glandular involvement (40.7 %), higher IgG levels, and less systemic involvement. Network analysis revealed a significant positive correlation between RF and musculoskeletal involvement (r = 0.32, P < 0.001). Independent predictors of Phenotype 1 included male sex (OR 2.559, 95 % CI 1.109-6.090), elevated potassium (OR 1.607, 95 % CI 1.061-2.433), and elevated RF levels (OR 1.004, 95 % CI 1.002-1.005). A composite clinical prediction score incorporating these biomarkers achieved an AUC of 0.717 (95 % CI: 0.684-0.751) for phenotype discrimination. Restricted cubic spline analysis showed U-shaped and inverted U-shaped relationships between key biomarkers and phenotype risk.
CONCLUSION: pSS comprises distinct clinical phenotypes with differing pathophysiological characteristics. The data-driven classification system complements traditional severity grading and provides new insights for precision medicine approaches. RF is a key biomarker linking musculoskeletal manifestations with the severity of systemic inflammation and may serve as an important indicator for precise subtyping and targeted therapy.
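As an illustration of the analytic steps named above, the following minimal sketch runs K-means (k = 2) on binary organ involvement indicators and then computes an AUC for a single continuous marker. It is not the study's code. The eight toy patients, the four column names, and the RF values are invented for illustration only, and the deterministic farthest-first initialization is an assumption chosen to make the toy example reproducible. The study itself used 10 organ variables and a composite score.

```python
# Toy illustration only: all data below are invented, not taken from the study.
# Hypothetical columns: musculoskeletal, glandular, pulmonary, renal (1 = involved).
toy_patients = [
    [1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 1],
    [0, 1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
]
# Hypothetical RF values (IU/mL), one per toy patient.
toy_rf = [250, 30, 400, 90, 40, 20, 120, 15]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from its nearest already-chosen centroid (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until
    the labels stop changing. Returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def auc(scores, positives):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half)."""
    pos = [s for s, y in zip(scores, positives) if y]
    neg = [s for s, y in zip(scores, positives) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels, centroids = kmeans(toy_patients, k=2)

# Each centroid coordinate is the prevalence of that organ involvement
# within the cluster, which is how cluster profiles can be read off.
musculoskeletal_prevalence = [c[0] for c in centroids]

# How well does the toy RF value separate cluster 0 from cluster 1?
in_cluster0 = [lab == 0 for lab in labels]
toy_auc = auc(toy_rf, in_cluster0)

print(labels, musculoskeletal_prevalence, toy_auc)
```

In practice a library implementation such as scikit-learn's `KMeans` and `roc_auc_score` would normally replace the hand-written functions. The sketch is kept dependency-free so that each step (assignment, update, pairwise ranking) is visible.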