Abstract
IgA nephropathy (IgAN) is the most common primary glomerulonephritis, and improved diagnostic tools are needed. We analyzed three internal validation cohorts (GSE37460, GSE93798, and GSE115857) using gene set enrichment analysis across 7751 pathways. A machine learning diagnostic model was developed and externally validated in multi-cohort gene expression data (external validation cohorts: GSE99339, GSE116626, and GSE104948). Immunohistochemistry was performed to validate the expression of key biomarkers and the presence of functionally active immune cells. The selected two-step glmBoost + Enet [alpha = 0.4] model achieved significant concordance in GSE37460 (κ = 0.704, p < 0.001), GSE93798 (κ = 0.486, p < 0.001), and GSE115857 (κ = 1.000, p < 0.001). Applied to the external validation cohorts, the same model showed strong diagnostic accuracy, with AUCs of 0.938 (GSE99339), 0.871 (GSE116626), and 0.926 (GSE104948); the corresponding Kappa statistics were κ = 0.699 (p < 0.001), κ = 0.443 (p = 0.018), and κ = 0.615 (p = 0.023). Nine genes were identified as significant for the diagnosis of IgAN, and HLA-DRA and VASH1 emerged as robust biomarkers across the cohorts (all p < 0.05). Immunohistochemistry demonstrated markedly increased HLA-DRA and VASH1 expression in IgAN patients, and immunofluorescence staining indicated a greater presence of activated CD4+HLA-DRA+ T cells in IgAN tissues than in controls. This study delivers a reproducible 9-gene machine learning classifier for IgAN diagnosis and highlights HLA-DRA and VASH1 as promising biomarkers.
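For readers less familiar with the two agreement metrics reported above, the following is a minimal sketch of how Cohen's kappa and the AUC can be computed from a classifier's output. It is not the authors' pipeline: the function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and the numbers are not study data.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies of both raters.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = IgAN, 0 = control (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

kappa = cohen_kappa(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"kappa = {kappa:.3f}, AUC = {auc:.4f}")
```

Kappa depends on the chosen threshold because it compares hard class labels, whereas the AUC is threshold-free because it uses only the ranking of the scores. This is one reason a cohort can show a high AUC alongside a more moderate κ.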