Abstract
BACKGROUND: Rheumatoid arthritis (RA) and osteoarthritis (OA) are prevalent joint diseases with overlapping clinical manifestations but distinct pathogenesis and treatment strategies. Because misclassification can lead to inappropriate management, accurate molecular discrimination between RA and OA is important. This study aimed to identify RA-associated diagnostic genes, with particular emphasis on distinguishing RA from OA, using integrated bioinformatics and machine learning approaches. METHODS: Public transcriptomic datasets from the Gene Expression Omnibus (GEO) were analyzed to identify differentially expressed genes (DEGs) between RA samples and comparison groups. The least absolute shrinkage and selection operator (LASSO) and support vector machine recursive feature elimination (SVM-RFE) algorithms were applied for feature selection. Immune cell infiltration was estimated using single-sample gene set enrichment analysis (ssGSEA). Protein-protein interaction (PPI) network and transcription factor analyses were performed to explore potential regulatory mechanisms. An exploratory drug sensitivity analysis was conducted using CellMiner IC50 data. For in vitro validation, HFLS-RA cells were stimulated with TNF-α, and gene expression was measured by RT-qPCR. RESULTS: Three key genes (EPYC, MAGED1, and LAP3) were identified as features selected by both the LASSO and SVM-RFE models. Receiver operating characteristic (ROC) analysis showed good discriminatory performance (AUC > 0.85). EPYC and LAP3 were associated with immune cell infiltration patterns. TNF-α stimulation significantly modulated the mRNA expression of these genes in HFLS-RA cells. CONCLUSION: EPYC, MAGED1, and LAP3 are inflammation-associated genes with potential diagnostic relevance in RA. Further validation in larger independent cohorts and at the protein level is needed to confirm their clinical applicability.
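For readers unfamiliar with the two quantitative steps summarized above, the sketch below illustrates them in plain Python: taking the genes selected by both feature-selection models, and computing ROC AUC in its rank-based (Mann-Whitney) form, where AUC is the probability that a randomly chosen RA sample scores higher than a randomly chosen comparison sample. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, gene labels (GENE_A and so on), and scores are hypothetical placeholders, and the actual LASSO and SVM-RFE fitting is not shown.

```python
from itertools import product


def overlapping_features(lasso_genes, svm_rfe_genes):
    """Return genes selected by both feature-selection models (sorted for stable output)."""
    return sorted(set(lasso_genes) & set(svm_rfe_genes))


def roc_auc(scores_pos, scores_neg):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as 0.5, which matches the Mann-Whitney U formulation of AUC.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical gene lists standing in for the output of each model.
    lasso = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    svm_rfe = ["GENE_C", "GENE_A", "GENE_E", "GENE_B"]
    print(overlapping_features(lasso, svm_rfe))  # ['GENE_A', 'GENE_B', 'GENE_C']

    # Hypothetical scores for RA (positive) vs. comparison (negative) samples.
    auc = roc_auc([0.9, 0.8, 0.7], [0.6, 0.75, 0.2])
    print(round(auc, 3))  # 8 of 9 pairs ranked correctly: 0.889
```

In this toy case the AUC of about 0.889 would clear a 0.85 threshold such as the one reported above. The pairwise loop is O(n·m) and chosen for clarity; a sort-based rank computation scales better for large cohorts.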