Abstract
BACKGROUND: Osteosarcoma (OS) is a malignant bone tumor with high morbidity among adolescents. Early diagnosis of OS remains challenging. This study aimed to construct a comprehensive diagnostic model for OS. METHODS: We used the GSE16088 and GSE14359 datasets from the Gene Expression Omnibus (GEO) database and the OS dataset from the TARGET database as the training set. GSE12865 and The Cancer Genome Atlas (TCGA)-Sarcoma (SARC) dataset were used as the validation set. Differentially expressed genes (DEGs) between normal and OS tissues were screened. Enrichment analysis was performed to identify the functions and pathways shared by these DEGs. Random forest models were constructed to screen for key OS signature genes. The diagnostic model was constructed and validated using an artificial neural network (ANN). Finally, quantitative real-time polymerase chain reaction (qRT-PCR) was used to measure the expression of the seven OS signature genes in tissues and cells. RESULTS: A total of 2128 DEGs were identified. A random forest classifier identified seven representative genes. The OS diagnostic model performed well, with areas under the curve (AUCs) of 0.826 and 0.766 in the training and validation groups, respectively. In OS tumor tissues and cells, the expression levels of CIC, SLC16A2 and GTF2A2 were significantly downregulated, while those of DCAF11, DLGAP5, PRKD3 and C5orf22 were significantly upregulated. CONCLUSIONS: We identified potential biomarkers for the diagnosis of OS, which may support novel diagnostic strategies for OS patients. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12672-026-04662-5.
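
For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the study's pipeline: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. It assumes binary labels (1 for OS, 0 for normal) and uses the rank interpretation of AUC, i.e. the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive sample has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = OS, 0 = normal.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported values of 0.826 and 0.766.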