Abstract
INTRODUCTION: Tracing the histological origin of metastatic renal cancer (MRC) and locating its pathological root cause can enable more precise treatment and improve prognosis. MATERIAL AND METHODS: A total of 3336 patient cases with clearly defined tissue origins were selected from The Cancer Genome Atlas (TCGA) database as experimental data, and feature selection was performed using differential expression analysis. The random forest (RF) algorithm was then improved to establish a medical retrospective, heterogeneous, filter-based feature selection weighted random forest (ReliefFk_RFw) model for locating tissue origins. RESULTS: Differential expression analysis identified 60 signature genes with strong differential expression for tracing six tissue origins (kidney renal clear cell carcinoma, kidney renal papillary cell carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, liver hepatocellular carcinoma and pancreatic adenocarcinoma). Compared with traditional machine learning models (support vector machine, decision tree and RF), the ReliefFk_RFw algorithm increased the average accuracy from 98.65%, 98.79% and 98.57% to 99.53%, the average precision from 95.58%, 96.40% and 96.54% to 99.36%, the average sensitivity from 97.03%, 96.61% and 96.76% to 98.89%, the average specificity from 99.50%, 99.39% and 99.35% to 99.90%, and the average F1 score from 96.30%, 96.50% and 96.64% to 99.11%. Across the models compared, localization accuracy was highest for primary pancreatic cancer, reaching 100.00%. CONCLUSIONS: The improved ReliefFk_RFw model performed best in the comprehensive assessment and can be used to trace the tissue origin of MRC to assist physicians in diagnosis and treatment.
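The five metrics reported above can be illustrated with a minimal sketch. It assumes that each "average" is a macro-average of one-vs-rest values computed per tissue class from a multiclass confusion matrix; the abstract does not state the averaging scheme, so this is an assumption rather than the authors' procedure. The function name `one_vs_rest_metrics` and the toy matrix are hypothetical and do not come from the study.

```python
from typing import Dict, List


def one_vs_rest_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Macro-average one-vs-rest metrics from a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j. The averaging scheme is an assumption.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    sums = {"accuracy": 0.0, "precision": 0.0, "sensitivity": 0.0,
            "specificity": 0.0, "f1": 0.0}

    for k in range(n_classes):
        # Treat class k as "positive" and all other classes as "negative".
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n_classes)) - tp
        tn = total - tp - fn - fp

        # Guard against empty denominators by falling back to 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)

        sums["accuracy"] += (tp + tn) / total
        sums["precision"] += precision
        sums["sensitivity"] += sensitivity
        sums["specificity"] += specificity
        sums["f1"] += f1

    # Macro-average: every class contributes equally, whatever its size.
    return {name: value / n_classes for name, value in sums.items()}


if __name__ == "__main__":
    # Toy three-class matrix, for illustration only (not study data).
    toy = [[8, 1, 1],
           [0, 9, 1],
           [1, 0, 9]]
    for name, value in one_vs_rest_metrics(toy).items():
        print(f"{name}: {value:.4f}")
```

Under this reading, the one-vs-rest accuracy and specificity count the many true negatives contributed by the other classes, which is one possible reason those two figures sit above the precision and sensitivity figures for every model in the comparison.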