Abstract
This study aimed to identify clinical metabolic markers associated with gynecologic cancer (GC) by comparing machine learning (ML) algorithms with orthogonal partial least squares-discriminant analysis (OPLS-DA). Untargeted metabolomic analysis was performed on plasma from 42 patients with GC (24 with cervical cancer [CC], 9 with endometrial cancer [EC], and 9 with ovarian cancer [OC]) and 57 healthy female participants. The GC and healthy control groups were classified using OPLS-DA and 8 ML algorithms. The ML algorithm with the best classification performance was then used to compare each of CC, EC, and OC with healthy subjects, and metabolite candidates involved in each GC were selected. In the comparison of classification model performance between the GC and control groups, the random forest (RF) model displayed the best performance, with an area under the curve (AUC) of 0.9999. A multi-classification RF model established to distinguish all 4 groups achieved an AUC of 0.8351. The AUCs of the 3 GC subgroup RF models comparing patients with CC, EC, and OC with healthy subjects were 0.9838, 0.7500, and 0.7321, respectively. Plasma concentrations of 2 identified metabolites were significantly increased in patients with GC. Several of the ML algorithms used to distinguish GC showed better performance than conventional OPLS-DA. Proline betaine and lysophosphatidylethanolamine (18:0/0:0), both selected in the RF models, are suggested as metabolite candidates associated with GC.
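For readers unfamiliar with the AUC values reported above, the following is a minimal sketch of how an AUC can be computed from classifier scores, using the standard interpretation of AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name `auc_from_scores` and the example labels and scores are hypothetical illustrations, not the study's code or data.

```python
def auc_from_scores(labels, scores):
    """Return the AUC for binary labels (1 = patient, 0 = control).

    Computed as the fraction of positive/negative pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores, made up for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
example_auc = auc_from_scores(labels, scores)  # 11 of 12 pairs ranked correctly
```

An AUC of 1.0 corresponds to perfect separation of the two groups, and 0.5 to chance-level ranking. With small subgroups such as the 9 EC and 9 OC patients, a single misranked pair shifts the estimate noticeably, so such values carry wide uncertainty.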