Abstract
Customer churn prediction is a key application of machine learning in business analytics. This article presents a controlled benchmark of a multilayer perceptron (MLP) trained on 10,000 customer records with 12 features. Data were pre-processed with one-hot encoding and standard scaling to improve model generalisation. The model achieved an ROC AUC of 0.8640, a recall of 0.4178, and an F1 score of 0.5534. Precision, the proportion of predicted churners who actually churned, was relatively high at 0.8214 ± 0.0253. The correspondingly low false-positive rate indicates that the model rarely misclassifies non-churners as churners, which is important for cost-efficient customer retention programs. A group-level heterogeneity analysis showed that model performance, measured by the Brier score (lower is better), was best for France (0.076439), followed by Spain (0.077394), and worst for Germany (0.106267). Permutation importance identified the seven most influential features, with scores ranging from 0.000147 to 0.120842. A calibration analysis over quantile bins was used to identify where the model is underconfident or overconfident; predictions in the bins spanning 0.12 to 0.25 were trustworthy, which corresponds to low- to mid-risk customers. In a comparison with baseline machine learning models, Gradient Boosting obtained the best overall accuracy (0.8720) and F1 score (0.6098), while the proposed neural network with categorical encoding and standard scaling attained the highest precision (0.8528), effectively minimising false positives in churn detection. Although its PR AUC was slightly lower (0.7140), the proposed model had better recall (0.4128) and an equal ROC AUC (0.8657), indicating the model’s strength and ability to operate on complex, high-dimensional churn data.
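As a minimal illustration of the group-level Brier comparison reported above, the sketch below computes the Brier score (the mean squared difference between the predicted churn probability and the observed 0/1 outcome) separately for each group. The function name `brier_by_group` and the toy data are assumptions made for illustration only; they are not the study's code or data.

```python
from collections import defaultdict


def brier_by_group(y_true, y_prob, groups):
    """Return the Brier score (mean squared error of predicted
    probabilities against 0/1 outcomes) for each group label."""
    squared_error_sum = defaultdict(float)
    count = defaultdict(int)
    for outcome, prob, group in zip(y_true, y_prob, groups):
        squared_error_sum[group] += (prob - outcome) ** 2
        count[group] += 1
    return {g: squared_error_sum[g] / count[g] for g in squared_error_sum}


# Toy data (hypothetical, NOT taken from the study).
y_true = [0, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.8, 0.2, 0.4, 0.3, 0.9]
groups = ["France", "France", "Germany", "Germany", "Spain", "Spain"]

scores = brier_by_group(y_true, y_prob, groups)
for group, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{group}: {score:.4f}")
```

A lower score means the predicted probabilities lie closer to the observed outcomes, so sorting in ascending order lists the best-performing group first.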