Abstract
Automobile insurance fraud poses a significant challenge for insurers, causing substantial financial losses through fabricated claims and exaggerated damages. Traditional machine learning approaches often struggle with high-dimensional, imbalanced data and offer limited interpretability, which reduces their practical applicability. To address these issues, we propose a penalty-driven feature selection method based on particle swarm optimization that penalizes highly correlated features to improve model generalization while maintaining interpretability. The method was evaluated on the real-world "Angoss carclaims" dataset, which comprises 33 features and 15,420 records and was balanced using the synthetic minority oversampling technique. Eleven machine learning classifiers were tested: random forest, support vector machine, K-nearest neighbors, logistic regression, decision tree, artificial neural network, gradient boosting, adaptive boosting, categorical boosting, light gradient boosting machine, and a stacking classifier. Hyperparameters were tuned via grid search, and each model was assessed under four values of the threshold α (0.85, 0.75, 0.65, and 0.50). The stacking classifier achieved the most reliable performance, reaching 97.55% accuracy and an F1-score of 0.9754 with the feature set reduced to 16 features at α = 0.65. These findings indicate that the proposed framework balances predictive accuracy with interpretability, offering a practical tool for fraud detection in insurance analytics.
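The abstract does not state the exact form of the penalty, so the following is only a minimal sketch of how a correlation penalty could enter the fitness of a candidate feature subset. It assumes that α acts as a cutoff on the absolute Pearson correlation between selected features and that the penalty is subtracted from a classifier score with a weight `lam`. The function names, the weight `lam`, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def correlation_penalty(X, mask, alpha):
    """Fraction of selected feature pairs with |Pearson r| above alpha.

    Assumed penalty form (not specified in the abstract).
    """
    idx = np.flatnonzero(mask)
    if idx.size < 2:
        return 0.0  # fewer than two features: no pair to penalize
    corr = np.abs(np.corrcoef(X[:, idx], rowvar=False))
    upper = corr[np.triu_indices(idx.size, k=1)]  # each pair counted once
    return float(np.mean(upper > alpha))


def penalized_fitness(score, X, mask, alpha=0.65, lam=0.5):
    """Classifier score minus a weighted correlation penalty.

    `lam` is a hypothetical trade-off weight chosen for illustration.
    """
    return score - lam * correlation_penalty(X, mask, alpha)


# Toy data: column 1 is a near-copy of column 0, column 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b])

mask_redundant = np.array([True, True, False])  # selects the correlated pair
mask_diverse = np.array([True, False, True])    # selects uncorrelated features

fit_redundant = penalized_fitness(0.9, X, mask_redundant)
fit_diverse = penalized_fitness(0.9, X, mask_diverse)
```

In a particle swarm search, each particle would encode a binary mask like the ones above, and a function of this kind would serve as the objective. With equal classifier scores, the subset containing the near-duplicate pair receives the lower fitness, which steers the swarm toward less redundant feature sets.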