Abstract
Integrating machine learning (ML) with Statistical Process Control (SPC) is important for Industry 4.0 environments. Contemporary manufacturing data exhibit high dimensionality, autocorrelation, non-stationarity, and class imbalance, all of which challenge the assumptions of classical SPC. This systematic review, conducted following the PRISMA 2020 guidelines, provides a problem-driven synthesis that links each of these data challenges to the corresponding methodological families in ML-based SPC. Specifically, we review approaches for (1) high-dimensional and redundant data (dimensionality reduction and feature selection), (2) autocorrelated and dynamic processes (time-series and state-space models), and (3) data scarcity and imbalance (cost-sensitive learning, generative modeling, and transfer learning). Nonlinearity is treated as a cross-cutting property that is addressed within each category. For each category, we outline the mathematical rationale of representative algorithms and illustrate their use with industrial examples. We also summarize open issues in interpretability, thresholding, and real-time deployment. This review offers structured guidance for selecting ML techniques suited to complex manufacturing data and for designing reliable online monitoring pipelines.