Abstract
As edge computing becomes increasingly central to modern digital infrastructure, it also creates opportunities for sophisticated malware attacks that traditional security systems struggle to address. This study proposes a natural language processing (NLP) framework integrated with ensemble learning into next-generation firewalls (NGFWs) to detect and mitigate malware attacks in edge computing environments. The approach leverages unstructured threat intelligence (e.g., cybersecurity reports, logs) by applying NLP techniques, such as TF-IDF vectorization, to convert textual data into structured insights. This process uncovers hidden patterns and entity relationships within system logs. By combining Random Forest (RF) and Logistic Regression (LR) in a soft voting ensemble, the proposed model achieves 95% accuracy on a cyber threat intelligence dataset augmented with synthetic data to address class imbalance, and 98% accuracy on the CSE-CIC-IDS2018 dataset. The study was validated using ANOVA to assess statistical robustness and confusion matrix analysis, both of which confirmed low error rates. The system enhances detection rates and adaptability, providing a scalable defense layer optimized for resource-constrained, latency-sensitive edge environments.