Abstract
Despite the widespread adoption of network monitoring tools, Managed Service Providers (MSPs), particularly small and medium-sized enterprises (SMEs), still struggle to achieve predictive, multi-tenant-aware visibility across distributed client networks. Existing monitoring systems lack integrated predictive analytics and edge intelligence. To address this gap, we propose an AI- and IoT-driven monitoring and visualisation framework that combines edge IoT nodes (Raspberry Pi Prometheus modules) with machine learning models to enable predictive anomaly detection, proactive alerting, and reduced downtime. The system uses Prometheus for data collection, Grafana for visualisation, and Mimir for long-term storage, and applies Simple Linear Regression (SLR), K-Means clustering, and Long Short-Term Memory (LSTM) models to anomaly prediction and fault classification. These AI modules are containerised and deployed either at the edge or centrally, depending on tenant topology, and the risk metrics they predict are fed back into Prometheus. In a one-month deployment across five MSP clients (500 nodes), the framework achieved a 95% reduction in downtime and a 90% reduction in incident resolution time relative to historical baselines. Tenant isolation is enforced through VPN tunnels and token-based authentication, and data handling is GDPR-compliant. Unlike prior monitoring platforms, this work introduces a fully edge-embedded AI inference pipeline, validated through live deployment and operational feedback.
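
As an illustration of the feedback path summarised above, the following minimal sketch fits a simple linear regression to recent samples of one metric, extrapolates it over a short horizon, and formats the result as a line in the Prometheus text exposition format. It is a sketch under stated assumptions, not the deployed implementation: the metric name `predicted_disk_used_percent`, the label names `tenant` and `node`, the sample values, and the 600-second horizon are all hypothetical, and only the SLR model is covered (K-Means and LSTM are omitted).

```python
from typing import Mapping, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(xs: Sequence[float], ys: Sequence[float], horizon_s: float) -> float:
    """Extrapolate the fitted line horizon_s seconds past the last sample."""
    slope, intercept = fit_line(xs, ys)
    return slope * (xs[-1] + horizon_s) + intercept


def exposition_line(name: str, labels: Mapping[str, str], value: float) -> str:
    """Format one sample in the Prometheus text exposition format.

    Label values are assumed to need no escaping (hypothetical simplification).
    """
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value:.2f}"


if __name__ == "__main__":
    # Hypothetical samples: seconds since window start, disk usage in percent.
    timestamps = [0.0, 60.0, 120.0, 180.0]
    disk_used = [70.0, 72.0, 74.0, 76.0]
    predicted = forecast(timestamps, disk_used, horizon_s=600.0)
    print(exposition_line(
        "predicted_disk_used_percent",           # hypothetical metric name
        {"tenant": "client_a", "node": "pi-01"},  # hypothetical labels
        predicted,
    ))
```

One way to make such a line visible to Prometheus would be to serve it from an HTTP endpoint that Prometheus scrapes; the abstract does not specify which mechanism the framework uses.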