Abstract
The rapid proliferation of Internet of Things (IoT) devices across industries has created a need for robust, scalable, real-time data processing architectures that support intelligent analytics and predictive maintenance. This paper presents a novel, comprehensive architecture for end-to-end processing of IoT data streams, from acquisition to actionable insights. The system uses Kafka-based message brokering for high-throughput ingestion of real-time sensor data, and Apache Spark for batch and streaming extract, transform, and load (ETL) processing. A modular machine-learning pipeline automates data preprocessing, training, and evaluation across multiple models. Continuous monitoring and optimization components track system performance and model accuracy, and the resulting insights are delivered to users through a dedicated Application Programming Interface (API). The architecture is designed for scalability, flexibility, and real-time responsiveness, making it well suited to industrial IoT applications that require continuous monitoring and intelligent decision-making.