Abstract
The increasing complexity of industrial environments calls for real-time hazard detection and environmental monitoring by intelligent robotic systems. This paper introduces RoboFusion, an integrated framework that combines Autonomous Mobile Robots (AMRs), fixed sensing nodes, and a novel hybrid dataset generation pipeline for data-driven industrial safety. Deployed in a functioning industrial testbed, RoboFusion collected real-time telemetry over 180 days using four sensor suites: two fixed units and two units mounted on Near-Field Communication (NFC) guided AMRs, each equipped with 12 sensors sampling at one-minute intervals. This deployment yielded approximately one million multi-modal sensor records covering temperature, humidity, gas concentrations, air quality, and pressure. Data streams were processed onboard by ESP32 microcontrollers and transmitted via Message Queuing Telemetry Transport (MQTT) to an Internet of Things (IoT) cloud platform. Because hazard events are scarce and imbalanced in the collected real-world data, effective model training is difficult. RoboFusion addresses this issue with a structured synthetic dataset generation framework that augments non-hazardous data using statistical augmentation techniques and simulates hazardous data through multi-phase curve fitting, spatial propagation modeling, and location-aware hazard scenarios. The resulting synthetic dataset improves coverage of rare and safety-critical scenarios while remaining consistent with real-world dynamics. Evaluation across four machine learning models, namely Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Multi-Layer Perceptron (MLP), demonstrates significant cross-domain gains. For example, when models trained on synthetic data were tested against real hazard events, hazard F1 scores improved from 0.47 to 0.85 for the RF model and from 0.16 to 0.79 for the SVM model.
RoboFusion therefore delivers a reproducible robotic sensing platform and an openly accessible hybrid dataset. Its hazard simulation approach mimics real-world hazards and supports the development of resilient Artificial Intelligence (AI) systems for industrial hazard detection and autonomous safety intelligence.
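As a quick sanity check on the quoted dataset size, the deployment figures above can be multiplied out. The sketch below is illustrative only: the function name `expected_records` is hypothetical, and it assumes each suite logs one multi-modal record per minute with no downtime, which the abstract does not state explicitly.

```python
# Back-of-the-envelope check of the record count quoted in the abstract.
# Assumption (not stated in the source): each suite writes one multi-modal
# record per sampling interval and runs continuously for the whole period.

MINUTES_PER_DAY = 24 * 60


def expected_records(suites: int, days: int, interval_minutes: int = 1) -> int:
    """Upper bound on the number of records under continuous logging."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return suites * days * (MINUTES_PER_DAY // interval_minutes)


# Figures taken from the abstract: 4 suites, 180 days, 1-minute sampling.
records = expected_records(suites=4, days=180)

# Each record would bundle readings from the 12 sensors in a suite.
readings = records * 12

print(records)   # 1036800
print(readings)  # 12441600
```

Under these assumptions the upper bound is 1,036,800 records, which agrees with the "approximately one million" figure; gaps from maintenance or transmission loss would lower the actual count.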