Abstract
The fusion of Artificial Intelligence and the Internet of Things (AI-IoT, also widely referred to as AIoT) offers transformative potential for smart cities, yet presents a critical challenge: how to turn heterogeneous data streams from intelligent sensing, particularly crowd sensing data derived from citizen interactions such as text, voice, and system logs, into reliable intelligence for sustainable urban governance. To address this challenge, we introduce the Intelligent Multimodal Ticket Processing System (IMTPS), a novel AI-IoT smart system. Unlike ad hoc solutions, IMTPS rests on a theoretically grounded architecture: Information Theory and Game Theory drive efficient, verifiable extraction, while Causal Inference and Meta-Learning support robust reasoning, so that noisy, heterogeneous data streams are converted into reliable governance intelligence. This principled design endows IMTPS with four foundational capabilities essential for modern smart city applications. (1) Sustainable and Efficient AI-IoT Operations: guided by Information Theory, the IMTPS compression module achieves provably efficient semantic-preserving compression, substantially reducing data storage and energy costs. (2) Trustworthy Data Extraction: a Game Theory-based adversarial verification network ensures high reliability in extracting critical information, mitigating the risk of model hallucination in high-stakes citizen services. (3) Robust Multimodal Fusion: the fusion engine leverages Causal Inference to distinguish true causality from spurious correlations, enabling trustworthy integration of complex, multi-source urban data. (4) Adaptive Intelligent System: a Meta-Learning-based retrieval mechanism allows the system to adapt rapidly to new and evolving query patterns, ensuring long-term effectiveness in dynamic urban environments. We validate IMTPS on a large-scale, publicly released benchmark dataset of 14,230 multimodal records.
IMTPS demonstrates state-of-the-art performance, achieving a 96.9% reduction in storage footprint and a 47% decrease in critical data extraction errors. By open-sourcing our implementation, we aim to provide a replicable blueprint for building the next generation of trustworthy and sustainable AI-IoT systems for citizen-centric smart cities.