Abstract
Complex event detection (CED) adds value to camera stream data in applications such as workplace safety, task monitoring, security, and health. Recent CED frameworks have addressed the issues of limited spatiotemporal labels and costly training by decomposing CED into low-level feature extraction and the extraction of spatial and temporal relationships. However, these frameworks suffer from high resource costs, low scalability, and an increased number of false positives and false negatives. This paper proposes GICEDCAM, which distributes CED across edge, stateless, and stateful layers to improve scalability and reduce computation cost. Additionally, we introduce a Spatial Event Corrector component that leverages geospatial data analysis to minimize false negatives and false positives in spatial event detection. We evaluate GICEDCAM on 16 camera streams covering four complex events. Relative to a strong open-source baseline configured for our setting, GICEDCAM reduces end-to-end latency by 36% and total computational cost by 45%, with the advantage widening as the number of objects per frame increases. Among the corrector variants, the Bayesian Network (BN) yields the lowest latency, the Long Short-Term Memory (LSTM) network achieves the highest accuracy, and trajectory analysis offers the best accuracy-latency trade-off for this architecture.