Abstract
Fake review detection is critical for maintaining trust and ensuring decision reliability across digital marketplaces and IoT-enabled ecosystems. This study presents a zero-shot framework for multi-modal anomaly detection in user reviews, integrating textual and metadata-derived signals through instruction-tuned transformers. The framework combines three complementary components: language perplexity scoring with FLAN-T5 to capture linguistic irregularities, unsupervised reconstruction via a transformer-based autoencoder to identify structural deviations, and semantic drift analysis to measure contextual misalignment between task-specific and generic embeddings. To enhance applicability in sensor-driven environments, the framework incorporates device-level metadata such as timestamps, product usage logs, and operational signals, enabling cross-validation between unstructured text and structured sensor features. A unified anomaly score fusing textual and sensor-informed signals improves robustness in multi-modal detection scenarios, while interpretability is achieved through token-level saliency maps for textual analysis and feature-level attributions for sensor metadata. Experimental evaluations on the Amazon Reviews 2023 dataset, supplemented by metadata-rich sources including the Amazon Review Data 2018 and Historic Amazon Reviews (1996-2014) datasets, demonstrate strong zero-shot performance (AUC up to 0.945) and few-shot adaptability under limited supervision (AUC > 0.95), maintaining stable precision-recall trade-offs across product domains. The proposed framework delivers real-world impact by enabling real-time, multi-modal fake review detection in IoT-driven platforms and smart spaces, supporting consumer trust, automated decision-making, and transparent anomaly detection in sensor-enhanced digital ecosystems.
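The unified anomaly score described above can be sketched minimally as follows. This is an illustrative assumption of how the three textual signals (perplexity, reconstruction error, semantic drift) might be fused; the z-score normalization and the fusion weights are hypothetical choices, not values taken from the paper:

```python
import numpy as np

def zscore(x):
    # Standardize each signal so heterogeneous score ranges are comparable.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-8)

def fused_anomaly_score(perplexity, recon_error, semantic_drift,
                        weights=(0.4, 0.3, 0.3)):
    """Combine three per-review anomaly signals into one score.

    perplexity:     language-model perplexity per review (higher = odder text)
    recon_error:    autoencoder reconstruction error per review
    semantic_drift: distance between task-specific and generic embeddings
    weights:        illustrative fusion weights (hypothetical, not from the paper)
    """
    signals = [zscore(perplexity), zscore(recon_error), zscore(semantic_drift)]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights to sum to 1
    return sum(wi * s for wi, s in zip(w, signals))

# Toy example: the fourth review is anomalous on all three signals.
ppl   = [12.0, 14.5, 13.1, 88.0]
recon = [0.10, 0.12, 0.09, 0.85]
drift = [0.05, 0.07, 0.06, 0.60]
scores = fused_anomaly_score(ppl, recon, drift)
print(int(np.argmax(scores)))  # → 3 (the anomalous review)
```

A weighted sum of standardized signals is only one plausible fusion rule; thresholding the fused score, or calibrating the weights on a small labeled set, would correspond to the few-shot adaptation the abstract mentions.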