Abstract
Infrared image colorization has attracted widespread attention in recent years as an important means of enhancing image visibility and semantic expressiveness. However, existing evaluation methods rely mostly on pixel-level differences or feature-distribution distances, which do not fully reflect how usable colorization results are in practical tasks. To address this, we propose a task-oriented colorization quality metric, the Recognition-Task based Detection Score (RDS). RDS uses the recognition accuracy of object detection models on colorized images as a proxy for how well those images serve downstream tasks, thereby aligning image quality assessment with task performance. RDS is designed around three properties: robustness to positional errors, provided by the box-matching mechanism of object detection; fine-grained interpretability, provided by category-level accuracy; and task adjustability, provided by flexible category-division strategies. Systematic experiments on the NIR-RGB and FLIR-5C datasets demonstrate three findings. Under standard registration conditions, RDS shows good subjective-objective consistency with traditional metrics. Under registration errors, it is more stable than those metrics. It also offers fine-grained interpretability and task adjustability that traditional metrics lack. Under misalignment, RDS retains a 5.7% improvement in its discriminative Score Gap, whereas PSNR degrades by 69.8%. Flexible category merging raises the RDS of TIC-CGAN from 76.05% to 96.45% on unseen scenes. These results indicate that RDS provides a more practically valuable criterion for evaluating and optimizing infrared colorization models.
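To make the three design properties concrete, the following is a minimal, hypothetical sketch of an RDS-style score. It is not the paper's exact definition, which the abstract does not give. It rests on several assumptions. Detections on the colorized image are matched to ground-truth boxes greedily by confidence, using a label-agnostic IoU threshold of 0.5. A slightly shifted box still matches, which illustrates position robustness. A ground-truth object counts as recognized if its matched detection has the same category. Accuracy is reported per category for interpretability. An optional `merge` mapping regroups fine categories for task adjustability. The names `iou`, `rds_sketch`, `iou_thr`, and `merge` are illustrative only.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rds_sketch(gts, preds, iou_thr=0.5, merge=None):
    """Hypothetical RDS-style score (assumed definition, not the paper's).

    gts:   list of (box, label) ground-truth objects
    preds: list of (box, label, confidence) detections on the colorized image
    merge: optional dict mapping fine labels to coarser task-specific groups
    Returns (overall accuracy, per-category accuracy dict).
    """
    merge = merge or {}

    def group(label):
        # Unmapped labels keep their own category.
        return merge.get(label, label)

    # Greedy, label-agnostic matching: higher-confidence detections claim
    # ground-truth boxes first; each ground-truth box is matched at most once.
    order = sorted(range(len(preds)), key=lambda i: -preds[i][2])
    matched = {}  # ground-truth index -> predicted label
    for i in order:
        pbox, plabel, _ = preds[i]
        best, best_iou = None, iou_thr
        for j, (gbox, _) in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(pbox, gbox)
            if overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            matched[best] = plabel

    # Category-level accuracy: a ground-truth object is a hit only if it was
    # matched AND the matched label falls in the same (possibly merged) group.
    hits, totals = defaultdict(int), defaultdict(int)
    for j, (_, glabel) in enumerate(gts):
        g = group(glabel)
        totals[g] += 1
        if j in matched and group(matched[j]) == g:
            hits[g] += 1

    per_category = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / max(1, sum(totals.values()))
    return overall, per_category


# Toy example: one slightly shifted car, one truck detected as a car,
# and one missed person.
gts = [((0, 0, 10, 10), "car"), ((20, 20, 30, 30), "truck"),
       ((40, 40, 50, 50), "person")]
preds = [((1, 1, 11, 11), "car", 0.9), ((20, 20, 30, 30), "car", 0.8),
         ((60, 60, 70, 70), "person", 0.7)]

fine = rds_sketch(gts, preds)
coarse = rds_sketch(gts, preds, merge={"car": "vehicle", "truck": "vehicle"})
print(fine)
print(coarse)
```

In this toy case the shifted car box (IoU of about 0.68) still counts as a hit. With fine-grained categories the overall score is 1/3. Merging "car" and "truck" into a single "vehicle" group lifts it to 2/3. This mirrors, on a small scale, how category merging can raise the score when the downstream task does not need the finer distinction.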