Abstract
Reliable autofocus is a fundamental prerequisite for precise positioning in micro-assembly systems, where complex reflections, scale variations, and a narrow depth of field often degrade the robustness of traditional sharpness metrics. To address these challenges, we propose an efficient two-stage autofocus method for a dual-camera micro-vision system based on a spatial-frequency image quality assessment (IQA) model. First, we design WaveMamba-IQA for image sharpness estimation, combining the Discrete Wavelet Transform with Vision Transformers to capture high-frequency details and semantic features, and further enhancing global context modeling with Multi-Linear Transposed Attention and Vision Mamba. Second, we implement a coarse-to-fine autofocus workflow that employs the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) for global optimization on the horizontal camera, followed by geometric-prior-based fine adjustment of the oblique camera. On a custom microsphere dataset, WaveMamba-IQA achieves a Spearman rank correlation coefficient of 0.9786, and the integrated system attains a 98.33% autofocus success rate under varying lighting conditions. The proposed method substantially improves the robustness and automation of micro-assembly systems, overcoming the limitations of manual and traditional focusing techniques.