Abstract
In froth flotation, froth image features carry important information for predicting concentrate grade. However, froth structure is influenced by multiple factors, including air flowrate, slurry level, ore properties, and reagents, which leads to highly complex and dynamic changes in the image features. In addition, issues such as the immeasurability of ore properties and measurement errors introduce significant uncertainty, both aleatoric (intrinsic variability from ore fluctuations and sensor noise) and epistemic (incomplete feature representation and local data heterogeneity), and pose generalization challenges for prediction models. This paper proposes an uncertainty quantification regression framework based on cross-modal interaction fusion, which integrates the complementary strengths of Selective Kernel Networks (SKNet) and Vision Transformers (ViT). A cross-modal interaction module achieves deep fusion of local and global features, reducing the epistemic uncertainty caused by incomplete feature representation in single models. Meanwhile, adaptive calibrated quantile regression, which uses an exponential moving average (EMA) to track real-time coverage and adjust parameters dynamically, optimizes prediction interval coverage and addresses the inability of static quantile regression to adapt to aleatoric uncertainty. A localized conformal prediction module further enhances sensitivity to local data distributions, avoiding the tendency of global conformal methods to ignore local heterogeneity. Experimental results demonstrate that the proposed method significantly improves the robustness of uncertainty estimation while maintaining high prediction accuracy, providing strong support for intelligent optimization and decision-making in industrial flotation processes.
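The EMA-based adaptive calibration mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, parameter names, and the specific update rule (nudging the nominal miscoverage level toward the target coverage) are illustrative assumptions about how such a calibrator might be structured.

```python
class AdaptiveQuantileCalibrator:
    """Illustrative sketch (not the paper's code): track empirical
    prediction-interval coverage with an exponential moving average (EMA)
    and adjust the nominal miscoverage level alpha so that realized
    coverage drifts toward the target."""

    def __init__(self, target_coverage=0.9, ema_decay=0.99, step_size=0.01):
        self.target = target_coverage
        self.decay = ema_decay
        self.step = step_size
        self.ema_coverage = target_coverage  # start the EMA at the target
        self.alpha = 1.0 - target_coverage   # current nominal miscoverage

    def update(self, y_true, lower, upper):
        # 1 if the new observation fell inside its prediction interval
        covered = 1.0 if lower <= y_true <= upper else 0.0
        # EMA of the coverage indicator tracks real-time coverage
        self.ema_coverage = (self.decay * self.ema_coverage
                             + (1.0 - self.decay) * covered)
        # Under-coverage (EMA below target) shrinks alpha, widening
        # future intervals; over-coverage narrows them.
        self.alpha += self.step * (self.ema_coverage - self.target)
        self.alpha = min(max(self.alpha, 1e-3), 0.5)
        return self.alpha
```

In use, the model's quantile heads would be queried at levels `alpha / 2` and `1 - alpha / 2` after each `update` call, so the interval width responds to recent coverage rather than staying fixed as in static quantile regression.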