Abstract
Mushroom farms in the United States face persistent labor shortages, particularly during the harvest of white button mushrooms (Agaricus bisporus), which requires selective picking by skilled workers. This study addresses that challenge by developing a depth-guided computer vision framework for automated mushroom detection, segmentation, and tracking, providing the perception foundation for selective and timely robotic harvesting. The specific objectives were to (1) develop a novel image-processing algorithm (RD-GuideNet) that integrates RGB and depth images for accurate detection and segmentation of mushrooms; (2) implement a custom depth-guided tracking algorithm to preserve mushroom identities across sequential frames; and (3) compare the performance of RD-GuideNet against state-of-the-art deep learning models, YOLOv8 and YOLOv11, in terms of segmentation and tracking accuracy. RD-GuideNet achieved an F1-score of 0.93 for segmentation, outperforming YOLOv8 (0.88) and YOLOv11 (0.86), and produced sharper, more geometrically consistent boundaries that closely followed true mushroom cap contours. Its tracking consistency reached 92.7%, slightly below YOLOv8 (95.3%) and YOLOv11 (94.6%), yet it still maintained high temporal consistency across dense mushroom beds. These results suggest that depth-based geometric reasoning and deep learning exhibit complementary strengths in dense production scenes; combining the two may further improve detection reliability and shape fidelity, supporting high-precision perception for autonomous mushroom harvesting. A comprehensive quantitative evaluation of such a hybrid framework is left for future work.