Abstract
Environmental microorganism (EM) detection from microscopy images supports scalable monitoring of water environments, but fair comparison of detection algorithms is hindered by inconsistent experimental protocols and limited benchmark reporting. We construct a reproducible EM object detection benchmark on the EMDS-7 dataset and, under a fixed data split, unified input normalization, and from-scratch training without external pretraining, evaluate 25 representative detectors spanning two-stage proposal-based methods, one-stage dense detectors, keypoint-based formulations, and Transformer-based end-to-end approaches. Performance is measured with COCO-style mAP averaged over IoU thresholds 0.50:0.05:0.95, complemented by AP at individual IoU thresholds, recall-based analysis, and backbone comparisons across ResNet-18/50/101 where supported. Two-stage detectors achieve the strongest overall accuracy on EMDS-7: Faster R-CNN obtains the best mAP (64.0%), with Cascade R-CNN close behind (63.9%), while modern one-stage detectors substantially narrow the gap, exemplified by RTMDet-X (60.9%). AP decreases consistently as the IoU threshold becomes stricter, indicating that precise localization is a key bottleneck in EM microscopy; backbone scaling does not yield uniform gains under from-scratch training, and recall analysis reveals distinct operating characteristics across paradigms, including high-recall but lower-precision behavior for some methods. These results suggest that EMDS-7 favors detectors with robust localization and controlled false positives against cluttered microscopic backgrounds. Future progress should therefore emphasize high-IoU localization for small instances, boundary-aware learning, and false-positive suppression rather than reliance on deeper backbones alone. The proposed benchmark provides reproducible baselines and diagnostic evidence to guide subsequent EM detection research.
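The COCO-style metric used above averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that averaging step is shown below; the per-threshold AP values are hypothetical placeholders for illustration, not the paper's reported numbers.

```python
def coco_map(ap_per_iou):
    """Average AP over the 10 COCO IoU thresholds 0.50:0.05:0.95."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_per_iou[t] for t in thresholds) / len(thresholds)

# Hypothetical per-threshold AP values, illustrating the typical decline
# as the IoU threshold becomes stricter (not the actual EMDS-7 results).
ap = {round(0.50 + 0.05 * i, 2): 0.90 - 0.06 * i for i in range(10)}
print(f"mAP@[.50:.95] = {coco_map(ap):.3f}")
```

Because AP falls as the IoU threshold tightens, the averaged mAP is dominated by performance at the stricter thresholds, which is why precise localization matters so much under this metric.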