Abstract
Atomic force microscopy (AFM) is a widely used tool for nanoscale characterization across materials science, energy research, and biology. However, its adoption in high-throughput materials discovery and statistically driven studies remains limited by a strong dependence on expert operator input and by the scarcity of annotated experimental AFM datasets needed to enable data-driven automation. Here, we introduce SimuScan, a synthetic-data-driven framework that enables reliable AFM feature identification, segmentation, and targeted imaging without requiring large manually labeled experimental datasets. SimuScan generates tunable, high-fidelity synthetic AFM images of defined morphologies while incorporating realistic experimental artifacts, including tip-sample convolution, noise, flattening distortions, and surface debris. We show that these datasets support scalable training of modern deep learning models for AFM analysis without manual annotation. When integrated into data-driven AFM workflows, SimuScan-trained models can locate and analyze nanoscale structures across large datasets and guide targeted follow-up imaging. We validate this approach on nanostructured surfaces, DNA assemblies, and bacterial cells, demonstrating robust generalization across diverse sample types with minimal operator intervention. More broadly, this work establishes a general strategy for generating explicitly conditioned, task-relevant synthetic data to improve the reliability of downstream models in autonomous microscopy.