Abstract
OBJECTIVES: To verify, without disclosing raw genotypes, that the sites in a federated genomic study applied identical preprocessing pipelines. MATERIALS AND METHODS: Each institution perturbs a slice of 100 single-nucleotide polymorphisms (SNPs) using local differential privacy (LDP), trains a random forest classifier on the perturbed data, and transmits a single LIME explanation vector to a coordinating server. The server simulates 15 preprocessing combinations and trains a second random forest classifier to predict each site's configuration from its explanation vector. RESULTS: In centralized simulation, the verifier achieved 80% accuracy across 15 preprocessing configurations on the GMMAT (n = 400) and synthetic genome (n = 2504) datasets while keeping membership-inference attack power below 0.05 at ε = 3. In distributed federated learning (FL) experiments using Flower, with data partitioned across three sites, binary compatibility detection reached 70% accuracy at 500 SNPs. DISCUSSION: A single differentially private explanation vector provides an auditable preprocessing fingerprint. The gap between centralized and distributed accuracy reflects the expected effects of partitioning data across FL sites. CONCLUSION: This framework demonstrates the feasibility of automated preprocessing verification in federated genomic consortia without compromising participant privacy.
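The local perturbation step summarized in the methods can be illustrated with a minimal sketch. The abstract does not name the LDP mechanism, so the code below assumes k-ary randomized response over genotype codes {0, 1, 2}, applied independently to each SNP with ε treated as a per-SNP budget. The function name `k_rr_perturb`, the toy genotype matrix, and the budget allocation are all hypothetical and are not taken from the study.

```python
import numpy as np

def k_rr_perturb(genotypes, epsilon, k=3, rng=None):
    """Assumed mechanism: k-ary randomized response on genotype codes 0..k-1.

    Each entry keeps its true value with probability e^eps / (e^eps + k - 1);
    otherwise it is replaced by one of the other k-1 values, chosen uniformly.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(genotypes)
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    keep = rng.random(g.shape) < p_keep
    # An offset in 1..k-1 modulo k always lands on a different genotype code.
    offset = rng.integers(1, k, size=g.shape)
    flipped = (g + offset) % k
    return np.where(keep, g, flipped)

# Toy data standing in for one site's 100-SNP slice (400 samples, hypothetical).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(400, 100))
noisy = k_rr_perturb(genotypes, epsilon=3.0, rng=rng)
```

Under this assumption each site would train its classifier on `noisy` rather than on `genotypes`, so the raw genotypes never leave the institution. Splitting a total budget across the 100 SNPs instead would mean a smaller per-SNP ε and proportionally more noise.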