Abstract
Genome-wide association studies (GWAS) are vital for investigating single-nucleotide polymorphisms (SNPs) in diseases. Comparisons of the SNPs detected for the same disease in different populations often reveal gaps, which are attributed to factors such as population, sample size, and rare variants. We propose a method, ‘Genotype Subtyping’, that first runs a classical GWAS to identify seed SNPs. Each seed SNP then stratifies cases and controls by its genotype, and a sub-GWAS is run per genotype while keeping the number of additional statistical tests low. We evaluated four databases and populations, covering Crohn’s disease, schizophrenia, breast cancer, and type 2 diabetes. In all tested diseases, our findings and simulations show that Genotype Subtyping identifies additional significant SNPs or highlights SNPs whose significance increases substantially. Some of these SNPs have already shown associations with the disorders in other populations, which suggests that the additional associations not observed before may be novel. For instance, we confirmed associations near CASC15 (rs7760611) with breast cancer and near NDUFAF4 (chr6:97339280) with schizophrenia, even though they initially fell short of genome-wide significance. Moreover, we noted a SNP near BUD13 (rs1263149) showing a potential link to type 2 diabetes. Positive and negative simulations show that Genotype Subtyping is highly sensitive and specific. Our results underscore the reliability of Genotype Subtyping as a method for identifying and validating associations for specific phenotypes by reanalyzing GWAS data, ultimately facilitating target gene identification. The implementation is available in a GitHub repository (https://github.com/vtrevino/Genotype_Subtyping). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-025-00512-2.
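The stratification idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (which is in the linked repository); the allelic chi-square test, the function names (`allelic_p`, `gwas`, `genotype_subtyping`), the `seed_alpha` and `min_group` parameters, and the synthetic toy data are all assumptions chosen for illustration. Multiple-testing correction is omitted.

```python
import math
from collections import defaultdict

def allelic_p(genos, pheno):
    """1-df chi-square allelic test (an assumed test, not necessarily the paper's).
    genos: minor-allele counts (0/1/2); pheno: 1 = case, 0 = control."""
    a = sum(g for g, y in zip(genos, pheno) if y == 1)      # case minor alleles
    b = sum(2 - g for g, y in zip(genos, pheno) if y == 1)  # case major alleles
    c = sum(g for g, y in zip(genos, pheno) if y == 0)      # control minor alleles
    d = sum(2 - g for g, y in zip(genos, pheno) if y == 0)  # control major alleles
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic SNP or single-class stratum: no test possible
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df

def gwas(genotypes, pheno, samples=None):
    """Return {snp: p-value}, optionally restricted to a list of sample indices."""
    idx = list(range(len(pheno))) if samples is None else list(samples)
    y = [pheno[i] for i in idx]
    return {snp: allelic_p([g[i] for i in idx], y) for snp, g in genotypes.items()}

def genotype_subtyping(genotypes, pheno, seed_alpha=1e-3, min_group=20):
    """Classical GWAS -> seed SNPs -> one sub-GWAS per seed genotype stratum."""
    base = gwas(genotypes, pheno)
    seeds = [snp for snp, p in base.items() if p < seed_alpha]  # hypothetical cutoff
    sub = {}
    for seed in seeds:
        strata = defaultdict(list)
        for i, g in enumerate(genotypes[seed]):
            strata[g].append(i)
        others = {s: v for s, v in genotypes.items() if s != seed}
        for g, idx in strata.items():
            if len(idx) >= min_group:  # skip strata too small to test
                sub[(seed, g)] = gwas(others, pheno, idx)
    return base, sub

# Synthetic toy data: SNP "t" has opposite effects in the two "seed" strata,
# so it shows no signal in the pooled analysis.
pheno = [1] * 60 + [0] * 40 + [1] * 40 + [0] * 60
genotypes = {
    "seed": [2] * 100 + [0] * 100,
    "t":    [2] * 60 + [0] * 40 + [0] * 40 + [2] * 60,
}
base, sub = genotype_subtyping(genotypes, pheno)
print(base["t"], sub[("seed", 2)]["t"])
```

In this toy example the pooled GWAS selects only `seed`, and `t` becomes strongly associated once samples are split by the `seed` genotype. Restricting the sub-GWAS to seed SNPs rather than all SNP pairs is one way the number of additional tests stays small.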