Abstract
MOTIVATION: Gene-damaging de novo mutations are highly informative for studies seeking to discover genes underlying developmental disorders. Traditionally, de novo variants are recognized by evaluating high-quality DNA sequence from affected offspring and their parents. However, when parental sequence is unavailable, methods are required to infer de novo status and to use this inference in association studies.

RESULTS: We use data from autism spectrum disorder to illustrate and evaluate the methods. Separating de novo from rare inherited variants is challenging because the latter are far more common. Using a classifier for unbalanced data, trained on variants of known inheritance class, we build an inheritance model and from it derive a de novo score for variants whose parental data are missing. Next, we propose a new Random Draw (RD) model that uses this score for gene discovery. Built into an existing inferential framework, RD produces a more powerful gene-based association test and controls the false discovery rate.

AVAILABILITY AND IMPLEMENTATION: The implementation code and publicly available data are provided at https://github.com/HaeunM/TADA-RD.