Abstract
Advances in functional genomic technology, notably CRISPR with Cas9 or Cas12, now enable large-scale double perturbation screens in which pairs of genes are inactivated, making it possible to detect genetic interactions (GIs) experimentally. However, because GIs cannot be validated in high throughput, no gold-standard dataset of known true interactions exists. We therefore developed Double-CRISPR Knockout Simulation (DKOsim), which lets users reproducibly generate synthetic data in which the investigator specifies the single-gene fitness effect of each gene and the interaction of each gene pair. We adapted Monte Carlo randomization methods to extend single knockout simulation methods to double knockout designs, simulating the gene-gene interactions between all possible combinations of the input genes. Using DKOsim, we generated simulated datasets that closely resemble real double knockout CRISPR datasets in terms of log fold change (LFC), GI distribution, and replicate correlation. We further inferred optimal CRISPR library designs by systematically investigating critical experimental parameters, including depth of coverage, guide efficiency, and the variance of the initial guide distribution. This simulation scheme will help identify optimal computational methods for GI detection and aid the design of future dual knockout CRISPR screens.
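To make the simulation idea concrete, the following is a minimal illustrative sketch of a Monte Carlo double knockout screen. It is not the DKOsim implementation. The function name `simulate_dko_screen`, its parameters, the additive fitness model, the log-normal initial distribution, and the one-construct-per-gene-pair design are all assumptions chosen for illustration. The sketch shows how investigator-specified single-gene fitness effects and pairwise interactions, together with coverage, guide efficiency, and initial abundance variance, can be turned into simulated LFC values.

```python
import numpy as np
from itertools import combinations


def simulate_dko_screen(fitness, interaction, coverage=100, efficiency=0.8,
                        init_sd=0.5, n_doublings=10, seed=0):
    """Toy double knockout screen (hypothetical model, not DKOsim itself).

    fitness:     per-gene change in relative growth rate (0 = neutral)
    interaction: dict {(i, j): GI term added on top of the additive model}
    Returns a dict {(i, j): log2 fold change} for every gene pair.
    """
    rng = np.random.default_rng(seed)  # fixed seed makes the run reproducible
    pairs = list(combinations(range(len(fitness)), 2))
    n = len(pairs)

    # Initial cells per construct: log-normal spread around the target coverage
    init = rng.poisson(coverage * rng.lognormal(0.0, init_sd, size=n))

    # Each guide cuts independently with probability `efficiency`, giving four
    # cell classes: both genes cut, only first, only second, neither
    e = efficiency
    class_probs = [e * e, e * (1 - e), (1 - e) * e, (1 - e) * (1 - e)]

    final = np.zeros(n)
    for k, (i, j) in enumerate(pairs):
        gi = interaction.get((i, j), 0.0)
        effects = np.array([fitness[i] + fitness[j] + gi,
                            fitness[i], fitness[j], 0.0])
        # Relative growth rate, floored at 0 (cells stop dividing, none die)
        rates = np.clip(1.0 + effects, 0.0, None)
        cells = rng.multinomial(init[k], class_probs)
        final[k] = np.sum(cells * 2.0 ** (n_doublings * rates))

    # Sequencing: draw a fixed number of reads in proportion to final abundance
    reads = rng.multinomial(coverage * n, final / final.sum())

    # LFC of normalized frequencies, with a pseudocount of 1
    freq_start = (init + 1) / (init + 1).sum()
    freq_end = (reads + 1) / (reads + 1).sum()
    return dict(zip(pairs, np.log2(freq_end / freq_start)))


# Example: four genes, one essential-like gene and one negative interaction
example_lfc = simulate_dko_screen(
    fitness=[0.0, 0.0, 0.0, -0.3],
    interaction={(0, 1): -0.4},
    coverage=500, efficiency=1.0, init_sd=0.1, seed=1,
)
```

In this toy model, lowering `coverage` or `efficiency`, or raising `init_sd`, adds noise to the resulting LFC values. These are the kinds of experimental parameters a simulation of this type allows one to vary systematically.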