Abstract
BACKGROUND: In some genome-wide association studies (GWAS), particularly those involving biobank data, linear regression is used to obtain summary statistics for binary traits, while other studies report log odds or odds ratios from logistic regression on the genomic variants. Still others apply a transformation equation to convert between logistic and linear regression estimates. AIM: The current study assesses the performance of the Wald ratio computed with logistic regression, linear probability models (LPM), and the transformation approach across three estimation frameworks, structural equation modelling (SEM), two-stage predictor substitution (TSPS), and two-stage residual inclusion (TSRI), via simulation and real data analysis. METHODS: Data simulated from a bivariate Bernoulli distribution were analyzed within an instrumental variable (IV) framework to estimate empirical bias. Four sensitivity analysis scenarios were considered, varying the sample size, IV prevalence, exposure and outcome prevalence, and confounder effect. Additionally, real data from the Golden Retriever Lifetime Study were analyzed to estimate the potential causal effect of activity level on cancer risk. RESULTS: In the simulations, for the scenario with a positive effect size, a low confounder effect, and a weak instrumental variable, the median (Q1 to Q3) biases of the Wald ratio were as follows. Under SEM, the bias was 0.77 (-0.20 to 1.74) for logistic regression, 0.03 (-3.78 to 3.95) for LPM, and 0.03 (-3.79 to 3.94) for the transformation method. Under TSPS, the bias was 0.72 (-2.24 to 3.61) for logistic regression, 0.00 (-0.04 to 0.05) for LPM, and 0.00 (-0.01 to 0.01) for the transformation method. Under TSRI, the bias was 0.75 (-2.47 to 3.87) for logistic regression, 0.02 (-0.21 to 0.26) for LPM, and 0.02 (-0.22 to 0.26) for the transformation method.
In the real data analysis, for the SNP Affx-205724246_A, the biases of the Wald ratio were as follows. Under SEM, the bias was -0.14 for logistic regression, -1.30 for LPM, and -1.45 for the transformation method. Under both TSPS and TSRI, the bias was 1.24 for logistic regression, 0.08 for LPM, and -0.07 for the transformation method. CONCLUSION: Increasing the strength of the instrumental variable reduced bias, with the best performance observed at instrumental strengths of 0.5 and 0.7. The LPM and transformation approaches produced relatively lower bias in the TSPS framework when the confounder effect was below 0.1 and the prevalence of the outcome or the exposure was between 0.5 and 0.62. When the prevalence of the exposure and outcome ranged from 0.67 to 0.84 and the instrumental strength was high (0.7), the bias of the LPM and transformation methods was slightly higher than, but comparable to, that observed when the prevalence ranged from 0.5 to 0.62. Furthermore, a low prevalence of the exposure or outcome (0.12 to 0.23) produced larger biases than those observed at higher prevalence values (0.67 to 0.84). In the real data analysis, the bias of the Wald ratio using the LPM and transformation methods was lower under TSPS and TSRI and higher under SEM, whereas the bias of the Wald ratio using logistic regression was lower under SEM than under the other two frameworks.
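For readers unfamiliar with the Wald ratio, the following minimal Python sketch illustrates how the estimate changes between the LPM and logistic scales. It is not the simulation design used in this study: the data-generating model (a binary instrument, a binary confounder, and additive effects on the probability scale), the parameter values, and all function names are assumptions chosen for illustration. The transformation approach is omitted because its equation is not stated here. With a single binary instrument, the LPM slope equals the difference in proportions and the logistic slope equals the log odds ratio, so no model-fitting library is needed.

```python
import numpy as np


def simulate(n, p_z=0.5, iv_strength=0.5, effect=0.2, confounding=0.1, seed=0):
    """Draw a binary instrument Z, exposure X and outcome Y.

    Hypothetical additive model on the probability scale (an assumption,
    not the bivariate Bernoulli design described in the abstract).
    """
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, p_z, n)                      # instrument
    u = rng.binomial(1, 0.5, n)                      # unmeasured confounder
    x = rng.binomial(1, 0.2 + iv_strength * z + confounding * u)
    y = rng.binomial(1, 0.3 + effect * x + confounding * u)
    return z, x, y


def lpm_slope(z, t):
    """OLS slope of binary t on binary z: the difference in proportions."""
    return t[z == 1].mean() - t[z == 0].mean()


def logit_slope(z, t):
    """Logistic slope of binary t on binary z: the log odds ratio."""
    p1, p0 = t[z == 1].mean(), t[z == 0].mean()
    return np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))


def wald_ratios(z, x, y):
    """Wald ratio = (instrument-outcome slope) / (instrument-exposure slope)."""
    return {
        "lpm": lpm_slope(z, y) / lpm_slope(z, x),
        "logistic": logit_slope(z, y) / logit_slope(z, x),
    }


z, x, y = simulate(200_000)
print(wald_ratios(z, x, y))
```

Under this assumed model the LPM-based ratio targets the risk difference (`effect`), whereas the logistic-based ratio is on the log-odds scale, so the two values are not directly comparable without a transformation.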