Abstract
Genetic risk stratification is fundamental to the clinical management of acute myeloid leukemia (AML). While targeted next-generation sequencing (NGS) panels have replaced traditional single-gene assays, significant inter-laboratory variability in assay design and bioinformatic pipelines persists. This systematic review and meta-analysis evaluates the analytical performance and standardization of NGS panels used for AML risk stratification.
Major databases were searched for studies reporting the analytical validation or inter-laboratory harmonization of NGS panels for AML compared to orthogonal reference standards (PCR/capillary electrophoresis). The primary outcomes were analytical concordance (overall percent agreement) and diagnostic accuracy. Risk of bias was assessed using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). A random-effects meta-analysis was conducted using the Freeman-Tukey double arcsine transformation to pool concordance rates. Heterogeneity was explored via meta-regression of assay limit of detection (LOD). The certainty of evidence was evaluated using Grading of Recommendations, Assessment, Development, and Evaluation (GRADE).
Seven studies comprising 775 patient samples met the inclusion criteria. The pooled analytical concordance between NGS and reference standards was 96.4% (95% CI: 86.9%-100.0%). Diagnostic accuracy was excellent, with a hierarchical summary receiver operating characteristic (HSROC) area under the curve of 0.98. However, significant heterogeneity was observed (I² = 91.9%, p < 0.0001). Meta-regression indicated that assay sensitivity (LOD) was a significant moderator; highly sensitive NGS assays (LOD ≤ 10⁻⁴) frequently detected low-level variants missed by standard PCR, leading to apparent discordance. Inter-rater reliability was strong (Global κ = 0.876).
Targeted NGS panels demonstrate superior analytical sensitivity and high concordance with traditional methods, supporting their use as the new gold standard for AML risk stratification and measurable residual disease (MRD) monitoring. However, substantial inter-laboratory heterogeneity highlights the critical need for standardized reporting thresholds and harmonized bioinformatic pipelines to ensure consistent clinical interpretation.
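
For readers unfamiliar with the pooling approach named in the methods, the following is a minimal sketch of a random-effects meta-analysis of proportions on the Freeman-Tukey double arcsine scale, with I² computed from Cochran's Q. It is an illustration under stated assumptions, not the review's actual analysis code: the DerSimonian-Laird estimator for between-study variance, the simple sin² back-transformation, and the function names (`freeman_tukey`, `back_transform`, `pool_concordance`) are all assumptions introduced here, and the inputs are expected to be (concordant calls, total calls) pairs per study.

```python
import math

def freeman_tukey(x, n):
    """Freeman-Tukey double arcsine transform of x concordant calls out of n.

    Returns the transformed value and its approximate sampling variance.
    """
    t = 0.5 * (math.asin(math.sqrt(x / (n + 1)))
               + math.asin(math.sqrt((x + 1) / (n + 1))))
    v = 1.0 / (4 * n + 2)
    return t, v

def back_transform(t):
    """Approximate inverse (sin^2), clamped so the result stays within [0, 1]."""
    t = min(max(t, 0.0), math.pi / 2)
    return math.sin(t) ** 2

def pool_concordance(studies):
    """Pool (concordant, total) pairs with a DerSimonian-Laird random-effects model.

    The choice of DerSimonian-Laird is an assumption made for this sketch.
    """
    ts, vs = zip(*(freeman_tukey(x, n) for x, n in studies))
    k = len(studies)

    # Fixed-effect (inverse-variance) weights, needed for Cochran's Q.
    w = [1.0 / v for v in vs]
    fixed = sum(wi * ti for wi, ti in zip(w, ts)) / sum(w)
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, ts))

    # Between-study variance (tau^2) and I^2, both truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0

    # Random-effects weights and pooled estimate on the transformed scale.
    ws = [1.0 / (v + tau2) for v in vs]
    pooled_t = sum(wi * ti for wi, ti in zip(ws, ts)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))

    return {
        "pooled": back_transform(pooled_t),
        "ci_low": back_transform(pooled_t - 1.96 * se),
        "ci_high": back_transform(pooled_t + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

Calling `pool_concordance` with a list such as `[(x1, n1), (x2, n2), ...]` returns the pooled proportion, a 95% confidence interval, τ² and I². Because the interval is built on the transformed scale and then clamped during back-transformation, an upper bound of exactly 100% can occur when pooled concordance is high and between-study variance is large.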