Abstract
As flight environments and training populations grow more complex, traditional equal-weighted cumulative assessments struggle to capture how unevenly individual checkpoints contribute to trainee proficiency. Equal weighting can suppress high performers and inflate low performers, which weakens both instructional improvement and risk warning. We propose a multi-dimensional dynamic weighting scheme that integrates the Phi correlation coefficient, category discriminatory power (CDP), and dimension-level priority (β). Concretely, Phi (derived from the chi-square statistic) quantifies the association between each checkpoint and the final A/B grade; CDP describes the pass-rate disparity between categories; and expert-defined β encodes dimension priorities. Checkpoint weights are then obtained by within-dimension normalization followed by inter-dimension aggregation. On a 13-hour development cohort (175 trainees, 158 checkpoints), the proposed method raised Accuracy/Precision/Recall/F1 from the baseline 0.90/0.98/0.91/0.94 to near-perfect values under the fixed institutional threshold, with wider between-class margins and lower within-class dispersion in the score distributions. In a freeze-and-apply external test on a 9-hour cohort (179 trainees; one dimension absent; 50 fewer checkpoints), the method substantially outperformed the baseline (e.g., Accuracy 0.9162; F1 0.9563) without re-estimating parameters. These results suggest that dynamic weighting improves separability and interpretability and offers lightweight, deployable decision support for remediation, risk alerts, and promotion reviews, along with richer data for full-cycle training and personalized interventions.
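To make the weighting pipeline concrete, the following is a minimal Python sketch under stated assumptions. The Phi coefficient is computed with the standard 2x2 contingency-table formula (equivalently, Phi squared equals chi-square divided by the sample size). Everything else is illustrative: the function names, the use of an absolute pass-rate gap as a stand-in for CDP, and the rule of scaling each within-dimension share by β normalized over the dimensions present are assumptions, not the exact definitions used in this work. How Phi and CDP are combined into a raw checkpoint score is not specified here, so the sketch takes raw scores as given.

```python
import math


def phi_coefficient(passed, grade_a):
    """Phi between two binary sequences, from the 2x2 contingency table."""
    pairs = list(zip(passed, grade_a))
    a = sum(1 for p, g in pairs if p and g)          # passed, grade A
    b = sum(1 for p, g in pairs if p and not g)      # passed, grade B
    c = sum(1 for p, g in pairs if not p and g)      # failed, grade A
    d = sum(1 for p, g in pairs if not p and not g)  # failed, grade B
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else (a * d - b * c) / denom


def pass_rate_gap(passed, grade_a):
    """Assumed stand-in for CDP: |pass rate in A - pass rate in B|."""
    in_a = [p for p, g in zip(passed, grade_a) if g]
    in_b = [p for p, g in zip(passed, grade_a) if not g]
    if not in_a or not in_b:
        return 0.0
    return abs(sum(in_a) / len(in_a) - sum(in_b) / len(in_b))


def checkpoint_weights(raw_scores, dimension_of, beta):
    """Normalize raw scores within each dimension, then scale by beta.

    Assumption: beta is renormalized over the dimensions that actually
    appear in raw_scores, so the final weights sum to 1.
    """
    totals = {}
    for cp, score in raw_scores.items():
        dim = dimension_of[cp]
        totals[dim] = totals.get(dim, 0.0) + score
    beta_sum = sum(beta[dim] for dim in totals)
    weights = {}
    for cp, score in raw_scores.items():
        dim = dimension_of[cp]
        within = score / totals[dim] if totals[dim] > 0 else 0.0
        weights[cp] = within * beta[dim] / beta_sum
    return weights
```

For example, with hypothetical raw scores `{"c1": 0.2, "c2": 0.6, "c3": 0.5}`, where `c1` and `c2` belong to one dimension with β = 3 and `c3` to another with β = 1, the first dimension receives three quarters of the total weight and splits it between `c1` and `c2` in proportion to their raw scores.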