Abstract
Purpose: To investigate how the choice of axial CT level affects the reliability and diagnostic accuracy of the Dejour classification for trochlear dysplasia, and to evaluate a novel level defined at the most superior extent of the Blumensaat line.
Materials and methods: Patients who presented with patellar instability or acute patellar dislocation between 2014 and 2024 and had preoperative CT scans were retrospectively reviewed. Fifty patients were randomly selected based on an a priori sample size calculation. For each knee, four axial CT levels were reconstructed: the midpatellar level, the Roman arc level, 3 cm above the joint line, and the top of the Blumensaat line. A consensus Dejour grade (A-D) was established by an experienced musculoskeletal radiologist and an orthopedic sports surgeon and used as the reference standard. Two orthopedic surgeons independently graded all 200 axial images twice, at least 15 days apart. Quadratic weighted kappa (κ) with 95% confidence intervals (CI) was used to assess intra- and inter-observer reliability and agreement with the consensus. Diagnostic accuracy was defined as the proportion of cases classified in agreement with the consensus grade and was compared across levels using Cochran's Q test.
Results: When all four levels were combined, intra-observer reliability was almost perfect for both observers (κ = 0.96 and 0.84; exact agreement 91% and 84%), and inter-observer reliability was substantial to almost perfect (κ = 0.72 and 0.78; exact agreement 72-73%). Agreement with the consensus across all levels was moderate (κ = 0.52-0.58; exact agreement 51-52%). When levels were analyzed separately, intra-observer κ remained high at all levels, whereas inter-observer agreement and agreement with the consensus varied markedly. The midpatellar level showed only moderate inter-observer reliability and fair-to-moderate agreement with the consensus (κ = 0.36; accuracy 34-40%), whereas the top of the Blumensaat line showed the highest agreement with the consensus (κ = 0.69) and the highest accuracy (up to 64%; pooled 61%); however, statistically significant between-level differences were detected in only one observer-time comparison. The level 3 cm above the joint line and the Roman arc level demonstrated intermediate performance.
Conclusions: Although intra-observer reliability of the Dejour classification is high regardless of axial CT level, both inter-observer agreement and diagnostic accuracy depend strongly on the selected slice. The axial CT level at the top of the Blumensaat line showed a consistent trend toward higher agreement and accuracy relative to the consensus standard and may be used as a standardized reference slice within routine multi-slice CT assessment to improve reproducibility; however, it should complement comprehensive imaging review and clinical evaluation.
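For readers who wish to reproduce the two statistics named in the methods, the following is a minimal sketch in Python using the textbook definitions of quadratic weighted kappa and Cochran's Q. It is not the authors' analysis code, and the abstract does not state which software was used. The function names, the `GRADES` list, and the data layout (one Dejour grade per image for kappa; one row of 0/1 "matches consensus" flags per knee across the four levels for Cochran's Q) are assumptions made for illustration only. Confidence intervals and p-values are not computed.

```python
# Illustrative sketch only: hypothetical names, standard textbook formulas.
GRADES = ["A", "B", "C", "D"]  # Dejour grades treated as ordered categories


def quadratic_weighted_kappa(ratings_1, ratings_2, categories=GRADES):
    """Quadratic weighted kappa for two equal-length lists of grades."""
    k = len(categories)
    index = {grade: i for i, grade in enumerate(categories)}
    n = len(ratings_1)

    # Observed cross-tabulation of the two sets of ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: 0 on the diagonal, 1 for A vs D.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        return float("nan")  # undefined when all ratings fall in one grade
    return 1.0 - weighted_observed / weighted_expected


def cochrans_q(binary_rows):
    """Cochran's Q statistic and degrees of freedom.

    binary_rows: one list per knee, holding 0/1 flags (1 = grade matches
    the consensus) for each of the k related conditions (CT levels).
    """
    k = len(binary_rows[0])
    col_totals = [sum(row[j] for row in binary_rows) for j in range(k)]
    row_totals = [sum(row) for row in binary_rows]
    grand_total = sum(row_totals)

    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        return float("nan"), k - 1  # every knee scored identically at all levels
    return numerator / denominator, k - 1
```

Under these assumptions, intra-observer reliability would compare one observer's first and second reads, inter-observer reliability would compare the two observers, and agreement with the reference standard would compare each read with the consensus grades. With four CT levels, the Q statistic is referred to a chi-square distribution with 3 degrees of freedom.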