Abstract
Background and Objectives: Müllerian duct anomalies (MDAs) are congenital malformations of the female genital tract for which several classification systems have been proposed. This study aimed to estimate the interrater reliability of the American Fertility Society (AFS), European Society of Human Reproduction and Embryology/European Society for Gynaecological Endoscopy (ESHRE/ESGE), American Society for Reproductive Medicine (ASRM) and Congenital Uterine Malformation by Experts (CUME) classification systems for MDAs. Materials and Methods: This retrospective cohort study was conducted at a tertiary care hospital and included 71 patients aged up to 45 years who were assessed for an MDA between January 2000 and April 2023. Pelvic MRI scans were independently evaluated by four readers, followed by a consensus meeting. The primary outcome was interrater reliability (Krippendorff's α), and the secondary outcomes were the proportions of indeterminate and unclassifiable cases after the consensus meeting. Results: The interrater reliability for MDA diagnosis was very low for all the classification systems (AFS α 0.63, 95% CI [0.57, 0.67]; ASRM α 0.46, 95% CI [0.41, 0.52]; ESHRE/ESGE α 0.33, 95% CI [0.29, 0.38]; CUME α 0.57, 95% CI [0.45, 0.72]). After the consensus meeting, the ESHRE/ESGE system had the highest proportion of indeterminate cases (9.9%) and the ASRM system had the highest proportion of unclassifiable cases (20.6%). Conclusions: All the classification systems for MDAs had a very low interrater reliability, with the most indeterminate cases in the ESHRE/ESGE system and the most unclassifiable cases in the ASRM system. We present our recommendations for the improvement of each classification system. The ultimate goal of future research should be the development of a single uniform system that integrates the best features of these systems and uses clinically relevant cut-off values informed by patients' reproductive outcomes.
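For readers unfamiliar with the primary outcome, the following is a minimal sketch of how Krippendorff's α can be computed for nominal categories when several readers classify the same cases. It is an illustration under stated assumptions, not the study's analysis: the abstract does not state the measurement level or the software used, the function name `krippendorff_alpha_nominal` and the toy ratings are hypothetical, and the confidence intervals reported above are not reproduced here.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (illustrative sketch).

    units: one list per case, holding the category assigned by each
    reader; None marks a missing rating. Hypothetical helper, not the
    study's code.
    """
    # Coincidence matrix: every ordered pair of ratings from two
    # different readers of the same case, weighted by 1 / (m - 1),
    # where m is the number of ratings available for that case.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # a case rated once carries no agreement information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the total number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable ratings")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - observed / expected


# Toy data (hypothetical): four readers, five cases, made-up labels.
toy_ratings = [
    ["septate", "septate", "septate", "septate"],
    ["septate", "arcuate", "septate", "normal"],
    ["bicornuate", "bicornuate", "septate", "bicornuate"],
    ["normal", "normal", "normal", None],
    ["arcuate", "normal", "arcuate", "arcuate"],
]
alpha = krippendorff_alpha_nominal(toy_ratings)
print(f"Krippendorff's alpha (toy data): {alpha:.2f}")
```

An α of 1 indicates perfect agreement and an α of 0 indicates agreement no better than chance; the coincidence-matrix formulation is used here because it handles missing ratings and any number of readers.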