Abstract
BACKGROUND: The relationships between diseases are characterized by the order and time intervals of their onset, which are critical for understanding comorbidity and predicting disease progression. This study uses sequential pattern mining to investigate the interdependencies and chronological order of diseases that occur in the same patient. Specifically, it characterizes the differences in time intervals between the onset of distinct disorders and examines the similarities and differences in disease sequence patterns across gender groups.
METHODS: Patient identity information, visit dates, and diagnostic data were collected from the electronic medical record databases of three large general hospitals. The diagnostic information comprised International Classification of Diseases, Tenth Revision (ICD-10) codes and their corresponding descriptions. A total of 1,060,344 diagnostic entries from 269,973 patients who visited between 2012 and 2022 were entered into the mining model, which was built with the Sequential Pattern Discovery using Equivalence Classes (SPADE) algorithm.
RESULTS: A total of 212 highly supported sequential pattern rules were identified, most of which involved disorders of the endocrine and circulatory systems. In 66 patterns, the order of disease incidence or diagnosis was relatively well defined. In most patterns, the interval between the onset of the two diseases was 1 to 2 years. For patterns with short-term relationships, the interval was less than 2 months, whereas in some cases it extended to 5 to 10 years. Of the extracted patterns, 176 had stronger support in the male dataset than in the female dataset. Patterns related to cardiovascular and liver diseases were more prevalent in males, while those associated with orthopedic and endocrine disorders were more prevalent in females.
CONCLUSION: Our findings demonstrate the effectiveness of the constrained SPADE (cSPADE) algorithm in comorbidity research and highlight several clinically significant sequential comorbidity patterns. These patterns are expected to contribute to disease prevention, etiological research, and the development of clinical decision support systems.
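To make the notions of pattern support and onset interval concrete, the following is a minimal sketch, not the study's implementation. It counts support for a single ordered two-disease pattern by brute force, whereas SPADE itself enumerates candidate sequences through vertical id-lists and equivalence classes. The toy records, the function names, the use of the first recorded diagnosis as the onset date, and the `max_gap_days` parameter (a stand-in for a cSPADE-style gap constraint) are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical toy records: (patient_id, visit_date, ICD-10 code).
records = [
    ("p1", date(2015, 3, 1), "E11"),
    ("p1", date(2016, 6, 1), "I10"),
    ("p2", date(2014, 1, 10), "I10"),
    ("p2", date(2018, 5, 20), "E11"),
    ("p3", date(2013, 2, 1), "E11"),
    ("p3", date(2013, 3, 15), "I10"),
    ("p4", date(2019, 7, 7), "K76"),
]


def first_onsets(records):
    """Map each patient to the earliest recorded date of each diagnosis code."""
    onsets = {}
    for pid, day, code in records:
        codes = onsets.setdefault(pid, {})
        if code not in codes or day < codes[code]:
            codes[code] = day
    return onsets


def pattern_support(records, first, second, max_gap_days=None):
    """Return (support, gaps) for the ordered pattern first -> second.

    Support is the fraction of all patients whose earliest record of `first`
    strictly precedes their earliest record of `second`, optionally within
    `max_gap_days` (an assumed, simplified gap constraint).
    """
    onsets = first_onsets(records)
    gaps = []
    for codes in onsets.values():
        if first in codes and second in codes:
            gap = (codes[second] - codes[first]).days
            if gap > 0 and (max_gap_days is None or gap <= max_gap_days):
                gaps.append(gap)
    return len(gaps) / len(onsets), gaps


support, gaps = pattern_support(records, "E11", "I10")
short_support, short_gaps = pattern_support(records, "E11", "I10", max_gap_days=60)
print(support, gaps)
print(short_support, short_gaps)
```

Running the same function separately on male and female subsets of the records would give per-gender support values of the kind compared in RESULTS; the list of gaps gives the onset intervals for the patients who match the pattern.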