Abstract
BACKGROUND: Breast cancer is the most commonly diagnosed cancer among Cuban women and the second leading cause of cancer mortality, making it a major public health issue. Identifying the factors that affect the time to diagnosis of breast cancer is necessary to avoid diagnostic delay and thereby improve survival rates. Recent advances in data collection technologies have made available a large number of variables that may influence diagnosis time. Standard regression models are poorly suited to these high-dimensional datasets because they tend to overfit.

METHODS: This study applies survival analysis methods to a wide range of demographic, biomedical, and socioeconomic variables to provide a detailed understanding of the factors affecting breast cancer diagnosis time. Because of the volume and complexity of the data, survival machine learning methods such as random survival forests and classification and regression trees are used for the analysis.

RESULTS: The analysis identifies age at menopause, BI-RADS (Breast Imaging-Reporting and Data System) category, and the number of biopsies conducted as the key variables affecting the time to diagnosis of breast cancer among Cuban women. These findings indicate the areas that require urgent attention to prevent diagnostic delay.

CONCLUSIONS: This study identifies important variables on which the Cuban administration can focus to develop tailored policies that avoid delays in breast cancer diagnosis. Assessing these gaps in the Cuban government's existing cancer-related initiatives can support the development of more effective strategies for reducing the overall cancer incidence in the country.