Abstract
This study applies flexible copula regression models to the interdependencies among clinical variables in breast cancer data. As the most commonly diagnosed cancer and the second leading cause of cancer-related deaths among women worldwide, breast cancer presents both clinical and analytical challenges. Unlike traditional multivariate approaches, copulas model the dependence structure separately from the marginal distributions, which allows them to capture nonlinear and asymmetric dependencies between mixed-type outcomes. We fitted multiple copula families to jointly model overall survival (binary) and age at diagnosis (continuous) in the METABRIC dataset. Goodness-of-fit metrics guided model comparison and selection, with the Gumbel copula performing best and capturing the upper tail dependence associated with favorable outcomes, such as younger age and improved survival. A formal comparison against an independent-margins baseline confirmed that accounting for dependence via a copula significantly improves model fit (likelihood ratio test: [Formula: see text], df = 1, p < 0.0001), and probability integral transform (PIT) diagnostics supported the adequacy of both marginal specifications. The findings support the integration of copula models into clinical research, where they can inform a more nuanced understanding of cancer progression, more accurate risk assessment, and data-driven decision-making in oncology.
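To make the quantities named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard bivariate Gumbel copula with parameter theta >= 1, its upper tail dependence coefficient 2 - 2^(1/theta), and a likelihood ratio test referred to a chi-square distribution with df = 1 as stated in the abstract. The function names and the log-likelihood values in the usage note are hypothetical and are not taken from the study.

```python
import math


def gumbel_copula_cdf(u, v, theta):
    """Bivariate Gumbel copula C(u, v); theta = 1 gives independence (u * v)."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1 for the Gumbel copula")
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def upper_tail_dependence(theta):
    """Upper tail dependence coefficient of the Gumbel copula: 2 - 2^(1/theta)."""
    return 2.0 - 2.0 ** (1.0 / theta)


def lrt_pvalue_df1(loglik_indep, loglik_copula):
    """Likelihood ratio statistic and its chi-square (df = 1) p-value.

    For df = 1 the chi-square survival function is erfc(sqrt(x / 2)),
    so no external statistics library is needed.
    """
    stat = max(2.0 * (loglik_copula - loglik_indep), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

As a usage illustration with made-up numbers, `lrt_pvalue_df1(-100.0, -98.0)` yields a statistic of 4.0 and a p-value of about 0.046, and `upper_tail_dependence(1.0)` is 0, which reflects the absence of tail dependence under independence.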