Abstract
Background: Despite the availability of direct oral anticoagulants, warfarin remains essential for mechanical valves, renal impairment, and resource-limited settings. Traditional dosing achieves therapeutic range in only 55-65% of patients, increasing bleeding and thrombotic complications. This systematic review evaluates the literature on machine learning (ML) approaches for warfarin dose prediction (2022-2025).

Methods: Following PRISMA guidelines, we analysed 14 studies encompassing 122,400 patients across nine countries. Studies utilizing ML algorithms for warfarin dosing with quantifiable performance metrics were included. Risk of bias was assessed using PROBAST.

Results: Reinforcement learning demonstrated superior performance, achieving an 80.8% excellent responder ratio versus 41.6% for standard practice and a 99.5% safety responder ratio versus 83.1%. Support vector machines achieved R² up to 0.98 in homogeneous populations. Mean absolute error ranged from 0.11 to 1.8 mg/day, consistently outperforming traditional methods. Seven studies included external validation, whilst 78.6% were retrospective designs. Limited implementation studies showed therapeutic INR rates improving from 47.5% to 61.1%. Critically, only three studies (21.4%) reported any safety outcomes, and none was adequately powered to detect differences in major bleeding events.

Conclusions: While ML algorithms demonstrate improved dosing accuracy in retrospective analyses, the near-complete absence of adequately powered safety outcome data represents the primary barrier to clinical implementation. Without robust evidence on bleeding, thromboembolism, and mortality, the risk-benefit profile remains unknown. Implementation requires addressing the predominance of retrospective studies (78.6%), limited prospective validation, restricted geographic diversity (43% of studies from China), the absence of African and South American studies, and the lack of new Hispanic population data. Multicentre prospective trials with safety endpoints, population-specific validation, and interpretable models are essential before widespread clinical adoption can be recommended.
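To make the accuracy metrics reported above concrete, the following is a minimal sketch of how mean absolute error (mg/day) and R² are computed for a dose-prediction model. The dose values are hypothetical illustrations, not data from the reviewed studies:

```python
# Illustration of the two accuracy metrics reported in Results: MAE and R^2.
# All dose values below are hypothetical, for demonstration only.

def mean_absolute_error(actual, predicted):
    """Mean absolute error, in the same units as the doses (mg/day)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [5.0, 3.5, 7.0, 4.0, 6.5]      # observed stable doses (mg/day)
predicted = [4.8, 3.9, 6.6, 4.2, 6.4]   # model-predicted doses (mg/day)

print(round(mean_absolute_error(actual, predicted), 2))  # → 0.26
print(round(r_squared(actual, predicted), 3))            # → 0.956
```

An MAE of 0.26 mg/day in this toy example would sit at the better end of the 0.11-1.8 mg/day range reported across the included studies; in practice these metrics are computed on held-out or externally validated cohorts.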