Abstract
Osteosarcoma is a highly aggressive primary bone cancer characterized by rapid progression and complex tissue heterogeneity, making accurate diagnosis and treatment planning particularly challenging. Conventional imaging-based diagnostic approaches often struggle to reliably differentiate viable tumor tissue, non-viable tumor regions, and healthy bone, leading to segmentation inaccuracies and elevated false-positive rates that can adversely affect clinical decision-making. To address these challenges, this work proposes DWT-N-MSDAUNet, a hybrid deep attention-based framework for precise osteosarcoma segmentation and multi-class classification. The proposed framework integrates Dynamic Wavelet Transform Normalization (DWT-N) for image enhancement, followed by a Multi-Scale Dense Attention U-Net (MSDA-UNet) to achieve accurate multiscale tumor segmentation. From the segmented representations, Binary Horse Herd Optimization (B-HHO) is employed to select the most discriminative features, reducing redundancy while preserving critical structural and textural information. Final classification into viable tumor, non-viable tumor, and healthy tissue classes is performed using a Bottleneck Transformer, optimized via the Puma Optimizer, to effectively model long-range spatial and contextual dependencies. The framework was evaluated on an osteosarcoma imaging dataset comprising approximately 1144 labeled images, using fivefold cross-validation to ensure robustness and generalization. The MSDA-UNet achieved a Dice coefficient of 99.3% and a Jaccard index of 98.1%, with low boundary errors (ASSD = 0.3 mm, HD = 1.2 mm), outperforming UNet, UNet++, Attention-UNet, and R2-UNet. The proposed Bottleneck Transformer classifier attained an overall accuracy of 99.5%, with precision, recall, and F1-scores above 99.5%, and a specificity of 99.55%, indicating a reduced false-positive rate.
Class-wise evaluation further confirmed consistent performance across all tissue categories, with AUC values ranging from 0.96 to 0.98. Despite increased computational cost compared to lightweight models, the proposed approach demonstrated feasible deployment characteristics with an inference time of 63 ms per image, supporting its suitability for clinical diagnostic workflows. These results highlight the framework’s effectiveness in balancing high diagnostic accuracy with clinical reliability, particularly for biopsy guidance and treatment monitoring in osteosarcoma.
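For readers less familiar with the overlap metrics reported above, the following minimal sketch shows how a Dice coefficient and a Jaccard index can be computed for a single pair of binary segmentation masks. It is an illustration only, not the evaluation code used in this work: the function name `dice_and_jaccard`, the flattened-list mask representation, the toy masks, and the convention of scoring two empty masks as perfect agreement are all assumptions introduced here.

```python
from typing import Sequence, Tuple


def dice_and_jaccard(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, Jaccard) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labeled as tumor in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty; scoring this as perfect agreement is an
        # assumed convention, not something specified in the text above.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    jaccard = intersection / union
    return dice, jaccard


# Toy 2x4 masks flattened row by row (hypothetical data, not from the dataset).
pred = [1, 1, 1, 0, 0, 1, 0, 0]
target = [1, 1, 0, 0, 0, 1, 1, 0]
dice, jaccard = dice_and_jaccard(pred, target)
print(f"Dice = {dice:.3f}, Jaccard = {jaccard:.3f}")
```

For a single mask pair the two metrics are tied by Dice = 2J / (1 + J), so they rank segmentations identically; values averaged over many images need not satisfy this identity exactly.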