Abstract
BACKGROUND: Image segmentation is crucial in medical diagnosis, helping to identify diseased areas in images for more accurate diagnoses. The U-Net model, a convolutional neural network (CNN) widely used for medical image segmentation, has limitations in extracting global features and handling multi-scale pathological information. This study addresses these challenges by proposing a novel model that enhances segmentation performance while reducing computational demands.

METHODS: We introduce the LUNeXt model, which integrates Vision Transformers (ViT) with a redesigned convolution block structure. The model employs depthwise separable convolutions to capture global features with fewer parameters. Comprehensive experiments were conducted on four diverse medical image datasets to evaluate the model's performance.

RESULTS: The LUNeXt model demonstrated competitive segmentation performance with a significant reduction in parameters and floating-point operations (FLOPs) compared to traditional U-Net models. The application of explainable AI techniques provided clear visualization of segmentation results, highlighting the model's efficacy in efficient medical image segmentation.

CONCLUSIONS: LUNeXt facilitates efficient medical image segmentation on standard hardware, reducing the learning curve and making advanced techniques more accessible to practitioners. The model balances complexity against parameter count, offering a promising solution for enhancing the accuracy of pathological feature extraction in medical images.
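The parameter savings from depthwise separable convolutions, the mechanism the abstract credits for LUNeXt's reduced parameter count, can be sketched with simple counting. The channel sizes and kernel size below are illustrative choices, not values taken from the paper:

```python
def conv2d_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1x1 pointwise convolution (bias omitted)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    # Illustrative layer: 64 input channels, 128 output channels, 3x3 kernel
    c_in, c_out, k = 64, 128, 3
    std = conv2d_params(c_in, c_out, k)               # 73728
    sep = depthwise_separable_params(c_in, c_out, k)  # 8768
    print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.1f}x")
```

For this layer the separable factorization uses roughly an eighth of the parameters of a standard convolution, which is why stacking such blocks lowers both parameter count and FLOPs.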