WU Nengguang, LIN Zhigang, CHEN Tuo, ZHUANG Feifei, CHEN Hong
Objective To address common problems in medical image segmentation, such as complex and blurred tissue boundaries and low organ contrast, and in particular the poor segmentation performance of the Transformer self-attention mechanism that is now widely applied to medical image segmentation. Methods A U-shaped network, MM-UNet, was constructed by fusing a multi-scale attention aggregation mechanism with a Mamba-like linear attention mechanism. The Mamba-like linear attention mechanism was introduced to strengthen the model's ability to learn global dependencies, while the multi-scale attention aggregation mechanism used multi-scale convolutions to capture and exploit key features in a targeted manner and to capture multi-scale semantic information more comprehensively, improving the model's accuracy at image edges. Results MM-UNet was compared with UNet, VM-UNet, LightM-UNet, and other algorithms on the dermoscopy datasets ISIC18 and ISIC17 and the ultrasound thyroid nodule dataset TN3K. On ISIC18, ISIC17, and TN3K, MM-UNet achieved mIoU of 83.22%, 83.63%, and 80.19%, Dice coefficients of 90.84%, 91.08%, and 89.00%, accuracies of 95.86%, 96.57%, and 97.38%, and specificities of 97.95%, 98.36%, and 98.70%, respectively. Compared with UNet, VM-UNet, LightM-UNet, and the other models, MM-UNet performed best in mIoU, Dice coefficient, and accuracy on all three datasets, and its specificity was best on the ISIC18 and TN3K datasets. Conclusion MM-UNet achieves high segmentation accuracy on medical image data with different imaging backgrounds and generalizes well, showing certain application value.
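The abstract does not give implementation details, so the sketch below is only an illustrative PyTorch approximation of the two components it names: a linear (Mamba-like) attention block that models global dependencies at O(N) cost, and a multi-scale convolutional attention aggregation block. All module names, kernel sizes, channel widths, and the wiring are assumptions for illustration, not the authors' MM-UNet.

```python
# Illustrative sketch only -- NOT the authors' MM-UNet. It approximates the two
# ideas named in the abstract: (1) a Mamba-like *linear* attention block for
# global dependencies, (2) multi-scale convolutional attention aggregation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearAttention(nn.Module):
    """Kernelized (linear) attention: the softmax is replaced by a positive
    feature map, giving O(N) cost in the number of tokens."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, N, C)
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split heads -> (B, H, N, C // H)
        q, k, v = (t.view(B, N, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1       # positive feature map
        kv = torch.einsum('bhnd,bhne->bhde', k, v)            # (B, H, D, D)
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(2)) + 1e-6)
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)   # (B, H, N, D)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class MultiScaleAttention(nn.Module):
    """Aggregate 3x3 / 5x5 / 7x7 depthwise branches and use the result to
    reweight the input, one common way to realise multi-scale attention."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in (3, 5, 7))
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                       # x: (B, C, H, W)
        s = sum(b(x) for b in self.branches)    # multi-scale aggregation
        return self.fuse(self.gate(s) * x) + x  # attention-weighted residual


if __name__ == "__main__":
    # Apply both blocks to one encoder feature map of a U-shaped network.
    feat = torch.randn(2, 64, 32, 32)
    y = MultiScaleAttention(64)(feat)
    tokens = y.flatten(2).transpose(1, 2)                     # (B, H*W, C)
    y = LinearAttention(64)(tokens).transpose(1, 2).reshape_as(feat)
    print(y.shape)                                            # (2, 64, 32, 32)
```

In a U-shaped design, blocks like these would typically be inserted at several encoder/decoder resolutions and combined with skip connections; the exact placement in MM-UNet is described in the paper body, not here.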