Abstract
BACKGROUND: Abdominal ultrasound is non-invasive and efficient, yet acquiring standard planes remains challenging due to operator dependency and procedural complexity. We propose AbVLM-Q, a vision-language framework for automated quality assessment of abdominal ultrasound standard planes.

METHODS: We assembled a multi-center dataset of 7,766 abdominal ultrasound scans, randomly divided into training (70%), validation (15%), and testing (15%) subsets. AbVLM-Q was developed in three steps: (1) hierarchical prompting that incorporates spatially aware querying and sequential reasoning; (2) a quantifiable scoring mechanism based on multi-level clinical penalty criteria; and (3) LoRA (Low-Rank Adaptation)-based fine-tuning of a pretrained vision-language model. Performance was evaluated using mean recall, precision, label accuracy, subset accuracy, and confusion matrix analysis.

RESULTS: The system detected key structures with 88.90% mean recall and 98.10% precision, achieving higher precision than, and recall comparable to, Faster R-CNN (89.77% recall, 88.64% precision at a 0.5 confidence threshold). Plane classification yielded 98.96% label accuracy and 96.28% subset accuracy, surpassing the best CNN (97.84%, 94.29%; P < 0.05). Image scoring accuracy for the clinically critical "Excellent" grade (scores 8-10) reached 85.11% with the best-performing backbone. Confusion matrix analysis confirmed consistent performance across backbones, with discrepancies observed primarily at grade boundaries.

CONCLUSIONS: AbVLM-Q provides a novel method for automated ultrasound quality assessment, functioning as both an evaluation tool and a training platform for standardized scanning. It bridges AI-driven imaging analysis with clinical workflows, enhancing quality control in ultrasound diagnostics.
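The abstract reports two distinct multi-label metrics for plane classification: label accuracy (fraction of individual label decisions that are correct) and the stricter subset accuracy (fraction of samples whose entire label vector matches exactly). A minimal sketch of the distinction, with illustrative function names and toy data not taken from the paper:

```python
def label_accuracy(y_true, y_pred):
    """Fraction of individual binary label decisions that match."""
    correct = sum(t == p
                  for row_t, row_p in zip(y_true, y_pred)
                  for t, p in zip(row_t, row_p))
    total = sum(len(row) for row in y_true)
    return correct / total

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector matches exactly."""
    return sum(rt == rp for rt, rp in zip(y_true, y_pred)) / len(y_true)

# Toy example: 3 samples, 4 binary labels each (hypothetical data)
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]]

print(label_accuracy(y_true, y_pred))   # 11/12: one of 12 label decisions is wrong
print(subset_accuracy(y_true, y_pred))  # 2/3: one sample's vector is not an exact match
```

Because subset accuracy requires every label on a sample to be correct, it is always less than or equal to label accuracy, which is why the reported 96.28% subset accuracy sits below the 98.96% label accuracy.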