Abstract
Accurate fetal health assessment is challenging due to the scarcity of abnormal cases, class imbalance, and the limited interpretability of AI models. This study proposes a multi-modal AI framework built on a Siamese Neural Network (SNN) with few-shot and multi-task learning to address these gaps. The SNN employs contrastive learning with hybrid loss functions to simultaneously detect abnormalities and localize anatomical regions, improving data efficiency by learning robust embeddings from limited abnormal samples. To mitigate potential domain shift from heterogeneous data sources, we implemented curriculum-based pair sampling and stratified cross-validation, ensuring that reported performance is not inflated by source-specific features. Clinical data streams are integrated using ensemble models with SHAP-based interpretability, enabling transparent identification of key maternal and fetal risk factors. Additionally, a vision-language model distilled from a large teacher network into a compact student model generates radiologist-style diagnostic summaries. With INT8 post-training quantization, the system reduces model size to <10 MB, supporting edge deployment in resource-limited settings. The framework achieves 98.6% classification accuracy while reducing manual screening time by 60-70%, offering a scalable and interpretable solution for prenatal anomaly detection. Key methods employed include:
• Siamese Neural Network with contrastive + multi-task loss (illustrated in the sketch below).
• Ensemble models (Random Forest, XGBoost) with SHAP interpretability.
• Vision-Language distillation for clinical reporting.
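To make the first item concrete, the following is a minimal PyTorch sketch of a shared-weight encoder trained with a contrastive loss plus an auxiliary classification term. The architecture, margin, and loss weight alpha are illustrative assumptions rather than the paper's reported configuration, and the anatomical-localization branch of the multi-task head is omitted for brevity.

```python
# Minimal sketch of a Siamese encoder with a hybrid contrastive +
# classification loss. Sizes, margin, and alpha are illustrative
# assumptions, not the framework's actual hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Shared-weight encoder mapping a grayscale image to an embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Auxiliary multi-task head: binary abnormality classification.
        self.classifier = nn.Linear(embed_dim, 2)

    def forward(self, x):
        z = self.backbone(x)
        return z, self.classifier(z)

def hybrid_loss(z1, z2, same_class, logits, labels,
                margin: float = 1.0, alpha: float = 0.5):
    """Contrastive loss on embedding pairs + weighted cross-entropy."""
    d = F.pairwise_distance(z1, z2)
    contrastive = (same_class * d.pow(2) +
                   (1 - same_class) * F.relu(margin - d).pow(2)).mean()
    return contrastive + alpha * F.cross_entropy(logits, labels)

# Toy usage on a batch of 8 random image pairs.
enc = SiameseEncoder()
x1, x2 = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
same = torch.randint(0, 2, (8,)).float()   # 1 = pair from same class
labels = torch.randint(0, 2, (8,))         # abnormality label for x1
z1, logits = enc(x1)
z2, _ = enc(x2)
hybrid_loss(z1, z2, same, logits, labels).backward()
```

The contrastive term pulls same-class embeddings together and pushes different-class embeddings apart up to the margin, which is how the abstract's few-shot claim is realized: useful structure is learned from pair relations rather than from a large pool of abnormal examples.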