Abstract
To address the long-standing challenges that children with autism face in social skills and emotional regulation, this study introduces the Emotion-based Music Intelligent Network (EmoMusik-Net), a deep learning model designed for intelligent music therapy. The model targets the emotional impairments exhibited during social interactions, integrating Transformer-based temporal modeling with a Transfer Learning-based Graph Convolutional Network (TL-GCN). This combination enables high-precision recognition of facial expression sequences and supports a dynamically adaptive, closed-loop mechanism for personalized music recommendation. EmoMusik-Net was trained and optimized on three publicly available emotional video datasets. A pre- and post-intervention study, conducted in collaboration with the families of 182 children with autism, used questionnaire-based assessments to systematically evaluate the model's real-world feasibility and effectiveness. Experimental results showed that EmoMusik-Net achieved an emotion recognition accuracy above 0.970, an F1-score consistently over 0.960, and an Area Under the Curve (AUC) of 0.978. The model also demonstrated outstanding robustness on large-scale datasets, with a stability score of 0.994, indicating strong classification performance and generalizability. In terms of intervention outcomes, boys aged 1-6 showed a marked increase in social interest scores, rising from 1.280 to 2.540 (a 98.44% improvement), while girls aged 7-12 exhibited significant gains in emotional response scores, from 1.670 to 3.120 (an 86.77% increase). Further statistical analysis using a Mixed-effects Model for Repeated Measures (MMRM) and bootstrap confidence interval estimation confirmed that the intervention was both statistically and clinically significant, with particularly strong effects observed in younger participants. Expert blind evaluations further validated the system's effectiveness, showing high consistency in rhythm and emotion matching.
Intraclass Correlation Coefficients (ICC) ranged from 0.75 to 0.91, with matching accuracy surpassing 94% in certain subgroups. EmoMusik-Net not only addresses the current research gap in integrating intelligent emotion recognition with music-based interventions but also offers a responsive, technology-driven support tool for parents, educators, and clinicians. This approach holds strong potential to advance autism spectrum disorder interventions toward personalized, data-driven methodologies.