Abstract
Accurate emotion recognition in social media text is critical for applications such as sentiment analysis, mental health monitoring, and human-computer interaction. However, existing approaches face challenges such as high computational cost and class imbalance, limiting their deployment in resource-constrained environments. While transformer-based models achieve state-of-the-art performance, their size and latency hinder real-time applications. To address these issues, we propose a novel knowledge distillation framework that transfers knowledge from a fine-tuned BERT-base teacher model to lightweight DistilBERT and ALBERT student models optimised for efficient emotion recognition. Our approach integrates a hybrid loss function combining focal loss and Kullback-Leibler (KL) divergence to enhance minority-class recognition, attention-head alignment for effective contextual knowledge transfer, and semantic-preserving data augmentation to mitigate class imbalance. Experiments on two datasets, Twitter Emotions (416K samples, six classes) and Social Media Emotion (75K samples, five classes), show that our distilled models achieve near-teacher performance (97.35% and 73.86% accuracy, respectively), with accuracy drops of less than 1% and 6%, while reducing model size by 40% and inference latency by 3.2×. Notably, our method significantly improves F1-scores for minority classes. Our work establishes a new state of the art in efficient emotion recognition, enabling practical deployment in edge computing and mobile applications.
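As a concrete illustration of the hybrid loss described above, the following is a minimal PyTorch sketch combining focal loss on hard labels with temperature-scaled KL divergence on the teacher's soft logits. The specific values of the mixing weight `alpha`, temperature `temperature`, and focusing parameter `gamma` are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(student_logits, teacher_logits, labels,
                alpha=0.5, temperature=2.0, gamma=2.0):
    """Sketch of a hybrid distillation loss: focal loss + KL divergence.
    Hyperparameter values here are assumptions for illustration only."""
    # Focal loss: down-weight well-classified examples so gradients
    # concentrate on hard (often minority-class) samples.
    log_probs = F.log_softmax(student_logits, dim=-1)
    ce = F.nll_loss(log_probs, labels, reduction="none")
    pt = torch.exp(-ce)                       # probability of the true class
    focal = ((1.0 - pt) ** gamma * ce).mean()

    # KL divergence between temperature-softened student and teacher
    # distributions, scaled by T^2 to keep gradient magnitudes comparable.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Weighted combination of the hard-label and soft-label objectives.
    return alpha * focal + (1.0 - alpha) * kl
```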