Abstract
Zero-shot learning (ZSL) aims to classify unseen classes by leveraging semantic information learned from seen classes, addressing the challenge of limited labeled data. Recent ZSL methods extract attribute-level features from images and align them with semantic features in an embedding space. However, existing approaches often fail to account for the significant visual variation within a single attribute, which produces noisy attribute-level features and degrades classification performance. To address this problem, we propose CRAE (Class Representation and Attribute Embedding), a zero-shot image classification method that combines class representation learning with attribute embedding learning to improve robustness and accuracy. Specifically, we design an adaptive softmax activation function that normalizes attribute feature maps, reducing noise and improving the discriminability of attribute-level features. We further introduce attribute-level contrastive learning with hard sample selection to optimize the attribute embedding space and reinforce the distinctiveness of attribute representations. To further improve classification accuracy, we incorporate class-level contrastive learning, which increases the separation between features of different classes. We evaluate our approach on three widely used benchmark datasets (CUB, SUN, and AWA2); the experimental results show that CRAE significantly outperforms existing state-of-the-art methods in zero-shot image classification.
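As a rough illustration of what softmax normalization of attribute feature maps can look like, the sketch below applies a temperature-scaled softmax over the spatial positions of each attribute map and uses the result to pool one feature vector per attribute. The abstract does not specify the exact form of the adaptive activation, so the per-attribute `temperature` parameter, the function names, and the array shapes are all assumptions made for illustration rather than the method as proposed.

```python
import numpy as np

def spatial_softmax(attr_maps, temperature):
    """Normalize each attribute map over its spatial positions.

    attr_maps:   array of shape (A, H, W), one raw response map per attribute.
    temperature: array of shape (A,), positive. Hypothetical per-attribute
                 scale standing in for the "adaptive" part (an assumption).
    """
    a, h, w = attr_maps.shape
    flat = attr_maps.reshape(a, h * w) / temperature[:, None]
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)  # each map sums to 1
    return weights.reshape(a, h, w)

def pool_attribute_features(features, attr_maps, temperature):
    """Pool image features (C, H, W) into one vector per attribute, shape (A, C)."""
    weights = spatial_softmax(attr_maps, temperature)
    return np.einsum("ahw,chw->ac", weights, features)

# Toy inputs: 8 feature channels, 3 attributes, a 4x4 spatial grid.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
attr_maps = rng.normal(size=(3, 4, 4))
temperature = np.array([0.5, 1.0, 2.0])

weights = spatial_softmax(attr_maps, temperature)
attr_feats = pool_attribute_features(features, attr_maps, temperature)
```

In this sketch a smaller temperature concentrates the weight on the strongest spatial responses, which is one plausible way to suppress low-response (noisy) regions; in a trained model the temperature would presumably be learned rather than fixed.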