Abstract
OBJECTIVE: Tongue diagnosis is crucial in traditional Chinese medicine (TCM). Because diagnoses cannot be standardized in TCM, reaching an ideal consensus when labeling TCM syndromes is difficult, which introduces subjective bias into representation learning. We therefore explore the application of contrastive learning to automatically extract semantic features from tongue images, reducing the need for manual labeling and avoiding human bias in a self-supervised manner. METHODS: We applied clustering contrastive learning (CCL) to the representation learning of tongue images and coupled it with a refined data-augmentation strategy grounded in TCM theory. The tongue-image embeddings produced by the CCL-based models were used in downstream tasks, and their feature-extraction capability was verified through loss curves, precision, and other metrics. RESULTS: The downstream-task experiments showed that the CCL-based models outperformed the supervised models on most evaluation metrics. In the qualitative experiment, cluster analysis showed that the CCL-based model could perceive the colors and textures of the nasolabial fold or the eye without human-supervised information. CONCLUSIONS: The contrastive learning (CL) method automatically extracted tongue-image features and avoided interference from subjective manual labels. Thus, the symptoms, signs, and other phenotypes associated with Zheng (syndrome) in TCM can be objectively quantified, addressing a long-standing standardization problem of TCM.
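The abstract does not specify the exact CCL objective, but the general idea of clustering contrastive learning can be sketched as follows: embeddings are clustered (e.g., with k-means), and an InfoNCE-style loss pulls each embedding toward its cluster prototype while pushing it away from other prototypes. This is a minimal illustrative sketch with NumPy; the function names, the prototype-based loss form, and all hyperparameters are assumptions, not the paper's implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize rows so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def kmeans(x, k, iters=10, seed=0):
    # Plain k-means: assign points to nearest centroid, then recompute means
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return labels, centroids

def cluster_contrastive_loss(embeddings, labels, centroids, tau=0.1):
    # InfoNCE over cluster prototypes: each embedding's "positive" is its
    # own cluster centroid; all other centroids act as negatives
    z = l2_normalize(embeddings)
    c = l2_normalize(centroids)
    logits = (z @ c.T) / tau                      # (N, K) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z)), labels].mean()

# Stand-in for tongue-image embeddings from an encoder network
rng = np.random.default_rng(1)
emb = rng.normal(size=(64, 16))
labels, centroids = kmeans(l2_normalize(emb), k=4)
loss = cluster_contrastive_loss(emb, labels, centroids)
```

In a full training loop, the gradient of this loss would update the encoder, and the clustering step would be periodically re-run on the refreshed embeddings, so pseudo-labels improve as the representation improves.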