Abstract
CLIP models have shown impressive learning and transfer capabilities across a wide range of visual tasks. It is therefore surprising that these foundation models have not been fully explored for Universal Domain Adaptation (UniDA). In this paper, we conduct a comprehensive empirical study of state-of-the-art UniDA methods built on these foundation models. We first demonstrate that, although the foundation models greatly improve the performance of the baseline method (which trains the models on the source data alone), existing UniDA methods struggle to improve over this baseline. This suggests that new research efforts are needed for UniDA with foundation models. We further observe that the calibration of CLIP models plays a key role in UniDA. To this end, we propose a very simple calibration method based on automatic temperature scaling, which significantly enhances the baseline's out-of-class detection capability. We show that a single learned temperature outperforms previous approaches on most benchmark tasks when adapting from CLIP models, as measured by the H-score and a newly proposed Universal Classification Rate (UCR) metric. We hope that our investigation and the proposed simple framework can serve as a strong baseline to facilitate future studies in this field.
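To illustrate the general idea behind calibration with a single learned temperature, below is a minimal sketch of standard temperature scaling applied to CLIP logits, fitted on labeled source data and then used to score target samples. The names `TemperatureScaler`, `calibrate_temperature`, `clip_logits`, and `source_labels` are hypothetical and for illustration only; this is not the paper's exact automatic procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemperatureScaler(nn.Module):
    """Learns a single scalar temperature that rescales logits before softmax."""

    def __init__(self, init_temp: float = 1.0):
        super().__init__()
        # Optimize the log-temperature so the temperature stays positive.
        self.log_temp = nn.Parameter(torch.tensor(init_temp).log())

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return logits / self.log_temp.exp()


def calibrate_temperature(logits: torch.Tensor, labels: torch.Tensor,
                          lr: float = 0.01, max_iter: int = 200) -> float:
    """Fits the temperature by minimizing the NLL on held-out labeled logits."""
    scaler = TemperatureScaler()
    optimizer = torch.optim.LBFGS(scaler.parameters(), lr=lr, max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(scaler(logits), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return scaler.log_temp.exp().item()


# Hypothetical usage: `clip_logits` are image-text similarity logits from a CLIP
# model on labeled source data. The calibrated maximum softmax probability can
# then serve as an out-of-class (unknown-category) detection score on target data.
# temp = calibrate_temperature(clip_logits, source_labels)
# max_prob = F.softmax(target_logits / temp, dim=-1).max(dim=-1).values
```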