Building a Chinese ancient architecture multimodal dataset combining image, annotation and style-model

Abstract

In this rapidly evolving era of multimodal generation, diffusion models exhibit impressive generative capabilities and have significantly advanced creative image synthesis guided by intricate textual prompts. Yet their effectiveness is limited in certain niche domains, such as depicting Chinese ancient architecture. This limitation stems primarily from a lack of data covering the unique architectural features and their corresponding textual descriptions. We therefore build an extensive multimodal dataset capturing the essence of Chinese architecture, mostly from the Tang to the Yuan dynasties. The dataset is organized by data type into image & text, video, and style models. Specifically, images and videos are systematically categorized by location. All images are annotated at two levels: initial annotations, and descriptive terms derived from distinctive architectural characteristics and official information. In addition, the dataset provides seven fine-tuned models for different artistic styles to support further innovation. Notably, this is the first Chinese ancient architecture dataset and the first instance of using the Pinyin system to annotate unique terms related to Chinese architectural styles.
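To make the two-level annotation and location-based grouping concrete, the sketch below shows one way an image entry could be represented and turned into a text prompt for diffusion fine-tuning. It is a minimal sketch under stated assumptions: the `ImageRecord` class, its field names, the `to_prompt` joining format, the `group_by_location` helper, and the example location names are all hypothetical and are not the dataset's actual schema or file layout. Only the idea of two annotation levels, location-based categorization, and Pinyin terms (such as "dougong" for bracket sets) comes from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """One annotated image. HYPOTHETICAL schema, not the dataset's real format."""
    path: str                # image file path (assumed)
    location: str            # images are categorized by location
    initial_annotation: str  # level 1: initial annotation
    # level 2: descriptive terms; architecture-specific terms written in Pinyin
    descriptive_terms: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Join both annotation levels into one comma-separated prompt (assumed format)."""
        if not self.descriptive_terms:
            return self.initial_annotation
        return f"{self.initial_annotation}, {', '.join(self.descriptive_terms)}"


def group_by_location(records: list[ImageRecord]) -> dict[str, list[ImageRecord]]:
    """Group records by their location field, preserving input order within each group."""
    groups: dict[str, list[ImageRecord]] = defaultdict(list)
    for record in records:
        groups[record.location].append(record)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical example entry; "site_A" is a placeholder location name.
    hall = ImageRecord(
        path="img_001.jpg",
        location="site_A",
        initial_annotation="a timber hall with deep eaves",
        descriptive_terms=["dougong", "wudian"],  # Pinyin: bracket sets, hip roof
    )
    print(hall.to_prompt())  # a timber hall with deep eaves, dougong, wudian
```

Keeping the Pinyin terms as a separate list, rather than merging them into the caption at annotation time, lets the same record yield either a plain caption or a term-enriched prompt.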