Digital documents play a crucial role in contemporary information management, but their quality can be significantly impaired by hand-drawn annotations, image distortion, watermarks, stains, and degradation. Deep learning-based methods have emerged as powerful tools for document enhancement, yet their effectiveness depends heavily on the availability of high-quality training and evaluation datasets. Such benchmark datasets are scarce, particularly for Traditional Chinese documents. To address this gap, we introduce a novel dataset, the "Joint Variation and ZhuYin dataset (JVZY)". It comprises 20,000 images and 1.92 million words and covers a range of document degradation characteristics. It also includes the phonetic symbols (ZhuYin) unique to Traditional Chinese, meeting specific localization requirements. By releasing this dataset, we aim to build a continuously evolving resource tailored to the diverse needs of Traditional Chinese document enhancement, one that facilitates the development of applications able to handle both these phonetic symbols and the varied degradation found in Traditional Chinese documents.
Joint variation and ZhuYin dataset for Traditional Chinese document enhancement.
Authors: Lo Shi-Wei, Chou Hsiu-Mei, Wu Jyh-Horng
| Journal: | Scientific Data | Impact factor: | 6.900 |
| Year: | 2024 | Citation: | 2024 Nov 27; 11(1):1295 |
| DOI: | 10.1038/s41597-024-04146-7 | | |
