Saurashtra: An Indo-Aryan language dataset

Abstract

Saurashtra (also spelled Sourashtra or Saurashtri) is a low-resource Indo-Aryan language spoken primarily by diaspora communities in southern India, with historical roots in the Saurashtra region of Gujarat. In everyday use it is written in several scripts, including its native script, the Tamil script, and occasionally Devanagari transliteration. Despite its rich literary and oral traditions, Saurashtra is severely underrepresented in computational linguistics and in digital language resources. The lack of a structured dataset has hindered progress in natural language processing (NLP), language preservation, and the development of digital tools. This paper introduces a new dataset for Saurashtra, set in the wider context of resource development for Indo-Aryan languages. The dataset was compiled to provide comprehensive linguistic data, including phonological and semantic information that describes the language's structure and usage. It serves as a critical resource for linguists, language researchers, and cultural historians interested in preserving and studying lesser-documented languages of the Indo-Aryan family.
