Abstract
Saurashtra (also spelled Sourashtra or Saurashtri) is a low-resource Indo-Aryan language spoken primarily by diaspora communities in southern India, with historical roots in the Saurashtra region of Gujarat. In everyday use, Saurashtra is written in several scripts: its native Saurashtra script, the Tamil script, and occasionally Devanagari transliteration. Despite its rich literary and oral traditions, Saurashtra is severely underrepresented in computational linguistics and in digital language resources. The lack of a structured dataset has hindered progress in natural language processing (NLP), language preservation, and the development of digital tools. This paper introduces a new dataset for the Saurashtra language, situated within the wider effort to develop resources for Indo-Aryan languages. The dataset compiles linguistic data covering the phonology and semantics of Saurashtra, documenting both the structure and the usage of the language. It serves as a critical resource for linguists, language researchers, and cultural historians interested in the preservation and study of lesser-documented languages of the Indo-Aryan family.