SSC-BanglaTutor: A curriculum-aligned Bengali dataset for intelligent tutoring systems

Abstract

This article presents a Bengali-language dataset designed to fine-tune AI-powered, hint-based tutoring systems for the Secondary School Certificate (SSC) science curriculum in Bangladesh. The dataset contains 11,286 hint-based question-answer entries: 4,859 from Biology covering 14 chapters, 3,034 from Chemistry across 12 chapters, and 3,393 from Physics spanning 14 chapters. All items were created manually from government-issued textbooks, SSC-focused study materials, and past exam question banks. Each question is paired with candidate answers, one correct option and several closely related but incorrect options, to help measure the effectiveness of the hints. A convergence score attached to each entry estimates how far through the hints a student may need to go before answering correctly. These features support personalized feedback and offer insight into students' learning progress. The dataset is encoded in UTF-8, with some English terms retained for scientific precision and consistency with the source materials. This makes it accessible to native learners while remaining valuable for low-resource Natural Language Processing (NLP) applications. By emphasizing curriculum alignment, ranked hinting, and learner modeling, the dataset provides a foundation for fine-tuning large language models (LLMs) and developing intelligent tutoring systems that are both linguistically inclusive and educationally effective.
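To make the entry structure described above concrete, the following Python sketch models one entry and checks that the per-subject counts add up to the stated total. The abstract does not specify a file format or schema, so the class name `HintEntry`, every field name, the type of the convergence score, and the example values are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Per-subject entry counts as reported in the abstract.
SUBJECT_COUNTS: Dict[str, int] = {
    "Biology": 4859,
    "Chemistry": 3034,
    "Physics": 3393,
}


@dataclass
class HintEntry:
    """One hint-based question-answer entry (hypothetical schema)."""
    subject: str
    chapter: int
    question: str
    hints: List[str]            # ranked hints, shown one at a time
    candidates: List[str]       # one correct option plus distractors
    correct_index: int          # position of the correct option
    convergence_score: float    # assumed numeric; actual type unknown

    def correct_answer(self) -> str:
        """Return the candidate marked as correct."""
        return self.candidates[self.correct_index]


def total_entries(counts: Dict[str, int]) -> int:
    """Sum the per-subject counts."""
    return sum(counts.values())


# Placeholder entry; the text is invented and not taken from the dataset.
example = HintEntry(
    subject="Physics",
    chapter=1,
    question="<question text in Bengali>",
    hints=["<first hint>", "<second hint>"],
    candidates=["<option A>", "<option B>", "<option C>"],
    correct_index=1,
    convergence_score=0.5,
)
```

Calling `total_entries(SUBJECT_COUNTS)` returns 11286, which matches the total given in the abstract.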
