Abstract
This work presents a Bengali-language dataset designed to fine-tune AI-powered, hint-based tutoring systems for the Secondary School Certificate (SSC) science curriculum in Bangladesh. The dataset contains 11,286 hint-based question-answer entries: 4,859 from Biology covering 14 chapters, 3,034 from Chemistry across 12 chapters, and 3,393 from Physics spanning 14 chapters. All items were created manually from government-issued textbooks, SSC-focused study materials, and past exam question banks. Each question is paired with a set of candidate answers, one correct and several closely related but incorrect, so that the effectiveness of the hints can be measured. A convergence score attached to each entry estimates how far through the hints a student may need to go before answering correctly. Together, these features support personalized feedback and offer insight into students' learning progress. The dataset is encoded in UTF-8, with some English terms retained for scientific precision and consistency with the source materials. This keeps it accessible to native Bengali-speaking learners while remaining valuable for low-resource Natural Language Processing (NLP) applications. By emphasizing curriculum alignment, ranked hinting, and learner modeling, the dataset provides a foundation for fine-tuning large language models (LLMs) and developing intelligent tutoring systems that are both linguistically inclusive and educationally effective.
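The entry structure described above can be illustrated with a minimal Python sketch. The field names (`subject`, `chapter`, `hints`, `options`, `correct_index`, `convergence_score`) and the helper functions are hypothetical, chosen only for illustration; the released files may use a different schema. Only the per-subject question and chapter counts are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Per-subject figures stated in the abstract: (questions, chapters).
SUBJECT_COUNTS = {
    "Biology": (4859, 14),
    "Chemistry": (3034, 12),
    "Physics": (3393, 14),
}


@dataclass
class HintEntry:
    # All field names are assumptions; the actual schema may differ.
    subject: str
    chapter: int
    question: str              # Bengali text, UTF-8
    hints: List[str]           # ranked hints, in the order shown to a student
    options: List[str]         # one correct answer plus close distractors
    correct_index: int         # position of the correct answer in `options`
    convergence_score: float   # estimated depth into the hints before a correct answer


def total_entries(counts):
    """Sum the question counts over all subjects."""
    return sum(n_questions for n_questions, _ in counts.values())


def is_well_formed(entry):
    """Basic structural checks against the description in the abstract."""
    if entry.subject not in SUBJECT_COUNTS:
        return False
    _, n_chapters = SUBJECT_COUNTS[entry.subject]
    return (
        1 <= entry.chapter <= n_chapters
        and len(entry.hints) > 0
        and 0 <= entry.correct_index < len(entry.options)
    )
```

A loader built on this sketch could check that the per-subject counts sum to the stated 11,286 entries and reject records whose chapter number or answer index falls out of range.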