Abstract
The COVID-19 pandemic has accelerated the adoption of online educational systems, highlighting the need for automation that can support both learning and evaluation. Multiple-choice questions (MCQs) are a fundamental assessment tool in these systems. This paper introduces NOIRBETTIK, a novel dataset for reading-comprehension-based MCQ answering in Bangla, developed to address the shortage of high-quality Bangla datasets for context-based tasks. The dataset is manually constructed from authentic Bangla sources such as books, articles, and biographies. It pairs relatively long passages with multiple-choice questions, each offering four answer options. By providing a comprehensive dataset grounded in real-world text, this work fills a critical gap in Bangla NLP research and educational applications. We describe the dataset's creation and annotation process and compare it with existing datasets to highlight what distinguishes it. Our primary contributions are the release of NOIRBETTIK and a detailed description of its structure, which we hope will support future advances in educational technology. The dataset holds significant promise for improving reading comprehension systems and for addressing the educational needs of Bangla-speaking students.