Abstract
BACKGROUND: Health literacy is a critical determinant of public health. In China, the length of the 56-item national questionnaire limits its large-scale implementation. This study aimed to develop and evaluate a brief health literacy classification model by integrating psychometric evaluation with classification modeling. METHODS: We conducted a cross-sectional study of 19,092 participants in Ningbo, China (2022–2024). The 56-item questionnaire was reduced using item response theory, exploratory factor analysis, and reliability analysis. The selected items and demographic variables were further refined using LASSO regression and Bayesian Information Criterion (BIC)-based logistic modeling. Model performance was assessed by the area under the receiver operating characteristic curve (AUC) and by calibration, using temporal testing. RESULTS: Fifteen items met all psychometric criteria (Cronbach’s α = 0.840). LASSO retained 19 variables, and the final model included 17. The nomogram-based model showed excellent discrimination (AUC = 0.952 in training; 0.949 in testing). Sensitivity and specificity exceeded 86%, with a negative predictive value above 93%. Calibration remained strong, with minimal performance degradation in testing. CONCLUSIONS: The evaluated 15-item model offers a brief, reliable alternative to the national questionnaire. Its high classification performance and reduced respondent burden support integration into health surveillance systems and electronic health records. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12889-026-26661-5.
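For readers unfamiliar with the summary statistics reported above, the following is a minimal Python sketch of how Cronbach's α, sensitivity, specificity, and negative predictive value are conventionally defined. It is an illustration only, not the study's analysis code: the function names `cronbach_alpha` and `classification_metrics` are hypothetical, the toy inputs in the usage note are invented, and binary 0/1 labels with 1 as the positive class are an assumption.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    with sample variances throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and NPV from binary labels (1 = positive).

    Illustrative only: assumes every denominator is non-zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }
```

As a usage illustration with invented data, `cronbach_alpha([[1, 1], [2, 2], [3, 3]])` describes two perfectly consistent items, and `classification_metrics([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])` summarizes a confusion matrix with two true positives, one false negative, three true negatives, and one false positive.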