Abstract
ViClickbait-2025 is a curated Vietnamese-language dataset developed to facilitate research on automatic clickbait detection. It comprises 3,414 headlines scraped from eight major Vietnamese online news platforms between 2023 and 2025. Each headline is annotated as either clickbait or non-clickbait, and 31.2% are labeled as clickbait. The dataset includes nine key attributes covering headline text, metadata, article summaries, and simulated engagement indicators. A preprocessing pipeline was applied to remove HTML noise, eliminate duplicates, and normalize the data. Annotation was carried out by three independent reviewers following a standardized guideline, with inter-annotator agreement reaching a Cohen's kappa of 0.822. Disagreements were resolved by a fourth annotator, and inconclusive cases were excluded. The final dataset spans 13 news categories and is released in JSONL and CSV formats under a CC BY 4.0 license.
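Cohen's kappa is defined for a pair of annotators, so an agreement figure for three reviewers has to be aggregated in some way. The abstract does not say how; the sketch below assumes one common choice, the mean of the pairwise kappas, and may not match the procedure actually used. The function names and the toy labels are hypothetical and are not drawn from the dataset.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over all annotator pairs (an assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy labels (1 = clickbait, 0 = non-clickbait); illustrative only.
r1 = [1, 0, 0, 1, 0, 1, 0, 0]
r2 = [1, 0, 0, 1, 0, 0, 0, 0]
r3 = [1, 0, 1, 1, 0, 1, 0, 0]
print(round(mean_pairwise_kappa([r1, r2, r3]), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because the classes are imbalanced (roughly one headline in three is clickbait). Fleiss' kappa is an alternative statistic designed for more than two annotators.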