Abstract
Native advertising has rapidly become a predominant online marketing strategy, blending so seamlessly with editorial content that readers often remain unaware they are engaging with sponsored material. This integration spans diverse formats, including text articles, videos, and social media posts, which broadens its application but also makes detection harder. Identifying and understanding native advertising requires advanced methods supported by datasets that are both relevant and large, in order to promote transparency in online environments. This paper presents a systematic approach to data collection and annotation, with the aim of constructing a specialized dataset to support native ad detection. News articles were collected from six mainstream electronic news portals in Indonesia, and the annotation goes beyond labeling native ads to cover four additional implicit characteristics typically associated with such content. The resulting carefully annotated dataset is a valuable resource for developing and evaluating native ad detection algorithms. Such work has the potential to improve transparency in online advertising and to help distinguish more clearly between purely editorial content and sponsored material.