Abstract
Demand for efficient public opinion analysis in water conservancy and related domains is growing, yet existing automated web data extraction algorithms are inefficient and scale poorly to multi-source datasets. To address both problems, this research combines big data analytics, natural language processing, and deep learning, and proposes a transferable web information extraction model based on deep learning (WIEM-DL) that draws on knowledge graphs, machine learning, and ontology-based methods. The model is designed to adapt to varying website structures, enabling cross-website information extraction. By refining water conservancy-related online public opinion content and extracting key feature information from critical sentences, WIEM-DL locates the main content of a page while filtering out noise, which reduces processing time and improves extraction accuracy. The model also establishes methods for micro-level public opinion information extraction and feature representation, creating a fusion space for data-level integration that serves as a foundation for multi-granularity semantic knowledge integration in public opinion big data. Experimental results show that WIEM-DL substantially outperforms traditional information extraction methods.