Abstract
BACKGROUND: Topic modeling approaches, such as latent Dirichlet allocation (LDA) and its successor, the dynamic topic model (DTM), are widely used to identify specific topics by extracting words with similar frequencies from documents. However, these topics often require manual interpretation, which makes it difficult to trace the semantic evolution of topics, particularly when topics contain negations, synonyms, or rare terms. Neural network-based word embeddings, such as Word2Vec and FastText, have advanced semantic understanding but have limitations of their own: Word2Vec struggles with out-of-vocabulary (OOV) words, and FastText generates suboptimal embeddings for infrequent terms.
METHODS: This study introduces Fast2Vec, a novel model that integrates the semantic capabilities of Word2Vec with the subword analysis strength of FastText to enhance semantic analysis in topic modeling. The model was evaluated using research abstracts from the Science and Technology Index (SINTA) journal database and validated using twelve public word similarity benchmarks covering diverse semantic and syntactic dimensions. Spearman and Pearson correlation coefficients were used to assess alignment with human judgments.
RESULTS: Experimental findings demonstrated that Fast2Vec outperforms or closely matches Word2Vec and FastText across most benchmark datasets, particularly in tasks requiring fine-grained semantic similarity. In OOV scenarios, Fast2Vec improved semantic similarity by 39.64% compared to Word2Vec and by 6.18% compared to FastText. Even in scenarios without OOV terms, Fast2Vec achieved a 7.82% improvement over FastText and a marginal 0.087% improvement over Word2Vec. Additionally, the model effectively categorized topics into four distinct evolution patterns (diffusion, shifting, moderate fluctuations, and stability), enabling a deeper understanding of evolving topic interests and their dynamic characteristics.
CONCLUSION: Fast2Vec presents a robust and generalizable word embedding framework for semantic-based topic modeling. By combining the contextual sensitivity of Word2Vec with the subword flexibility of FastText, Fast2Vec effectively addresses prior limitations in handling OOV terms and semantic variation, and it demonstrates strong potential for broader applications in natural language processing tasks.
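
To illustrate the evaluation protocol described under METHODS, the following minimal sketch scores a set of word vectors against a word similarity benchmark by correlating cosine similarities with human ratings, using both the Spearman and Pearson coefficients. It is not the Fast2Vec implementation. The function names, the toy vectors, and the toy word pairs are hypothetical, and skipping pairs that contain an OOV word is an assumption made here for simplicity rather than the procedure used in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def evaluate(pairs, vectors):
    """Correlate model similarities with human scores.

    Assumption: pairs containing a word without a vector are skipped.
    """
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human), pearson(model, human), len(model)

# Hypothetical toy data, for illustration only.
vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
pairs = [
    ("cat", "dog", 9.0),
    ("cat", "car", 1.0),
    ("car", "truck", 8.5),
    ("dog", "zebra", 5.0),  # "zebra" has no vector, so this pair is skipped
]
rho, r, n = evaluate(pairs, vectors)
print(f"Spearman={rho:.3f} Pearson={r:.3f} pairs_used={n}")
```

A model that can produce a vector for every word, for example through subword composition, would score all pairs rather than a subset, which is one reason OOV handling affects benchmark coverage.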