FLACON: An Information-Theoretic Approach to Flag-Aware Contextual Clustering for Large-Scale Document Organization

Abstract

Enterprise document management faces a persistent challenge: traditional clustering methods consider only content similarity and ignore organizational context such as priority, workflow status, and temporal relevance. This paper introduces FLACON (Flag-Aware Context-sensitive Clustering), an information-theoretic approach that captures multi-dimensional document context through a six-dimensional flag system covering the Type, Domain, Priority, Status, Relationship, and Temporal dimensions. FLACON formalizes document clustering as an entropy minimization problem whose objective is to group documents with similar contextual characteristics. The approach combines a composite distance function (integrating semantic content, contextual flags, and temporal factors) with adaptive hierarchical clustering and efficient incremental updates. This design addresses key limitations of existing solutions, including context-aware systems that lack domain-specific intelligence and LLM-based methods that require prohibitive computational resources. Evaluation across nine dataset variations demonstrates notable improvements over traditional methods, including a 7.8-fold improvement in clustering quality (Silhouette Score: 0.311 vs. 0.040), and reaches 89% of GPT-4's clustering quality while running about 7× faster (60 s vs. 420 s for 10,000 documents). FLACON achieves O(m log n) complexity for incremental updates affecting m documents and behaves deterministically, which suits compliance requirements. Consistent performance across business emails, technical discussions, and financial news confirms the practical viability of this approach for large-scale enterprise document organization.
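To make the composite distance and the entropy objective more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the component distances or their weights, so everything here is an assumption chosen for illustration: cosine distance for semantic content, a normalized mismatch count over the six flags, an exponential decay on a hypothetical `day` timestamp, the weights `w_sem`, `w_flag`, `w_time`, and the document layout (`vec`, `flags`, `day`).

```python
import math
from collections import Counter

# The six flag dimensions named in the abstract.
FLAG_DIMS = ("type", "domain", "priority", "status", "relationship", "temporal")

def cosine_distance(u, v):
    """Semantic distance between two embedding vectors (assumed choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def flag_distance(f1, f2):
    """Fraction of flag dimensions on which two documents disagree (assumed)."""
    mismatches = sum(f1[d] != f2[d] for d in FLAG_DIMS)
    return mismatches / len(FLAG_DIMS)

def temporal_distance(day1, day2, scale_days=30.0):
    """Grows from 0 toward 1 as the gap in days widens (assumed decay form)."""
    return 1.0 - math.exp(-abs(day1 - day2) / scale_days)

def composite_distance(a, b, w_sem=0.5, w_flag=0.3, w_time=0.2):
    """Weighted sum of the three components; the weights are illustrative only."""
    return (w_sem * cosine_distance(a["vec"], b["vec"])
            + w_flag * flag_distance(a["flags"], b["flags"])
            + w_time * temporal_distance(a["day"], b["day"]))

def flag_entropy(cluster, dim):
    """Shannon entropy (bits) of one flag dimension within a cluster."""
    counts = Counter(doc["flags"][dim] for doc in cluster)
    n = len(cluster)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Two hypothetical documents that differ in content and in one flag.
base_flags = dict(type="email", domain="finance", priority="high",
                  status="open", relationship="reply", temporal="recent")
doc_a = {"vec": [1.0, 0.0], "flags": base_flags, "day": 0}
doc_b = {"vec": [0.0, 1.0], "flags": {**base_flags, "priority": "low"}, "day": 0}

d_ab = composite_distance(doc_a, doc_b)
h_priority = flag_entropy([doc_a, doc_b], "priority")
h_domain = flag_entropy([doc_a, doc_b], "domain")
```

Under these assumptions, a cluster whose documents agree on a flag has zero entropy on that dimension, while a cluster split evenly between two values has one bit. An entropy-minimizing clustering would therefore prefer groupings that keep flag values homogeneous within each cluster. A pairwise distance of this kind could be passed to any standard agglomerative clustering routine.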