H-NGPCA: Hierarchical clustering of data streams with adaptive number of clusters and adaptive dimensionality

Abstract

We present H-NGPCA, a hierarchical clustering algorithm for data streams that integrates adaptive growth of the number of units with local dimensionality control. Unlike existing algorithms, H-NGPCA combines the characteristics of centroid-based, model-based, and hierarchical clustering. It builds a hierarchical structure of local Principal Component Analysis (PCA) units, where each unit is a hyper-ellipsoid whose shape is updated by a neural network-based online PCA method. The units are repositioned by Neural Gas, a centroid-based clustering algorithm. Within the hierarchical tree, new units are created in a branch when a splitting criterion suggests it. In addition, each unit determines its own dimensionality from the data it represents. In extensive benchmarks, H-NGPCA surpasses all competing online algorithms with adaptive unit numbers and is competitive with state-of-the-art offline methods, reaching an average NMI = 0.87 and CI = 0.26. This demonstrates that H-NGPCA achieves both online adaptability and offline-level accuracy.
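Two of the ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the first function is a generic rank-based Neural Gas update, and the second uses an explained-variance threshold as a stand-in for the per-unit dimensionality rule, which the abstract does not specify. The function names and the parameters `eps`, `lam`, and `keep` are assumptions made for illustration; the online PCA update and the splitting criterion are not sketched.

```python
import numpy as np

def neural_gas_step(centers, x, eps=0.1, lam=1.0):
    """One rank-based Neural Gas update (generic form, assumed parameters).

    Every center moves toward the sample x; the step size decays
    exponentially with the center's distance rank (rank 0 = closest).
    """
    centers = np.asarray(centers, dtype=float)
    x = np.asarray(x, dtype=float)
    dists = np.linalg.norm(centers - x, axis=1)
    ranks = np.argsort(np.argsort(dists))      # rank of each center by distance
    step = eps * np.exp(-ranks / lam)          # per-center learning rate
    return centers + step[:, None] * (x - centers)

def local_dimensionality(eigenvalues, keep=0.9):
    """Hypothetical dimensionality rule: the smallest number of leading
    principal components whose eigenvalues explain at least `keep` of
    the total variance of a unit."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratio = np.cumsum(ev) / ev.sum()
    return int(np.searchsorted(ratio, keep) + 1)

# Usage: two units, one incoming sample near the first unit.
new_centers = neural_gas_step([[0.0, 0.0], [10.0, 10.0]], [1.0, 1.0], eps=0.5)
dim = local_dimensionality([4.0, 3.0, 2.0, 1.0], keep=0.8)
```

In this sketch the closest unit takes the full step `eps`, while more distant units take exponentially smaller steps, which is what distinguishes Neural Gas from a winner-take-all update.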
