mdBIRCH for Fast, Scalable, Online Clustering of Molecular Dynamics Trajectories

Abstract

We present mdBIRCH, an online clustering method that adapts the BIRCH CF-tree to molecular dynamics (MD) data by using a merge test calibrated directly to RMSD. Each arriving frame is routed to the nearest centroid and added only if the post-merge radius, computed from the cluster feature, remains within a user-supplied threshold. This keeps the average deviation from each cluster centroid bounded as the cluster grows and preserves a simple interpretation of resolution in physical units. We evaluate mdBIRCH on a β-heptapeptide and the HP35 system. We propose two protocols to make threshold selection easier: (a) RMSD-anchored runs that use controlled structural edits to define interpretable operating points and (b) a blind sweep that tracks how cluster count, occupancy, and coverage change with the threshold. In both systems, increasing the threshold reduces the number of clusters, concentrates coverage in high-occupancy states, and broadens within-cluster RMSD distributions. Furthermore, because decisions rely only on cluster summaries, mdBIRCH avoids pairwise distance matrices entirely, scales near-linearly with the number of frames on standard hardware, and naturally supports incremental operation. The method offers a practical combination of speed and interpretability for large-scale trajectory analysis.
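
The merge test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a flat list of clusters instead of a CF-tree, and it assumes frames are already aligned to a common reference and flattened to vectors of length 3 × n_atoms, so that Euclidean distance divided by √n_atoms equals RMSD. The names `ClusterFeature`, `online_cluster`, `threshold`, and `n_atoms` are hypothetical and chosen only for illustration.

```python
import numpy as np


class ClusterFeature:
    """BIRCH-style summary: point count, linear sum, and sum of squared norms."""

    def __init__(self, dim):
        self.n = 0
        self.ls = np.zeros(dim)
        self.ss = 0.0

    def centroid(self):
        return self.ls / self.n

    def radius_after_adding(self, x):
        # Radius = root-mean-square distance of members to the centroid,
        # computed from the summary alone: sqrt(SS/N - ||LS/N||^2).
        n = self.n + 1
        ls = self.ls + x
        ss = self.ss + float(x @ x)
        var = ss / n - float(ls @ ls) / n**2
        return np.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors

    def add(self, x):
        self.n += 1
        self.ls += x
        self.ss += float(x @ x)


def online_cluster(frames, threshold, n_atoms):
    """Assign each frame to a cluster in one pass.

    frames:    array of shape (n_frames, 3 * n_atoms), pre-aligned (assumption)
    threshold: maximum allowed post-merge radius, in RMSD units
    """
    clusters, labels = [], []
    scale = np.sqrt(n_atoms)  # converts Euclidean distance to RMSD
    for x in frames:
        if clusters:
            # Route the frame to the nearest existing centroid.
            centroids = np.array([c.centroid() for c in clusters])
            k = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
            # Merge only if the post-merge radius stays within the threshold.
            if clusters[k].radius_after_adding(x) / scale <= threshold:
                clusters[k].add(x)
                labels.append(k)
                continue
        # Otherwise the frame seeds a new cluster.
        cf = ClusterFeature(frames.shape[1])
        cf.add(x)
        clusters.append(cf)
        labels.append(len(clusters) - 1)
    return labels, clusters
```

Because each decision reads only the three summary quantities, no pairwise distance matrix is built, and new frames can be appended to an existing set of clusters. Calling `online_cluster` over a range of `threshold` values and recording the cluster count and occupancies gives a rough analogue of the blind sweep mentioned above.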
