Results
We propose to tackle these challenges with Parallelized Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods that perform sampling on each single data point, the split merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result. The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive inference on huge datasets. Experiment results show the model outperforms current widely used models in both clustering quality and computational speed. Availability and implementation: Source code is publicly available on https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package.
Supplementary Information
Supplementary data are available at Bioinformatics online.
