Abstract
Offline constraint-based causal feature selection (OC-CFS) algorithms are essential for identifying causal relationships from observational data. However, existing methods often suffer from low prediction accuracy or high computational cost, particularly when sample sizes vary. To address these limitations, we propose Triplet, a novel framework that leverages the HITON-MB parents and children (PC) strategy to identify strongly relevant PC nodes while eliminating irrelevant and redundant features. It concurrently employs the BAMB strategy to detect relevant spouses and discard irrelevant ones, and applies the STMB non-Markov blanket (non-MB) strategy to identify and exclude non-MB descendants. Through this integration, the proposed T-OCD$_{MB}$ identifies the true Markov blanket (MB) with high prediction accuracy and reduced runtime. To validate its effectiveness, we evaluated T-OCD$_{MB}$ on benchmark Bayesian networks (BNs) and real-world datasets. Extensive experimental results show that T-OCD$_{MB}$ improves on existing methods in both prediction accuracy and computational efficiency. On small samples (n = 500), T-OCD$_{MB}$ achieved the highest recall on 5 of 7 datasets, with an average improvement of over 20% compared to competing methods. On large samples (n = 5000), it excelled in precision, achieving the top score on 4 of 7 datasets with an average precision of 94%. T-OCD$_{MB}$ was also the second-fastest method overall: it ran over 55% faster than half of the benchmarks and 35% faster than the average competitor on large datasets. The source code is available at https://github.com/vickykhan89/T-OCDmb.
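As a point of reference for the terms used above, the following is a minimal sketch of what "the true MB" and the reported precision and recall mean on a known network. It is not the method proposed here: it reads the Markov blanket (parents, children, and spouses) directly off a given DAG rather than learning it from data with conditional independence tests. The toy graph, the function names, and the convention of returning 1.0 for an empty set are assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and spouses of `target`.

    `parents` maps every node of a DAG to the set of its parents.
    """
    pa = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of the target's children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    spouses.discard(target)
    return pa | children | spouses


def precision_recall(found: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Score a discovered MB against the true MB."""
    tp = len(found & truth)
    # Assumed convention: an empty set scores 1.0 rather than dividing by zero.
    precision = tp / len(found) if found else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical toy DAG: A -> T, T -> C, S -> C, C -> D
toy_dag = {"A": set(), "S": set(), "T": {"A"}, "C": {"T", "S"}, "D": {"C"}}

true_mb = markov_blanket(toy_dag, "T")  # parent A, child C, spouse S
# A candidate set that wrongly keeps the non-MB descendant D and misses spouse S.
precision, recall = precision_recall({"A", "C", "D"}, true_mb)
```

In this toy graph, D is a descendant of T that lies outside the MB, which is the kind of node the non-MB strategy is meant to exclude, while S is a spouse that enters the MB only through the shared child C.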