A Hybrid Machine Learning Framework to Improve Morphological Trait Recovery in Avian Datasets

Abstract

Missing data in morphological trait datasets pose a persistent challenge to ecological and evolutionary research, frequently compromising model inference and predictive accuracy. We propose THORBFNN, a three-stage hybrid imputation framework that integrates regularized K-means clustering, Radial Basis Function Neural Networks (RBFNNs), and hierarchical Bayesian optimization to recover missing avian morphological traits accurately. The framework first partitions species into clusters using regularized K-means, which promotes inter-cluster separation and thereby better preserves local morphological structure. Within each cluster, RBFNNs model nonlinear dependencies among traits, taking as inputs the features selected by their Pearson correlation with the target trait. Key hyperparameters, such as the number of clusters and the RBF width, are tuned by hierarchical Bayesian optimization to balance generalization against model complexity. Applied to a global avian trait dataset comprising over 10,000 individuals and 11 morphological traits, THORBFNN outperforms K-nearest neighbors (KNN) and Random Forest imputation across four focal traits, achieving a higher R² and lower RMSE and MAE (THORBFNN: R² = 0.9003, RMSE = 0.1652, MAE = 0.1096; KNN: R² = 0.8864, RMSE = 0.1668, MAE = 0.1248; Random Forest: R² = 0.8573, RMSE = 0.2134, MAE = 0.1584). Ablation experiments comparing models trained on complete cases versus mean-imputed data confirm that THORBFNN captures genuine trait covariation rather than statistical artifacts. THORBFNN requires no phylogenetic information and scales efficiently to datasets with thousands of individuals, offering a practical pathway for integrating machine learning into biodiversity trait analysis.
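The three stages described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' THORBFNN implementation, and several parts are simplified by assumption: plain Lloyd's K-means stands in for the regularized variant, individuals are clustered on the selected predictor traits, every non-target trait is assumed to be observed, the RBF network places one Gaussian centre on each training point and fits its output weights by ridge-regularized least squares, and the number of clusters `k` and the RBF `width` are fixed by hand rather than tuned by hierarchical Bayesian optimization. All names (`impute_trait`, `RBFNet`, `kmeans`, `metrics`, and so on) are hypothetical.

```python
import math
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means (assumed stand-in for the paper's regularized variant)."""
    centers = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

class RBFNet:
    """Gaussian RBF network: one centre per training point, ridge-regularized output weights."""

    def __init__(self, width=1.0, ridge=1e-3):
        self.width, self.ridge = width, ridge

    def _phi(self, a, b):
        return math.exp(-sqdist(a, b) / (2 * self.width ** 2))

    def fit(self, X, y):
        self.centers, self.bias = X, sum(y) / len(y)  # bias = mean target of the cluster
        n = len(X)
        gram = [[self._phi(X[i], X[j]) + (self.ridge if i == j else 0.0)
                 for j in range(n)] for i in range(n)]
        self.weights = solve(gram, [v - self.bias for v in y])
        return self

    def predict(self, x):
        return self.bias + sum(w * self._phi(x, c) for w, c in zip(self.weights, self.centers))

def impute_trait(rows, target, n_features=2, k=2, width=1.0):
    """Hypothetical helper: fill None in column `target`; other columns assumed complete."""
    complete = [r for r in rows if r[target] is not None]
    ys = [r[target] for r in complete]
    others = [j for j in range(len(rows[0])) if j != target]
    # Step 1: keep the predictors with the largest |Pearson r| against the target.
    feats = sorted(others, key=lambda j: -abs(pearson([r[j] for r in complete], ys)))[:n_features]
    proj = [[r[j] for j in feats] for r in rows]
    # Step 2: cluster every individual on the selected predictors.
    labels = kmeans(proj, k)
    out = [list(r) for r in rows]
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        train = [i for i in idx if rows[i][target] is not None]
        if not train:  # no observed target in this cluster: fall back to all complete rows
            train = [i for i, r in enumerate(rows) if r[target] is not None]
        # Step 3: fit one RBF network per cluster and predict that cluster's missing values.
        net = RBFNet(width).fit([proj[i] for i in train], [rows[i][target] for i in train])
        for i in idx:
            if rows[i][target] is None:
                out[i][target] = net.predict(proj[i])
    return out

def metrics(y_true, y_pred):
    """R^2, RMSE and MAE, the three scores reported in the abstract."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"R2": 1 - ss_res / ss_tot, "RMSE": math.sqrt(ss_res / n), "MAE": mae}
```

For example, `impute_trait(rows, target=3, n_features=2, k=2, width=1.0)` fills the `None` entries in the fourth column of `rows`, and `metrics(true_values, imputed_values)` scores a set of deliberately masked values on the same three measures the abstract reports. In a fuller reproduction, `k` and `width` are the quantities that would be handed to the Bayesian optimizer instead of being fixed.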