Influence of Explanatory Variable Distributions on the Behavior of the Impurity Measures Used in Classification Tree Learning

解释变量分布对分类树学习中使用的杂质度量行为的影响

阅读:1

Abstract

The primary objective of our study is to analyze how the nature of explanatory variables influences the values and behavior of impurity measures, including the Shannon, Rényi, Tsallis, Sharma-Mittal, Sharma-Taneja, and Kapur entropies. Our analysis aims to use these measures in the interactive learning of decision trees, particularly in the tie-breaking situations where an expert needs to make a decision. We simulate the values of explanatory variables from various probability distributions in order to consider a wide range of variability and properties. These probability distributions include the normal, Cauchy, uniform, exponential, and two beta distributions. This research assumes that the values of the binary responses are generated from the logistic regression model. All of the six mentioned probability distributions of the explanatory variables are presented in the same graphical format. The first two graphs depict histograms of the explanatory variables values and their corresponding probabilities generated by a particular model. The remaining graphs present distinct impurity measures with different parameters. In order to examine and discuss the behavior of the obtained results, we conduct a sensitivity analysis of the algorithms with regard to the entropy parameter values. We also demonstrate how certain explanatory variables affect the process of interactive tree learning.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。