Outlier detection plays a key role in data analysis by improving data quality, uncovering data entry errors, and spotting unusual patterns, such as fraudulent activities. Choosing the right detection method is essential, as some approaches may be too complex or ineffective depending on the data distribution. In this study, we explore a simple yet powerful approach using the range distribution to identify outliers in univariate data. We compare the effectiveness of two range statistics: we normalize the range by the standard deviation (Ï) and the interquartile range (IQR) across different types of distributions, including normal, logistic, Laplace, and Weibull distributions, with varying sample sizes (n) and error rates (α). An evaluation of the range behavior across multiple distributions allows for the determination of threshold values for identifying potential outliers. Through extensive experimental work, the accuracy of both statistics in detecting outliers under various contamination strategies, sample sizes, and error rates (α=0.1,0.05,0.01) is investigated. The results demonstrate the flexibility of the proposed statistic, as it adapts well to different underlying distributions and maintains robust detection performance under a variety of conditions. Our findings underscore the value of an adaptive method for reliable anomaly detection in diverse data environments.
Empirical Evaluation of the Relative Range for Detecting Outliers.
阅读:6
作者:Dallah Dania, Sulieman Hana, Zaatreh Ayman Al, Kamalov Firuz
| 期刊: | Entropy | 影响因子: | 2.000 |
| 时间: | 2025 | 起止号: | 2025 Jul 7; 27(7):731 |
| doi: | 10.3390/e27070731 | ||
特别声明
1、本文转载旨在传播信息,不代表本网站观点,亦不对其内容的真实性承担责任。
2、其他媒体、网站或个人若从本网站转载使用,必须保留本网站注明的“来源”,并自行承担包括版权在内的相关法律责任。
3、如作者不希望本文被转载,或需洽谈转载稿费等事宜,请及时与本网站联系。
4、此外,如需投稿,也可通过邮箱info@biocloudy.com与我们取得联系。
