Consensus clustering-based undersampling for improved classification of transient events in time-domain astronomy surveys



Abstract

Astronomical data analytics has expanded rapidly with advances in data handling techniques and computing systems. The race to discover new events depends on acquiring and digesting the high volume of data from sky surveys efficiently yet accurately. This holds for many modern astronomy projects, which face the issue of big data storage on the one hand and effective data analysis on the other. This research deals with the latter by focusing on the classification of potential transient events initially detected in time-domain astronomical surveys. Most of these candidate transients are false positives resulting from hardware faults and errors in data collection and/or data pre-processing. Hence, the ability to filter these out is much needed to avoid laborious manual assessment down the line. The problem investigated here is that the training data can be highly imbalanced. As a first attempt, coupling oversampling methods with several classifiers provides an improvement but generally leads to overfitting. As a solution, this paper presents a novel application of consensus clustering to undersample majority-class instances instead. It not only helps to overcome the aforementioned drawback but also strengthens the recent approach that exploits a single clustering to guide the selection of representative samples.
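To make the idea of consensus-guided undersampling concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the base clusterer, the consensus function, or the rule for picking representatives, so everything here is an assumption: plain k-means as the base clusterer, a co-association matrix thresholded at 0.5 and linked into connected components as the consensus step, and a round-robin pick of the points closest to each consensus cluster's centroid. All function and parameter names (`kmeans_labels`, `consensus_clusters`, `undersample_majority`, `ks`, `runs_per_k`, `threshold`) are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); needs k <= len(points)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean(members)
    return labels

def consensus_clusters(points, ks=(2, 3, 4), runs_per_k=5, threshold=0.5, seed=0):
    """Merge an ensemble of k-means partitions into one consensus partition."""
    rng = random.Random(seed)
    n = len(points)
    partitions = [kmeans_labels(points, k, rng)
                  for k in ks for _ in range(runs_per_k)]
    parent = list(range(n))  # union-find over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Co-association: fraction of partitions that put i and j together.
            agree = sum(p[i] == p[j] for p in partitions) / len(partitions)
            if agree >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def undersample_majority(majority, n_keep, **kwargs):
    """Return indices of up to n_keep majority-class representatives."""
    ranked = []
    for idx in consensus_clusters(majority, **kwargs):
        center = mean([majority[i] for i in idx])
        ranked.append(sorted(idx, key=lambda i: sq_dist(majority[i], center)))
    ranked.sort(key=len, reverse=True)
    target = min(n_keep, len(majority))
    kept, depth = [], 0
    while len(kept) < target:
        # Round-robin: take the next-closest point from each cluster in turn.
        for members in ranked:
            if depth < len(members) and len(kept) < target:
                kept.append(members[depth])
        depth += 1
    return sorted(kept)

# Toy majority class: two synthetic 2-D blobs (illustrative data only).
rng = random.Random(1)
majority = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
            + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(10)])
kept = undersample_majority(majority, n_keep=6)
print(kept)
```

The retained indices would then be combined with all minority-class instances to form the training set. The round-robin pick spreads the kept samples evenly across consensus clusters; allocating in proportion to cluster size is an equally plausible choice that preserves the majority class's density structure instead.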
