Detecting refactoring type of software commit messages based on ensemble machine learning algorithms

基于集成机器学习算法检测软件提交消息中的重构类型

阅读:1

Abstract

Refactoring is a well-established topic in contemporary software engineering, focusing on enhancing software's structural design without altering its external behavior. Commit messages play a vital role in tracking changes to the codebase. However, determining the exact refactoring required in the code can be challenging due to various refactoring types. Prior studies have attempted to classify refactoring documentation by type, achieving acceptable results in accuracy, precision, recall, F1-Score, and other performance metrics. Nevertheless, there is room for improvement. To address this, we propose a novel approach using four ensemble Machine Learning algorithms to detect refactoring types. Our experimentation utilized a dataset containing 573 commits, with text cleaning and preprocessing applied to address data imbalances. Various techniques, including hyperparameter optimization, feature engineering with TF-IDF and bag-of-words, and binary transformation using one-vs-one and one-vs-rest classifiers, were employed to enhance accuracy. Results indicate that the experiment involving feature engineering using the TF-IDF technique outperformed other methods. Notably, the XGBoost algorithm with the same technique achieved superior performance across all metrics, attaining 100% accuracy. Moreover, our results surpass the current state-of-the-art performance using the same dataset. Our proposed approach bears significant implications for software engineering, particularly in enhancing the internal quality of software.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。