Topic modeling-based prediction of software defects and root cause using BERTopic, and multioutput classifier



Abstract

Software defects remain a major obstacle in software engineering, resulting in costly debugging and maintenance efforts. This study introduces a new approach to software defect prediction (SDP) that uses advanced natural language processing (NLP) and machine learning (ML) techniques. The proposed methodology, BERT-MOC, combines BERTopic, a transformer-based topic modeling technique, with a multioutput classifier to predict both software defects and the root cause (reason) of each defect. BERTopic extracts the root cause of a defect from the textual descriptions of software defects, creating a meaningful representation of the software artifacts. These topic representations are then combined with the defect log dataset. A multioutput classifier is trained on the combined dataset to predict multiple outputs simultaneously, namely defect/not defect and defect root cause. Logistic Regression, Decision Tree Classifier, K-Neighbors Classifier, Random Forest Classifier, and a voting ensemble are each used as the estimator inside the multioutput classifier. The proposed model is evaluated using Hamming loss, accuracy, F1-score, precision, recall, and Jaccard similarity. The multioutput classifier with the voting ensemble as its estimator achieved the highest performance, with 97% accuracy and F1-score for predicting the root cause of the defect and 94% accuracy and F1-score for predicting whether a defect is present.
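The multioutput setup described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the features are synthetic stand-ins for the combined defect-log and topic data, the second target is derived artificially, and the hard-voting setting and all hyperparameters are choices made here because the abstract does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the combined features (assumption: not the paper's data).
X, root_cause = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
# Hypothetical second target: treat root-cause class 0 as "not a defect".
is_defect = (root_cause != 0).astype(int)
# One column per output: [defect / not defect, root cause].
Y = np.column_stack([is_defect, root_cause])

# Voting ensemble over the four base learners named in the abstract.
# Hard (majority) voting is an assumption; the abstract does not say which is used.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="hard",
)

# MultiOutputClassifier fits one clone of the estimator per target column.
model = MultiOutputClassifier(voter)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)  # shape: (n_test_samples, 2)

# Report metrics separately for each output.
for col, name in enumerate(["defect", "root_cause"]):
    acc = accuracy_score(Y_test[:, col], Y_pred[:, col])
    f1 = f1_score(Y_test[:, col], Y_pred[:, col], average="weighted")
    print(f"{name}: accuracy={acc:.2f}, weighted F1={f1:.2f}")
```

Because the wrapper trains an independent copy of the estimator for each target, the two predictions share input features but not model parameters; scores on this synthetic data say nothing about the results reported in the abstract.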
