On Refining the SZZ Algorithm with Bug Discussion Data

利用缺陷讨论数据改进 SZZ 算法

阅读:1

Abstract

CONTEXT: Researchers testing hypotheses related to factors leading to low-quality software often rely on historical data, specifically on details regarding when defects were introduced into a codebase of interest. The prevailing techniques to determine the introduction of defects revolve around variants of the SZZ algorithm. This algorithm leverages information on the lines modified during a bug-fixing commit and finds when these lines were last modified, thereby identifying bug-introducing commits. OBJECTIVES: Despite several improvements and variants, SZZ struggles with accuracy, especially in cases of unrelated modifications or that touch files not involved in the introduction of the bug in the version control systems (aka tangled commit and ghost commits). METHODS: Our research investigates whether and how incorporating content retrieved from bug discussions can address these issues by identifying the related and external files and thus improve the efficacy of the SZZ algorithm. RESULTS: To conduct our investigation, we take advantage of the links manually inserted by Mozilla developers in bug reports to signal which commits inserted bugs. Thus, we prepared the dataset, RoTEB, comprised of 12,472 bug reports. We first manually inspect a sample of 369 bug reports related to these bug-fixing or bug-introducing commits and investigate whether the files mentioned in these reports could be useful for SZZ. After we found evidence that the mentioned files are relevant, we augment SZZ with this information, using different strategies, and evaluate the resulting approach against multiple SZZ variations. CONCLUSION: We define a taxonomy outlining the rationale behind developers' references to diverse files in their discussions. We observe that bug discussions often mention files relevant to enhancing the SZZ algorithm's efficacy. Then, we verify that integrating these file references augments the precision of SZZ in pinpointing bug-introducing commits. Yet, it does not markedly influence recall. These results deepen our comprehension of the usefulness of bug discussions for SZZ. Future work can leverage our dataset and explore other techniques to further address the problem of tangled commits and ghost commits. Data & material: https://zenodo.org/records/11484723.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。