Duplicate bug report detection using clustering

Bug reporting and fixing the reported bugs play a critical part in the development and maintenance of software systems. The software developers and end users can collaborate in this process to improve the reliability of software systems.

Bibliographic Details
Main Authors: Gopalan, Raj, Krishna, Aneesh
Other Authors: Jim Steel
Format: Conference Paper
Published: IEEE 2014
Subjects: bug report; duplicate detection; Bugzilla; clustering
Online Access: http://hdl.handle.net/20.500.11937/11144
_version_ 1848747726585462784
author Gopalan, Raj
Krishna, Aneesh
author2 Jim Steel
author_facet Jim Steel
Gopalan, Raj
Krishna, Aneesh
author_sort Gopalan, Raj
building Curtin Institutional Repository
collection Online Access
description Bug reporting and fixing the reported bugs play a critical part in the development and maintenance of software systems. The software developers and end users can collaborate in this process to improve the reliability of software systems. Various end users report the defects they have found in the software and how these bugs affect them. However, the same defect may be reported independently by several users leading to a significant number of duplicate bug reports. There are a number of existing methods for detecting duplicate bug reports, but the best results so far account for only 24% of actual duplicates. In this paper, we propose a new method based on clustering to identify a larger proportion of duplicate bug reports while keeping the false positives of misidentified non-duplicates low. The proposed approach is experimentally evaluated on a large sample of bug reports from three public domain data sets. The results show that this approach achieves better performance in terms of a harmonic measure that combines true positive and true negative rates when compared to the existing methods.
first_indexed 2025-11-14T06:53:44Z
format Conference Paper
id curtin-20.500.11937-11144
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T06:53:44Z
publishDate 2014
publisher IEEE
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-11144 2023-02-13T08:01:37Z Duplicate bug report detection using clustering Gopalan, Raj Krishna, Aneesh Jim Steel Liming Zhu bug report duplicate detection Bugzilla clustering 2014 Conference Paper http://hdl.handle.net/20.500.11937/11144 10.1109/ASWEC.2014.31 IEEE fulltext
spellingShingle bug report
duplicate detection
Bugzilla
clustering
Gopalan, Raj
Krishna, Aneesh
Duplicate bug report detection using clustering
title Duplicate bug report detection using clustering
title_full Duplicate bug report detection using clustering
title_fullStr Duplicate bug report detection using clustering
title_full_unstemmed Duplicate bug report detection using clustering
title_short Duplicate bug report detection using clustering
title_sort duplicate bug report detection using clustering
topic bug report
duplicate detection
Bugzilla
clustering
url http://hdl.handle.net/20.500.11937/11144