Cross-project software defect prediction


Bibliographic Details
Main Authors: Bala, Yahaya Zakariyau, Abdul Samat, Pathiah, Sharif, Khaironi Yatim, Manshor, Noridayu
Format: Article
Published: Little Lion Scientific 2022
Online Access: http://psasir.upm.edu.my/id/eprint/100844/
author Bala, Yahaya Zakariyau
Abdul Samat, Pathiah
Sharif, Khaironi Yatim
Manshor, Noridayu
building UPM Institutional Repository
collection Online Access
description The introduction of the Cross-Project Defect Prediction (CPDP) method has made it more feasible to build a software defect prediction (SDP) model when a project has no historical records of its own. Although CPDP overcomes this limitation of SDP, its predictive performance is relatively poor because of the distribution discrepancy between the source and target datasets. Various studies have been published to address this challenge. This systematic literature review (SLR) analyzes research articles published since 2013 in four digital libraries: Scopus, IEEE, Science Direct, and Google Scholar. It addresses five research questions covering the classification algorithms, datasets, independent variables, and performance evaluation metrics used in CPDP studies, as well as the performance of individual machine learning classification algorithms in predicting software defects across different software projects. To answer these questions, the 34 most relevant articles were selected after passing quality assessment criteria. The review found that the majority of the selected studies used machine learning techniques as classification algorithms, and that 64% of the studies used a combination of Object-Oriented (OO) and Lines of Code (LOC) metrics. All the selected studies used publicly available datasets from NASA, PROMISE, SOFTLAB, AEEEM, and ReLink. The most commonly used evaluation metrics are F-measure and AUC. The best-performing classifiers include Logistic Regression and SVM. Despite various efforts to improve the CPDP model, its performance remains below a practically applicable level. Further study is therefore needed to improve the performance of the CPDP model.
first_indexed 2025-11-15T13:32:46Z
format Article
id upm-100844
institution Universiti Putra Malaysia
institution_category Local University
last_indexed 2025-11-15T13:32:46Z
publishDate 2022
publisher Little Lion Scientific
recordtype eprints
repository_type Digital Repository
spelling upm-100844 2023-08-16T08:51:45Z http://psasir.upm.edu.my/id/eprint/100844/ Little Lion Scientific 2022-08-15 Article PeerReviewed Bala, Yahaya Zakariyau and Abdul Samat, Pathiah and Sharif, Khaironi Yatim and Manshor, Noridayu (2022) Cross-project software defect prediction. Journal of Theoretical and Applied Information Technology, 100 (15). 4825 - 4833. ISSN 1992-8645; ESSN: 1817-3195 http://www.jatit.org/volumes/onehundred15.php
title Cross-project software defect prediction
url http://psasir.upm.edu.my/id/eprint/100844/