Cross-project software defect prediction

Bibliographic Details
Main Authors:	Bala, Yahaya Zakariyau; Abdul Samat, Pathiah; Sharif, Khaironi Yatim; Manshor, Noridayu
Format:	Article
Published:	Little Lion Scientific 2022
Online Access:	http://psasir.upm.edu.my/id/eprint/100844/

Description:	The introduction of the Cross-Project Defect Prediction (CPDP) method has made it feasible to build a software defect prediction (SDP) model for a project that has no historical defect records. Although CPDP overcomes this limitation of SDP, the predictive performance of CPDP models is relatively poor because of the distribution discrepancy between the source and target datasets, and various studies have been published to address this challenge. This systematic literature review (SLR) analyzed research articles published since 2013 in four digital libraries: Scopus, IEEE, ScienceDirect, and Google Scholar. It addresses five research questions covering the classification algorithms, datasets, independent variables, and performance evaluation metrics used in CPDP studies, as well as the performance of individual machine learning classifiers in predicting software defects across different projects. To answer these questions, the 34 most relevant articles were selected after applying quality assessment criteria. The review found that most of the selected studies used machine learning techniques as classification algorithms, and that 64% of them used a combination of Object-Oriented (OO) and Lines of Code (LOC) metrics. All of the selected studies used publicly available datasets from NASA, PROMISE, SOFTLAB, AEEEM, and Relink. The most commonly used evaluation metrics are F-measure and AUC, and the best-performing classifiers include Logistic Regression and SVM. Despite various efforts to improve CPDP models, their performance remains below a practically applicable level, so further research is needed to improve it.
Institution:	Universiti Putra Malaysia (UPM Institutional Repository)
Citation:	Bala, Yahaya Zakariyau and Abdul Samat, Pathiah and Sharif, Khaironi Yatim and Manshor, Noridayu (2022) Cross-project software defect prediction. Journal of Theoretical and Applied Information Technology, 100 (15). pp. 4825-4833. ISSN 1992-8645; E-ISSN 1817-3195. http://www.jatit.org/volumes/onehundred15.php
Date:	2022-08-15
Status:	Peer reviewed
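
The abstract refers to the basic CPDP setting (a classifier trained on a source project is applied to a different target project), to the distribution discrepancy between the two, and to the two most common evaluation metrics, F-measure and AUC. The Python sketch below illustrates that setting under stated assumptions; it is not the procedure of the paper or of any reviewed study. The toy metric values, the per-project z-score normalization used to reduce the source/target discrepancy, the gradient-descent Logistic Regression (one of the classifiers the abstract reports as performing well; SVM is not shown), the 0.5 decision threshold, and all function names are hypothetical, illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Hypothetical toy projects (NOT data from the review): each row is a module
# described by two static code metrics, [lines of code, an OO coupling metric];
# labels are 1 = defective, 0 = clean. The target project uses a different
# scale to mimic the source/target distribution discrepancy.
SRC_X: Matrix = [[10, 1], [20, 2], [30, 2], [40, 3],
                 [200, 8], [250, 9], [300, 10], [350, 12]]
SRC_Y = [0, 0, 0, 0, 1, 1, 1, 1]
TGT_X: Matrix = [[100, 2], [150, 3], [220, 4],
                 [1500, 15], [2000, 18], [2600, 20]]
TGT_Y = [0, 0, 0, 1, 1, 1]


def zscore(rows: Matrix) -> Matrix:
    """Standardize every metric with the mean/std of *this* project only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """w = [bias, w1, ..., wd]; returns P(defective | x)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logreg(X: Matrix, y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the defective class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random defective module scores above a random clean one); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both defective and clean modules")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_project_eval(src_X: Matrix, src_y: Sequence[int], tgt_X: Matrix,
                       tgt_y: Sequence[int], threshold: float = 0.5) -> Tuple[float, float]:
    """Train on the source project, then score the unseen target project."""
    w = train_logreg(zscore(src_X), src_y)
    # Target normalization uses only target metrics; target labels are used
    # solely for evaluation below.
    scores = [predict_proba(w, x) for x in zscore(tgt_X)]
    preds = [1 if s >= threshold else 0 for s in scores]
    return f_measure(tgt_y, preds), auc(tgt_y, scores)


if __name__ == "__main__":
    f1, roc_auc = cross_project_eval(SRC_X, SRC_Y, TGT_X, TGT_Y)
    print(f"F-measure = {f1:.2f}, AUC = {roc_auc:.2f}")
```

Running the script prints the F-measure and AUC obtained on the hypothetical target project. Normalizing each project with its own statistics requires no target labels, which fits the CPDP premise that the target project has no historical defect records.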

Similar Items