| Summary: | The introduction of Cross-Project Defect Prediction (CPDP) has made it feasible to build a software defect prediction (SDP) model for projects that lack historical defect records. Although CPDP overcomes this limitation of SDP, its predictive performance is relatively poor due to the distribution discrepancy between the source and target datasets, and numerous studies have been published to address this challenge. This systematic literature review (SLR) analyzed research articles published since 2013 in four digital libraries: Scopus, IEEE, ScienceDirect, and Google Scholar. Five research questions were addressed, covering the classification algorithms, datasets, independent variables, and performance evaluation metrics used in CPDP studies, as well as the performance of individual machine learning classification algorithms in predicting software defects across different software projects. To answer these questions, the 34 most relevant articles that passed the quality assessment criteria were selected. This work found that the majority of the selected studies used machine learning techniques as classification algorithms, and that 64% of the studies used a combination of Object-Oriented (OO) and Lines of Code (LOC) metrics. All the selected studies used publicly available datasets from NASA, PROMISE, SOFTLAB, AEEEM, and ReLink. The most commonly used evaluation metrics are F-measure and AUC, and the best-performing classifiers include Logistic Regression and Support Vector Machine (SVM). Despite various efforts to improve the CPDP model, its performance remains below the level required for practical application; further study is therefore needed to improve the performance of the CPDP model.
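To make the CPDP setup described above concrete, the following is a minimal sketch, not drawn from any of the reviewed studies: a Logistic Regression classifier is trained on one project's metrics (the source) and evaluated on another project (the target) using F-measure and AUC. The data is synthetic stand-in data, and the deliberate shift in the target distribution is an assumption used only to mimic the source/target discrepancy the summary describes.

```python
# Minimal cross-project defect prediction (CPDP) sketch: train on a "source"
# project, evaluate on a "target" project with a different distribution.
# All data below is synthetic, not from NASA/PROMISE/SOFTLAB/AEEEM/ReLink.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic source project: 200 modules, 5 static-code metrics.
X_source = rng.normal(size=(200, 5))
y_source = (X_source[:, 0] + 0.5 * X_source[:, 1] > 0).astype(int)

# Synthetic target project with a shifted, rescaled feature distribution,
# standing in for the cross-project distribution discrepancy.
X_target = rng.normal(loc=0.8, scale=1.5, size=(150, 5))
y_target = (X_target[:, 0] + 0.5 * X_target[:, 1] > 0).astype(int)

# A common mitigation: normalize each project's metrics independently.
X_source_n = StandardScaler().fit_transform(X_source)
X_target_n = StandardScaler().fit_transform(X_target)

# Train on the source project only; the target has no usable history.
clf = LogisticRegression().fit(X_source_n, y_source)
pred = clf.predict(X_target_n)
prob = clf.predict_proba(X_target_n)[:, 1]

# Report the two evaluation metrics most used in the reviewed studies.
print(f"F-measure: {f1_score(y_target, pred):.3f}")
print(f"AUC:       {roc_auc_score(y_target, prob):.3f}")
```

Evaluating on the target's own labels is possible here only because the data is synthetic; in a real CPDP setting the target labels are unavailable at training time, which is precisely what motivates the approach.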
|