Robust correlation feature selection based support vector machine approach for high dimensional datasets


Bibliographic Details
Main Authors: Baba, Ishaq Abdullahi, Mohammed, Mohammed Bappah, Jillahi, Kamal Bakari, Umar, Aliyu, Hendi, Hasan Talib
Format: Article
Language: English
Published: Elsevier B.V. 2025
Online Access: http://psasir.upm.edu.my/id/eprint/120119/
http://psasir.upm.edu.my/id/eprint/120119/1/120119.pdf
building UPM Institutional Repository
description Correlation-based feature selection methods are popular tools for selecting the most important variables, so that the selected model contains the true model, in the analysis of sparse, high-dimensional data. In practice, anomalous observations in both the predictors and the response can seriously jeopardize the prediction accuracy of the model, which in turn leads to misleading interpretations and conclusions if not properly addressed. Furthermore, the curse of dimensionality is another serious difficulty facing many existing feature selection algorithms. To achieve more reliable feature selection and prediction accuracy, a weighted sure independence screening-based support vector machine for high-dimensional datasets is proposed. The key contribution of our proposed method is that it minimizes the influence of outliers when distinguishing significant from insignificant features, which improves predictability and interpretability. Our method consists of three basic steps. In the first step, weights based on a modified reweighted fast, consistent, and high-breakdown-point estimator are computed. The second step uses the weights estimated in the first step to select the most important variables for the model. The third step employs the support vector machine algorithm to compute predicted values. To demonstrate the effectiveness of the developed procedure, we used both simulated and real-life data examples. Our results show that the proposed method outperforms the competing procedures by a clear margin.
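The three-step procedure summarised in the description can be illustrated with a small, self-contained Python sketch (standard library only). It is a hypothetical illustration under stated assumptions, not the authors' implementation: the modified reweighted fast, consistent, high-breakdown-point weights of step 1 are replaced by a simple stand-in (0/1 weights from median/MAD robust z-scores of each (x_j, y) pair, with an assumed cutoff of 3); step 2 ranks features by absolute weighted correlation and keeps the top floor(n / log n), the conventional sure independence screening size; and step 3 uses a linear epsilon-insensitive support vector regression trained by stochastic subgradient descent instead of a full SVM solver. All function names, parameter values, and the synthetic data (one gross outlier in y) are invented for illustration.

```python
import math
import random
import statistics


def robust_z(values):
    """Robust z-scores using the median and the normal-consistent MAD."""
    med = statistics.median(values)
    mad = 1.4826 * statistics.median(abs(v - med) for v in values) or 1.0
    return [(v - med) / mad for v in values]


def pair_weights(x, y, cutoff=3.0):
    """Step 1 (hypothetical stand-in for the paper's estimator-based weights):
    weight 0 if the x or y robust z-score exceeds `cutoff` in absolute value, else 1."""
    return [0.0 if max(abs(a), abs(b)) > cutoff else 1.0
            for a, b in zip(robust_z(x), robust_z(y))]


def weighted_corr(x, y, w):
    """Weighted Pearson correlation; returns 0.0 for degenerate inputs."""
    sw = sum(w)
    if sw == 0:
        return 0.0
    mx = sum(wi * a for wi, a in zip(w, x)) / sw
    my = sum(wi * b for wi, b in zip(w, y)) / sw
    sxy = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, y))
    sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, x))
    syy = sum(wi * (b - my) ** 2 for wi, b in zip(w, y))
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def weighted_sis(X, y, d=None):
    """Step 2: keep the d features with the largest |weighted correlation| with y.
    Default d = floor(n / log n), the usual sure independence screening size."""
    n, p = len(y), len(X[0])
    d = d or int(n / math.log(n))
    scores = []
    for j in range(p):
        xj = [row[j] for row in X]
        scores.append(abs(weighted_corr(xj, y, pair_weights(xj, y))))
    return sorted(range(p), key=lambda j: -scores[j])[:d]


def fit_linear_svr(X, y, epsilon=0.1, lam=1e-3, epochs=200, lr=0.5, seed=0):
    """Step 3: linear epsilon-insensitive SVR in primal form, fitted by stochastic
    subgradient descent on lam/2 * ||w||^2 + mean(max(0, |y - w.x - b| - epsilon))."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(y)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = lr / math.sqrt(t)  # decaying step size
            r = y[i] - (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # g = subgradient of the loss w.r.t. the prediction (0 inside the tube).
            g = -1.0 if r > epsilon else (1.0 if r < -epsilon else 0.0)
            w = [wj - eta * (lam * wj + g * xj) for wj, xj in zip(w, X[i])]
            b -= eta * g
    return w, b


def predict(model, X):
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def make_demo_data(n=60, p=200, seed=1):
    """Synthetic p > n data: y = 3*x0 - 2*x1 + noise, with one gross outlier in y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]
    y[0] = 50.0  # contaminate a single response value
    return X, y


X, y = make_demo_data()
keep = weighted_sis(X, y)                      # steps 1-2: weights + screening
X_sel = [[row[j] for j in keep] for row in X]
model = fit_linear_svr(X_sel, y)               # step 3: prediction model
print("selected features:", sorted(keep))
```

Computing the weights separately for each (x_j, y) pair means the contaminated response is excluded from every screening correlation instead of distorting it, while the epsilon-insensitive loss has bounded subgradients, which limits how far that single outlier can pull the fitted model in the prediction step.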
first_indexed 2025-11-15T14:47:16Z
id upm-120119
institution Universiti Putra Malaysia
institution_category Local University
Citation: Baba, Ishaq Abdullahi and Mohammed, Mohammed Bappah and Jillahi, Kamal Bakari and Umar, Aliyu and Hendi, Hasan Talib (2025) Robust correlation feature selection based support vector machine approach for high dimensional datasets. Results in Control and Optimization, 21. art. no. 100609. pp. 1-14. ISSN 2666-7207
Publisher's version: https://linkinghub.elsevier.com/retrieve/pii/S2666720725000943
DOI: 10.1016/j.rico.2025.100609
Status: Peer reviewed; full text in English (http://psasir.upm.edu.my/id/eprint/120119/1/120119.pdf), licensed CC BY-NC-ND 4.0
Issue date: 2025-12; repository record last modified 2025-09-23T07:29:05Z