Subspace-based dynamic selection for high-dimensional data

The number of features collected has increased greatly in the past decade, particularly in medicine and life sciences, which brings challenges and opportunities. Making reliable predictions, exploring associations and extracting meaningful information in high-dimensional data are some of the problem...

Bibliographic Details
Main Author: Maciel-Guerra, Alexandre
Format: Thesis (University of Nottingham only)
Language: English
Published: 2022
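Subjects: dynamic selection; multiple classifier system; subspace clustering; high-dimensional data; datasets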
Online Access: https://eprints.nottingham.ac.uk/71623/
author Maciel-Guerra, Alexandre
building Nottingham Research Data Repository
collection Online Access
description The number of features collected has increased greatly in the past decade, particularly in medicine and the life sciences, bringing both challenges and opportunities. Making reliable predictions, exploring associations and extracting meaningful information from high-dimensional data remain open problems. Because of intrinsic properties of high-dimensional spaces, such as distance concentration and hubness, traditional classification and clustering algorithms struggle. In general, a Multiple Classifier System (MCS) provides better classification accuracy than individual classifiers. One of the most promising approaches to MCS is Dynamic Selection (DS), which selects classifiers on the fly for each unknown test sample. The rationale is that no classifier is an expert on all samples; rather, each classifier or combination of classifiers is an expert in a different region of the feature space, the region of competence, whose quality can significantly affect overall performance. This thesis makes three major contributions. First, it shows that traditional DS methods fail to perform effectively on high-dimensional datasets because they use k-Nearest Neighbours (k-NN) to define the region of competence and, moreover, that they do not indicate which features are most important for classification. Second, it proposes two frameworks, Subspace-Based Dynamic Selection (SBDS) and Classifier SBDS (cSBDS), which integrate characteristics of DS methods and subspace clustering. Subspace clustering methods localise their search for clusters and can uncover clusters that exist in multiple, possibly overlapping subspaces of features and/or samples. The subspace clustering approach separates the high-dimensional feature space into small feature spaces, each with a reduced number of features and samples. The results indicate that the cSBDS framework performs statistically better than DS methods and majority voting on real-world and synthetic datasets. Third, it compares the features selected by the cSBDS framework with those identified by feature importance methods. The results indicate that, for high-dimensional datasets, the cSBDS framework is able to capture the most important features when the number of clusters per class is increased, while traditional feature importance methods lose this capability.
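The dynamic selection idea described in the abstract can be illustrated with a minimal sketch. It is not the SBDS or cSBDS framework from the thesis: it shows only the conventional k-NN-based selection step that the abstract says degrades in high dimensions. The threshold classifiers, the toy validation data and all function names are assumptions made for illustration.

```python
import math

def k_nearest(x, points, k):
    """Return the indices of the k points closest to x (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(x, points[i]))
    return order[:k]

def make_stump(feature, threshold):
    """Hypothetical base classifier: predicts 1 when one feature exceeds a threshold."""
    return lambda x: 1 if x[feature] > threshold else 0

def select_and_predict(x, pool, X_val, y_val, k=3):
    """Pick the classifier with the best accuracy on x's k nearest
    validation samples (its region of competence) and use it to label x."""
    region = k_nearest(x, X_val, k)

    def local_accuracy(clf):
        return sum(clf(X_val[i]) == y_val[i] for i in region) / len(region)

    best = max(pool, key=local_accuracy)  # ties go to the earlier classifier
    return best(x)

# Toy validation set (illustrative only): near x0 = 0 the label depends on
# feature 1, near x0 = 10 it depends on feature 0.
X_val = [(0.2, 0.1), (0.8, 0.2), (0.3, 0.9), (0.7, 0.8),
         (10.1, 0.2), (10.2, 0.9), (10.8, 0.1), (10.9, 0.8)]
y_val = [0, 0, 1, 1, 0, 0, 1, 1]

# Each stump is an expert in only one of the two regions.
pool = [make_stump(1, 0.5), make_stump(0, 10.5)]

left_prediction = select_and_predict((0.5, 0.95), pool, X_val, y_val)
right_prediction = select_and_predict((10.9, 0.1), pool, X_val, y_val)
```

Neither stump labels both query points correctly on its own, but choosing per sample does. Because the region of competence is built from Euclidean neighbours, this sketch inherits the sensitivity to distance concentration that the abstract gives as the motivation for a subspace-based alternative.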
first_indexed 2025-11-14T20:55:20Z
format Thesis (University of Nottingham only)
id nottingham-71623
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:55:20Z
publishDate 2022
recordtype eprints
repository_type Digital Repository
spelling nottingham-71623 2024-02-22T13:50:14Z https://eprints.nottingham.ac.uk/71623/ Subspace-based dynamic selection for high-dimensional data Maciel-Guerra, Alexandre 2022-12-14 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en cc_by https://eprints.nottingham.ac.uk/71623/1/PhD%20Thesis%20-%20Alexandre%20Maciel%20Guerra%20-%20October%202022.pdf Maciel-Guerra, Alexandre (2022) Subspace-based dynamic selection for high-dimensional data. PhD thesis, University of Nottingham. dynamic selection; multiple classifier system; subspace clustering; high-dimensional data; datasets
title Subspace-based dynamic selection for high-dimensional data
topic dynamic selection; multiple classifier system; subspace clustering; high-dimensional data; datasets
url https://eprints.nottingham.ac.uk/71623/