Variable selection in principal component analysis : using measures of multivariate association.

This thesis is concerned with the problem of selection of important variables in Principal Component Analysis (PCA) in such a way that the selected subsets of variables retain, as much as possible, the overall multivariate structure of the complete data. Throughout the thesis, the criteria used in o...


Bibliographic Details
Main Author: Sithole, Moses M.
Format: Thesis
Language: English
Published: Curtin University 1992
Subjects: principal component analysis; multivariate association; variable selection
Online Access: http://hdl.handle.net/20.500.11937/2112
_version_ 1848743862321807360
author Sithole, Moses M.
building Curtin Institutional Repository
collection Online Access
description This thesis is concerned with selecting important variables in Principal Component Analysis (PCA) so that the selected subsets of variables retain, as far as possible, the overall multivariate structure of the complete data. Throughout the thesis, the criteria used to meet this requirement are collectively referred to as measures of Multivariate Association (MVA). Most of the currently available selection methods may lead to inappropriate subsets, while Krzanowski's (1987) M2-Procrustes criterion successfully identifies structure-bearing variables, particularly when groups are present in the data. Our major objective, however, is to use the idea of multivariate association to select subsets of the original variables that preserve any (unknown) multivariate structure that may be present in the data. The first part of the thesis is devoted to a study of the choice of the number of components (say, k) to be used in the variable selection process. Various methods that exist in the literature for choosing k are described, and comparative studies of these methods are reviewed. Currently available methods based exclusively on the eigenvalues of the covariance or correlation matrices, and those based on cross-validation, are unsatisfactory. Hence, we propose a new technique for choosing k based on the bootstrap methodology. A full comparative study of this new technique and the cross-validatory choice of k proposed by Eastment and Krzanowski (1982) is then carried out using data simulated in Monte Carlo experiments. The remainder of the thesis focuses on variable selection in PCA using measures of MVA. Various existing selection methods are described, and comparative studies of these methods available in the literature are reviewed. New methods for selecting variables, based on measures of MVA, are then proposed and compared among themselves as well as with the M2-Procrustes criterion. This comparison is based on Monte Carlo simulation, and the behaviour of the selection methods is assessed in terms of the performance of the selected variables. In summary, the Monte Carlo results suggest that the proposed bootstrap technique for choosing k generally performs better than the cross-validatory technique of Eastment and Krzanowski (1982). Similarly, the Monte Carlo comparison of the variable selection methods shows that the proposed methods are comparable with or better than Krzanowski's (1987) M2-Procrustes criterion. These conclusions are based mainly on data simulated by means of Monte Carlo experiments. However, the techniques for choosing k and the various variable selection techniques are also evaluated on some real data sets. Some comments on alternative approaches and suggestions for possible extensions conclude the thesis.
first_indexed 2025-11-14T05:52:19Z
format Thesis
id curtin-20.500.11937-2112
institution Curtin University Malaysia
institution_category Local University
language English
last_indexed 2025-11-14T05:52:19Z
publishDate 1992
publisher Curtin University
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-2112 2019-03-27T01:58:05Z Variable selection in principal component analysis : using measures of multivariate association. Sithole, Moses M. principal component analysis multivariate association variable selection 1992 Thesis http://hdl.handle.net/20.500.11937/2112 en Curtin University fulltext
title Variable selection in principal component analysis : using measures of multivariate association.
topic principal component analysis
multivariate association
variable selection
url http://hdl.handle.net/20.500.11937/2112
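
The description above uses Krzanowski's (1987) M2-Procrustes criterion as the benchmark for variable selection. The record itself does not give a formula, so the following is only a minimal sketch of the criterion as it is commonly stated: with Y the scores on the first k principal components of the full data and Z the scores on the first k components of a candidate subset, M2 = trace(Y'Y + Z'Z - 2S), where S holds the singular values of Z'Y. The function names, the use of centring without standardisation, and the backward-elimination loop are illustrative assumptions, not the thesis's own code or its proposed MVA-based methods.

```python
import numpy as np


def pc_scores(X, k):
    """Scores on the first k principal components of column-centred X."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: Xc = U S V'; the PC scores are U[:, :k] * S[:k].
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def procrustes_m2(X, subset, k):
    """M2 discrepancy between full-data and subset PC configurations.

    Assumed form: M2 = tr(Y'Y) + tr(Z'Z) - 2 * sum(singular values of Z'Y),
    i.e. the residual sum of squares after the best orthogonal rotation.
    """
    if len(subset) < k:
        raise ValueError("subset must contain at least k variables")
    Y = pc_scores(X, k)             # configuration from all variables
    Z = pc_scores(X[:, subset], k)  # configuration from the subset
    sing = np.linalg.svd(Z.T @ Y, compute_uv=False)
    return np.trace(Y.T @ Y) + np.trace(Z.T @ Z) - 2.0 * sing.sum()


def backward_eliminate(X, k, q):
    """Hypothetical helper: drop one variable at a time, each time removing
    the variable whose deletion gives the smallest M2, until q remain."""
    if q < k:
        raise ValueError("q must be at least k")
    kept = list(range(X.shape[1]))
    while len(kept) > q:
        m2 = [procrustes_m2(X, [v for v in kept if v != j], k) for j in kept]
        kept.pop(int(np.argmin(m2)))
    return kept
```

A smaller M2 means the subset reproduces the k-dimensional principal component configuration of the full data more closely; a subset equal to the full variable set gives M2 = 0 up to rounding.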