Improved clustering using robust and classical principal component

The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set...

Bibliographic Details
Main Author:	Hassn, Ahmed Kadom
Format:	Thesis
Language:	English
Published:	2017
Subjects:	Algorithms
Online Access:	http://psasir.upm.edu.my/id/eprint/70922/ http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf

_version_	1848856829798383616
author	Hassn, Ahmed Kadom
building	UPM Institutional Repository
collection	Online Access
description	The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set is generally a trial-and-error process, made more difficult by the subjective nature of deciding what constitutes ‘correct’ clustering. When the dimension of the data is large, k-means clustering is often difficult to apply because it requires considerable computation time. To remedy this problem, we propose integrating principal component analysis (PCA), which is useful for reducing the dimensionality of a dataset, with the k-means clustering algorithm. We call the proposed method k-means by principal components (pc1). In this study, the kernels created by the k-means method are replaced with kernels created by the PCA method, which reduces the dimensionality of the data. The results show that k-means by PCA is faster and more efficient than the classical k-means algorithm. Both the classical k-means algorithm and k-means by PCA are very sensitive to the presence of outliers. Hence, k-means by robust PCA is developed to rectify the problem of outliers in the dataset. The findings indicate that, in the absence of outliers, k-means by PCA and k-means by robust PCA perform equally well. However, k-means by robust PCA is much less affected by outliers than k-means by classical PCA.
first_indexed	2025-11-15T11:47:53Z
format	Thesis
id	upm-70922
institution	Universiti Putra Malaysia
institution_category	Local University
language	English
last_indexed	2025-11-15T11:47:53Z
publishDate	2017
recordtype	eprints
repository_type	Digital Repository
spelling	upm-70922 2022-07-07T03:07:15Z http://psasir.upm.edu.my/id/eprint/70922/ Improved clustering using robust and classical principal component Hassn, Ahmed Kadom 2017-06 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf Hassn, Ahmed Kadom (2017) Improved clustering using robust and classical principal component. Masters thesis, Universiti Putra Malaysia. Algorithms
spellingShingle	Algorithms Hassn, Ahmed Kadom Improved clustering using robust and classical principal component
title	Improved clustering using robust and classical principal component
title_sort	improved clustering using robust and classical principal component
topic	Algorithms
url	http://psasir.upm.edu.my/id/eprint/70922/ http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf
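
The description above outlines clustering on PCA-reduced data. As a rough illustration only, the following Python sketch projects the data onto its leading principal components and then runs a plain Lloyd-style k-means on the projection. It is not the thesis's implementation: the function names (pca_reduce, kmeans, kmeans_on_pca), the use of NumPy, the SVD-based PCA, and the random initialisation are all assumptions made for this example. The robust PCA variant is not shown, because the record does not say which robust estimator the thesis uses.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its leading principal components (classical PCA via SVD).

    Hypothetical helper for illustration; not taken from the thesis.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each observation to the nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned observations;
        # keep the old center if a cluster ends up empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def kmeans_on_pca(X, k, n_components, seed=0):
    """Reduce dimensionality with PCA, then cluster the reduced data."""
    Z = pca_reduce(X, n_components)
    return kmeans(Z, k, seed=seed)
```

Calling kmeans_on_pca(X, k=2, n_components=2) on an array X with one observation per row returns one cluster label per observation and the cluster centers in the reduced space. Because the distance computations run over n_components columns instead of the original dimension, each k-means iteration is cheaper, which is in line with the speed-up the description reports.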

Similar Items