Improved clustering using robust and classical principal component
The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set...
| Main Author: | Hassn, Ahmed Kadom |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2017 |
| Subjects: | Algorithms |
| Online Access: | http://psasir.upm.edu.my/id/eprint/70922/ http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf |
| _version_ | 1848856829798383616 |
|---|---|
| author | Hassn, Ahmed Kadom |
| author_facet | Hassn, Ahmed Kadom |
| author_sort | Hassn, Ahmed Kadom |
| building | UPM Institutional Repository |
| collection | Online Access |
| description | The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set is generally a trial-and-error process, made more difficult by the subjective nature of deciding what constitutes a ‘correct’ clustering. When the dimension of the data is large, k-means clustering is often difficult to apply because it requires a great deal of computation time. To remedy this problem, we propose integrating principal component analysis (PCA), which is useful for reducing the dimensionality of a dataset, with the k-means clustering algorithm. We call the proposed method k-means by principal components (pc1). In this study, the kernels created by the k-means method are replaced with kernels created by the PCA method, which reduces the dimensionality of the data. The results show that k-means by PCA is faster and more efficient than the classical k-means algorithm. Both the classical k-means algorithm and k-means by PCA are very sensitive to the presence of outliers. Hence, k-means by robust PCA is developed to rectify the problem of outliers in the dataset. The findings indicate that, in the absence of outliers, k-means by PCA and k-means by robust PCA perform equally well. However, k-means by robust PCA is not much affected by outliers, whereas k-means by classical PCA is. |
| first_indexed | 2025-11-15T11:47:53Z |
| format | Thesis |
| id | upm-70922 |
| institution | Universiti Putra Malaysia |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T11:47:53Z |
| publishDate | 2017 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | upm-70922 2022-07-07T03:07:15Z http://psasir.upm.edu.my/id/eprint/70922/ Improved clustering using robust and classical principal component Hassn, Ahmed Kadom 2017-06 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf Hassn, Ahmed Kadom (2017) Improved clustering using robust and classical principal component. Masters thesis, Universiti Putra Malaysia. Algorithms |
| spellingShingle | Algorithms Hassn, Ahmed Kadom Improved clustering using robust and classical principal component |
| title | Improved clustering using robust and classical principal component |
| title_full | Improved clustering using robust and classical principal component |
| title_fullStr | Improved clustering using robust and classical principal component |
| title_full_unstemmed | Improved clustering using robust and classical principal component |
| title_short | Improved clustering using robust and classical principal component |
| title_sort | improved clustering using robust and classical principal component |
| topic | Algorithms |
| url | http://psasir.upm.edu.my/id/eprint/70922/ http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf |
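
The description in this record outlines k-means applied after PCA-based dimensionality reduction. The following is a minimal, dependency-free sketch of that general idea, not the thesis's implementation. It assumes that "pc1" refers to the first principal component, which the record does not state. The function names (`first_pc_scores`, `kmeans_1d`, `kmeans_by_pc1`), the power-iteration PCA, the quantile-based initialisation, and the toy data are all illustrative assumptions. Only classical PCA is shown; the record does not say which robust PCA estimator the thesis uses, so that variant is not sketched.

```python
def first_pc_scores(X, iters=200):
    """Project the centered rows of X onto the first principal component.

    The leading eigenvector of the sample covariance matrix is found by
    power iteration (an assumed choice; any PCA routine would do).
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Fixed start vector; assumed not orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("power iteration collapsed; data may be constant")
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in Xc]


def kmeans_1d(scores, k, iters=100):
    """Plain Lloyd-style k-means on one-dimensional scores."""
    s = sorted(scores)
    # Deterministic initial centers at evenly spaced quantiles (assumption).
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(scores)
    for _ in range(iters):
        # Assignment step: nearest center for each score.
        labels = [min(range(k), key=lambda c: abs(x - centers[c]))
                  for x in scores]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(scores, labels) if lab == c]
            new_centers.append(sum(members) / len(members)
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def kmeans_by_pc1(X, k):
    """Reduce X to its first principal component, then cluster the scores."""
    return kmeans_1d(first_pc_scores(X), k)


# Toy data (hypothetical): two well-separated groups in three dimensions.
X = [[0.0, 0.1, 0.2], [0.2, 0.0, 0.1], [0.1, 0.2, 0.0],
     [5.0, 5.1, 5.2], [5.2, 5.0, 5.1], [5.1, 5.2, 5.0]]
labels, centers = kmeans_by_pc1(X, k=2)
print(labels)
```

Clustering the one-dimensional scores instead of the full data means each distance computation costs one subtraction rather than a sum over all dimensions, which is one plausible reading of the speed-up the description reports. Because the mean and covariance used here are classical estimates, a single extreme observation can shift the projection, which matches the sensitivity to outliers noted in the description.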