A comparison of clustering algorithms for data anonymization / Zahra Mahmoud

Organizations today can easily store massive amounts of data as the cost of storage has significantly plummeted over the years. Data is used to help them raise their brand's value. However, as data becomes easier to store in mass amounts, the security risk also increases. In the last two years...

Bibliographic Details
Main Author: Zahra, Mahmoud
Format: Thesis
Published: 2019
Subjects: Z665 Library Science. Information Science
Online Access: http://studentsrepo.um.edu.my/10708/
http://studentsrepo.um.edu.my/10708/2/Zahra_Mahmoud.pdf
http://studentsrepo.um.edu.my/10708/1/Zahra_Mahmoud_%E2%80%93_Dissertation.pdf
author Zahra, Mahmoud
building UM Research Repository
collection Online Access
description Organizations today can easily store massive amounts of data because the cost of storage has fallen sharply over the years. They use this data to raise their brand's value. However, as data becomes easier to store in bulk, the security risk also increases. In the last two years alone, multiple data leaks have been reported, the latest being from the Ministry of Education in Malaysia. Data security has been researched extensively over the years. A review of the literature showed that many studies have employed methods such as data encryption or privacy-preserving data publishing (PPDP). This thesis focuses on the latter, as data encryption has proven to be more costly. Much of the literature also focused on using generalization and suppression to achieve the required level of anonymity. However, a heavily suppressed or generalized dataset may paint a different picture from the original data. The objective of this thesis is to find a method of data anonymization that is efficient and produces the least information loss. By comparing multiple types of PPDP methods, the researcher determined that clustering is the best fit for this purpose. Next, multiple existing clustering algorithms were compared to determine which performs best. Finally, the researcher created an enhanced method for a final comparison: the distance function was manipulated to show how differences in cluster distance can affect the anonymized dataset.
first_indexed 2025-11-14T13:54:40Z
format Thesis
id um-10708
institution University of Malaya
institution_category Local University
last_indexed 2025-11-14T13:54:40Z
publishDate 2019
recordtype eprints
repository_type Digital Repository
spelling um-10708 2020-08-16T23:22:54Z
2019-06 Thesis NonPeerReviewed
application/pdf http://studentsrepo.um.edu.my/10708/2/Zahra_Mahmoud.pdf
application/pdf http://studentsrepo.um.edu.my/10708/1/Zahra_Mahmoud_%E2%80%93_Dissertation.pdf
Zahra, Mahmoud (2019) A comparison of clustering algorithms for data anonymization / Zahra Mahmoud. Masters thesis, University of Malaya. http://studentsrepo.um.edu.my/10708/
title A comparison of clustering algorithms for data anonymization / Zahra Mahmoud
topic Z665 Library Science. Information Science