The performance of soft computing techniques on content-based SMS spam filtering

Content-based filtering is one of the most widely used methods for combating SMS (Short Message Service) spam. This method represents SMS text messages by a set of selected features extracted from data sets. Most of the available data sets have an imbalanced class distribution. However...

Bibliographic Details
Main Author: Hassan Saeed, Waddah Waheeb
Format: Thesis
Language: English
Published: 2015
Subjects: QA76 Computer software
Online Access: http://eprints.uthm.edu.my/1496/
http://eprints.uthm.edu.my/1496/2/WADDAH%20WAHEEB%20HASSAN%20SAEED%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/1496/1/24p%20WADDAH%20WAHEEB%20HASSAN%20SAEED.pdf
http://eprints.uthm.edu.my/1496/3/WADDAH%20WAHEEB%20HASSAN%20SAEED%20WATERMARK.pdf
author Hassan Saeed, Waddah Waheeb
building UTHM Institutional Repository
collection Online Access
description Content-based filtering is one of the most widely used methods for combating SMS (Short Message Service) spam. This method represents SMS text messages by a set of selected features extracted from data sets. Most of the available data sets have an imbalanced class distribution. However, little attention has been paid to this problem, which affects the characteristics and size of the selected features and degrades performance. Soft computing approaches have been applied successfully in content-based spam filtering. To enhance soft computing performance, a suitable feature subset should be selected. Therefore, this research investigates how well suited three soft computing techniques, namely Fuzzy Similarity, Artificial Neural Network and Support Vector Machines (SVM), are for content-based SMS spam filtering using an appropriately sized feature set selected by the Gini Index metric, which can extract suitable features from imbalanced data sets. The data sets used in this research were taken from three sources: the UCI repository, Dublin Institute of Technology (DIT) and British English SMS. The performance of each technique was compared in terms of True Positive Rate against False Positive Rate, F1 score and Matthews Correlation Coefficient. The results showed that SVM with 150 features outperformed the other techniques on all comparison measures. The average time needed to classify an SMS text message was a fraction of a millisecond. A further test using the NUS SMS corpus was conducted to validate the SVM classifier with 150 features. The results again demonstrated the efficiency of the SVM classifier with 150 features for SMS spam filtering, with an accuracy of about 99.2%.
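The comparison measures named in the description (True Positive Rate, False Positive Rate, F1 score and Matthews Correlation Coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the thesis's own code. The function names `confusion_counts` and `evaluate`, the label strings "spam" and "ham", and the toy predictions are assumptions made purely for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, FP, TN, FN, treating `positive` as the spam class (assumed label)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(tp, fp, tn, fn):
    """Compute TPR, FPR, F1 and MCC; each falls back to 0.0 when undefined."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on spam
    fpr = fp / (fp + tn) if (fp + tn) else 0.0        # ham wrongly flagged
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)) if (precision + tpr) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn - fp * fn) / denom) if denom else 0.0
    return {"tpr": tpr, "fpr": fpr, "f1": f1, "mcc": mcc}


# Toy example (made-up labels, not data from the thesis).
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "ham", "spam"]
scores = evaluate(*confusion_counts(y_true, y_pred))
print(scores)
```

MCC uses all four cells of the confusion matrix, which is why it is often preferred over plain accuracy when the classes are imbalanced, as the description says is typical of SMS spam data sets.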
first_indexed 2025-11-15T19:54:59Z
format Thesis
id uthm-1496
institution Universiti Tun Hussein Onn Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T19:54:59Z
publishDate 2015
recordtype eprints
repository_type Digital Repository
spelling uthm-1496 2021-10-03T07:44:57Z http://eprints.uthm.edu.my/1496/ The performance of soft computing techniques on content-based SMS spam filtering Hassan Saeed, Waddah Waheeb QA76 Computer software 2015-02 Thesis NonPeerReviewed Hassan Saeed, Waddah Waheeb (2015) The performance of soft computing techniques on content-based SMS spam filtering. Masters thesis, Universiti Tun Hussein Onn Malaysia.
title The performance of soft computing techniques on content-based SMS spam filtering
topic QA76 Computer software