The effectiveness of url features on phishing emails classification using machine learning approach

Bibliographic Details
Main Authors:	Ahmad Fadhil Naswir, Lailatul Qadri Zakaria, Saidah Saad
Format:	Article
Language:	English
Published:	Penerbit Universiti Kebangsaan Malaysia 2022
Online Access:	http://journalarticle.ukm.my/20846/ http://journalarticle.ukm.my/20846/1/4.pdf

author	Ahmad Fadhil Naswir, Lailatul Qadri Zakaria, Saidah Saad
building	UKM Institutional Repository
collection	Online Access
description	Phishing email classification depends on good features to achieve high accuracy. One reason that models for detecting phishing emails remain underdeveloped is the complexity of feature selection. Feature selection is essential to obtaining good classification results, and the most commonly used features come from the email header, the body, and the Uniform Resource Locator (URL). Besides the body text, the URL is one of the leading indicators of a phishing attack. It is usually placed in the body of the phishing email to catch the victim's attention and redirects the victim to a fake website built to obtain their personal information. Little is known about how URL features affect phishing email classification results. This work therefore uses URL features to determine whether an email is phishing or legitimate using machine learning approaches. Two public datasets are used: the Online Phishing Corpus and the Enron Corpus. The URL features are extracted with the Beautiful Soup library, and two machine learning classifiers are evaluated: Support Vector Machine (SVM) and Artificial Neural Network (ANN). Two experiments were conducted, differing in the features supplied to the classifiers: the first used raw email data together with URL features, while the second used raw email data only. The first experiment achieved higher accuracy with both SVM and ANN, indicating that selecting URL features improves classification performance.
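The abstract says URL features were extracted with the Beautiful Soup library but does not list which features were used. The following is a minimal sketch, assuming HTML email bodies, of what such extraction might look like. To stay dependency-free it swaps Beautiful Soup for Python's standard-library `html.parser`; with Beautiful Soup the equivalent link lookup would be `soup.find_all("a", href=True)`. The helper `url_features` and every feature name in it (`num_urls`, `has_ip_host`, `num_text_href_mismatch`, and so on) are hypothetical illustrations, not the authors' feature set.

```python
from html.parser import HTMLParser
from ipaddress import ip_address
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs from <a> tags in an HTML email body."""

    def __init__(self):
        super().__init__()
        self.links = []    # (href, anchor_text) pairs found so far
        self._href = None  # href of the currently open <a>, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href.strip(), []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _is_ip_literal(host):
    """True if the host is a bare IPv4/IPv6 address rather than a domain name."""
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def url_features(html_body):
    """Hypothetical URL feature set for one email body (not the paper's exact list)."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()

    hosts = [urlparse(href).hostname or "" for href, _ in collector.links]

    # Links whose visible text is itself a URL pointing at a different host,
    # e.g. text "https://bank.example/verify" with href "http://203.0.113.7/login".
    mismatched = 0
    for (_, text), host in zip(collector.links, hosts):
        if "://" in text:
            shown_host = urlparse(text).hostname or ""
            if shown_host != host:
                mismatched += 1

    return {
        "num_urls": len(collector.links),
        "num_distinct_hosts": len({h for h in hosts if h}),
        "has_ip_host": int(any(_is_ip_literal(h) for h in hosts if h)),
        "num_text_href_mismatch": mismatched,
        "max_url_length": max((len(href) for href, _ in collector.links), default=0),
    }


if __name__ == "__main__":
    sample = ('<p>Verify now: <a href="http://203.0.113.7/login">'
              'https://bank.example/verify</a></p>')
    print(url_features(sample))
```

This only looks at `<a href>` links; plain-text messages would need a separate pass (for example a regular expression over the body) to find bare URLs. To mirror the two experiments described above, one option would be to append these values to whatever raw-email representation feeds the SVM or ANN and train once with them and once without.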
first_indexed	2025-11-15T00:46:22Z
format	Article
id	oai:generic.eprints.org:20846
institution	Universiti Kebangsaan Malaysia
institution_category	Local University
language	English
last_indexed	2025-11-15T00:46:22Z
publishDate	2022
publisher	Penerbit Universiti Kebangsaan Malaysia
recordtype	eprints
repository_type	Digital Repository
spelling	oai:generic.eprints.org:20846 2022-12-21T08:26:22Z http://journalarticle.ukm.my/20846/
Penerbit Universiti Kebangsaan Malaysia 2022-12 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/20846/1/4.pdf Ahmad Fadhil Naswir, Lailatul Qadri Zakaria, and Saidah Saad (2022). The effectiveness of url features on phishing emails classification using machine learning approach. Asia-Pacific Journal of Information Technology and Multimedia, 11 (2). pp. 49-58. ISSN 2289-2192 https://www.ukm.my/apjitm/articles-issues
title	The effectiveness of url features on phishing emails classification using machine learning approach
url	http://journalarticle.ukm.my/20846/ http://journalarticle.ukm.my/20846/1/4.pdf

Similar Items