A comparative analysis of anti-phishing website techniques: identifying optimal approaches to enhance cybersecurity

Bibliographic Details
Main Author: Yau, Jia Xin
Format: Final Year Project / Dissertation / Thesis
Published: 2023
Subjects:
Online Access: http://eprints.utar.edu.my/6330/
http://eprints.utar.edu.my/6330/1/Final_Report_(Yau_Jia_Xin).pdf
Description
Summary: Phishing attacks are a persistent threat to Internet security, so the ability to identify fraudulent websites is crucial to preventing users from being duped into divulging sensitive information. It is therefore critical to identify effective detection techniques for such websites. This research analyses the characteristics of phishing websites, extracts their essential features using the wrapper method, and classifies websites as phishing or legitimate using supervised and unsupervised learning algorithms. The study evaluates and compares several machine learning algorithms, including an Autoencoder classifier, Extreme Gradient Boosting (XGBoost), and a Random Forest classifier, using accuracy, precision, recall, and F1-score. Random Forest achieves an accuracy of 97.03%, demonstrating a strong ability to distinguish fraudulent websites from legitimate ones. A web application is built by integrating the Random Forest classifier with the Google Safe Browsing list. When a user submits a URL, the web application uses the pre-trained Random Forest classifier to estimate the probability that the URL is a phishing site. As an additional layer of security, the Google Safe Browsing list is used to verify the classifier's output. The research is expected to contribute to the development of phishing detection technologies that are more precise and efficient, thereby bolstering online security and protecting users against identity theft and financial fraud.
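
The evaluation metrics and the two-stage check described in the summary can be illustrated with a minimal Python sketch. It is not taken from the thesis: the names `evaluate`, `check_url`, `phishing_probability`, and `on_blocklist` are hypothetical, the 0.5 threshold is an assumption, and the rule for combining the classifier output with the Safe Browsing lookup is only one plausible reading of "verify". The classifier and the blocklist lookup are passed in as plain callables, so no real model or Google API call is made here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute the four metrics, treating label 1 (phishing) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


def check_url(
    url: str,
    phishing_probability: Callable[[str], float],  # hypothetical: wraps the pre-trained classifier
    on_blocklist: Callable[[str], bool],           # hypothetical: wraps a Safe Browsing lookup
    threshold: float = 0.5,                        # assumed cut-off, not from the thesis
) -> str:
    """Combine the classifier score with a blocklist lookup (assumed rule)."""
    flagged_by_model = phishing_probability(url) >= threshold
    flagged_by_list = on_blocklist(url)

    if flagged_by_model and flagged_by_list:
        return "phishing"      # both sources agree
    if flagged_by_model or flagged_by_list:
        return "suspicious"    # the two sources disagree
    return "legitimate"
```

For example, `check_url(url, lambda u: 0.9, lambda u: False)` returns `"suspicious"` under this assumed rule, because the classifier flags the URL while the blocklist does not. In a real deployment the first callable would wrap the trained Random Forest's probability output and the second a Safe Browsing query.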