Classifying good and bad websites

Website classification has become a vital subject as websites are increasingly used as platforms for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning tec...


Bibliographic Details
Main Author: Koo, Ee Woon
Format: Final Year Project Report / IMRAD
Language: English
Published: Universiti Malaysia Sarawak, (UNIMAS) 2015
Subjects: T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/12117/
http://ir.unimas.my/id/eprint/12117/1/Koo.pdf
http://ir.unimas.my/id/eprint/12117/4/Koo%20full.pdf
_version_ 1848837131603017728
author Koo, Ee Woon
author_facet Koo, Ee Woon
author_sort Koo, Ee Woon
building UNIMAS Institutional Repository
collection Online Access
description Website classification has become a vital subject as websites are increasingly used as platforms for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning techniques to classify good and bad websites. The classification process is simplified by using a set of features generated from HTML code. The performance of the 21 features was evaluated using three machine learning techniques: support vector machine (SVM), naive Bayes, and nearest neighbor classifiers. The good and bad websites were distinguished by the set of features obtained by counting the HTML tags. A total of 200 websites were collected for the machine learning task. The results indicate that the features are useful for classification, with an average accuracy of 80.50% for the SVM classifier, 77.00% for the naive Bayes classifier, and 72.50% for the nearest neighbor classifier. Hence, the SVM classifier achieved the highest accuracy of the three. This project illustrates that it is possible to classify websites as good or bad by using the underlying tags together with machine learning algorithms.
first_indexed 2025-11-15T06:34:47Z
format Final Year Project Report / IMRAD
id unimas-12117
institution Universiti Malaysia Sarawak
institution_category Local University
language English
last_indexed 2025-11-15T06:34:47Z
publishDate 2015
publisher Universiti Malaysia Sarawak, (UNIMAS)
recordtype eprints
repository_type Digital Repository
spelling unimas-12117 2023-08-08T03:42:27Z http://ir.unimas.my/id/eprint/12117/ Classifying good and bad websites Koo, Ee Woon T Technology (General) Website classification has become a vital subject as websites are increasingly used as platforms for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning techniques to classify good and bad websites. The classification process is simplified by using a set of features generated from HTML code. The performance of the 21 features was evaluated using three machine learning techniques: support vector machine (SVM), naive Bayes, and nearest neighbor classifiers. The good and bad websites were distinguished by the set of features obtained by counting the HTML tags. A total of 200 websites were collected for the machine learning task. The results indicate that the features are useful for classification, with an average accuracy of 80.50% for the SVM classifier, 77.00% for the naive Bayes classifier, and 72.50% for the nearest neighbor classifier. Hence, the SVM classifier achieved the highest accuracy of the three. This project illustrates that it is possible to classify websites as good or bad by using the underlying tags together with machine learning algorithms. Universiti Malaysia Sarawak, (UNIMAS) 2015 Final Year Project Report / IMRAD NonPeerReviewed text en http://ir.unimas.my/id/eprint/12117/1/Koo.pdf text en http://ir.unimas.my/id/eprint/12117/4/Koo%20full.pdf Koo, Ee Woon (2015) Classifying good and bad websites. [Final Year Project Report / IMRAD] (Unpublished)
spellingShingle T Technology (General)
Koo, Ee Woon
Classifying good and bad websites
title Classifying good and bad websites
title_full Classifying good and bad websites
title_fullStr Classifying good and bad websites
title_full_unstemmed Classifying good and bad websites
title_short Classifying good and bad websites
title_sort classifying good and bad websites
topic T Technology (General)
url http://ir.unimas.my/id/eprint/12117/
http://ir.unimas.my/id/eprint/12117/1/Koo.pdf
http://ir.unimas.my/id/eprint/12117/4/Koo%20full.pdf
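
Illustrative sketch (not taken from the report): the description above says the features were obtained by counting HTML tags and then passed to classifiers, one of which was a nearest neighbor classifier. The Python below shows roughly what that pipeline could look like using only the standard library. Everything specific in it is an assumption: the record does not name the 21 features, so the TAGS list is hypothetical and shortened to six tags, the two training pages are invented toy examples, and the classifier is a plain 1-nearest-neighbor rule with Euclidean distance rather than whatever configuration the author used.

```python
from collections import Counter
from html.parser import HTMLParser
import math

# Hypothetical tag list: the report uses 21 features, but this record does not name them.
TAGS = ("a", "img", "script", "iframe", "form", "p")


class TagCounter(HTMLParser):
    """Counts the opening tags seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.counts[tag] += 1


def tag_features(html):
    """Return one count per tag in TAGS, always in the same order."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return [parser.counts[tag] for tag in TAGS]


def nearest_neighbor(train, query):
    """Give `query` the label of the closest training vector (1-NN, Euclidean)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Invented toy pages, for illustration only (not the 200 collected websites).
train = [
    (tag_features("<p>News</p><p>More</p><a href='/about'>About</a><img src='x.png'>"), "good"),
    (tag_features("<script></script><script></script><iframe></iframe><form></form>"), "bad"),
]
query = tag_features("<script></script><iframe></iframe><iframe></iframe>")
print(nearest_neighbor(train, query))  # prints: bad
```

The fixed tag order matters: every page must map to a vector of the same length and layout so that distances between pages are comparable. A real experiment would also need a held-out test set or cross-validation to produce accuracy figures like those reported in the description.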