Classifying good and bad websites

Website classification has become a vital subject matter as most websites are increasingly being used as a platform for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning tec...

Bibliographic Details
Main Author: Koo, Ee Woon
Format: Project Report
Language: English
Published: Universiti Malaysia Sarawak, (UNIMAS) 2015
Subjects: T Technology (General)
Online Access: http://ir.unimas.my/12117/
http://ir.unimas.my/12117/1/Classifying%20good%20and%20bad%20websites%20%2824pgs%29.pdf
http://ir.unimas.my/12117/2/Classifying%20good%20and%20bad%20websites%20%28fulltext%29.pdf
id unimas-12117
recordtype eprints
spelling unimas-12117 2016-05-20T07:18:39Z http://ir.unimas.my/12117/ Classifying good and bad websites. Koo, Ee Woon. T Technology (General). Universiti Malaysia Sarawak, (UNIMAS) 2015. Project Report. NonPeerReviewed. Koo, Ee Woon (2015) Classifying good and bad websites. [Project Report] (Unpublished)
repository_type Digital Repository
institution_category Local University
institution Universiti Malaysia Sarawak
building UNIMAS Institutional Repository
collection Online Access
language English
topic T Technology (General)
spellingShingle T Technology (General)
Koo, Ee Woon
Classifying good and bad websites
description Website classification has become a vital subject matter as most websites are increasingly being used as a platform for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning techniques to classify good and bad websites. The classification process is simplified by using a set of features generated from HTML code. The performance of the 21 features was evaluated using three machine learning techniques: support vector machine (SVM), naive Bayes, and nearest neighbor classifiers. The good and bad websites were distinguished by the set of features obtained by counting HTML tags. A total of 200 websites were collected for the machine learning task. The results indicate that the features are useful for classification tasks, with an average accuracy of 80.50% for the SVM classifier, 77.00% for the naive Bayes classifier, and 72.50% for the nearest neighbor classifier. Hence, the SVM classifier achieved the highest accuracy of the three. This project illustrates that it is possible to classify websites as good or bad by using the underlying tags together with machine learning algorithms.
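The tag-counting idea in the description can be illustrated with a minimal sketch. It is not the report's implementation: this record does not list the 21 features, so the tag list `TAGS`, the helper names, and the two toy pages with their labels are all hypothetical. The classifier shown is a plain 1-nearest-neighbor rule using Euclidean distance, chosen only because it needs nothing beyond the Python standard library.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Hypothetical feature tags; the report's actual 21 features are not given here.
TAGS = ["a", "img", "script", "form", "iframe"]


class TagCounter(HTMLParser):
    """Counts how many times each start tag occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.counts[tag] += 1


def tag_features(html, tags=TAGS):
    """Turn an HTML string into a fixed-length vector of tag counts."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return [parser.counts[t] for t in tags]


def nearest_neighbor(train, query):
    """Return the label of the training vector closest to `query` (1-NN)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


# Invented toy pages and labels, for illustration only.
good_page = "<html><body><a href='/'>Home</a><img src='x.png'><form></form></body></html>"
bad_page = ("<html><body><script></script><script></script>"
            "<iframe></iframe><iframe></iframe></body></html>")

train = [(tag_features(good_page), "good"), (tag_features(bad_page), "bad")]

query_page = "<a href='#'>x</a><script></script><iframe></iframe><iframe></iframe>"
prediction = nearest_neighbor(train, tag_features(query_page))
print(prediction)
```

With real data, the same feature vectors could be passed to an SVM or naive Bayes implementation instead of the 1-NN rule; the record does not say which tools the author used.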
format Project Report
author Koo, Ee Woon
author_facet Koo, Ee Woon
author_sort Koo, Ee Woon
title Classifying good and bad websites
title_short Classifying good and bad websites
title_full Classifying good and bad websites
title_fullStr Classifying good and bad websites
title_full_unstemmed Classifying good and bad websites
title_sort classifying good and bad websites
publisher Universiti Malaysia Sarawak, (UNIMAS)
publishDate 2015
url http://ir.unimas.my/12117/
http://ir.unimas.my/12117/1/Classifying%20good%20and%20bad%20websites%20%2824pgs%29.pdf
http://ir.unimas.my/12117/2/Classifying%20good%20and%20bad%20websites%20%28fulltext%29.pdf
first_indexed 2018-09-06T15:58:05Z
last_indexed 2018-09-06T15:58:05Z
_version_ 1610874340934418432