Classifying good and bad websites
Website classification has become a vital subject as websites are increasingly used as platforms for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning tec...
Main Author: | Koo, Ee Woon |
---|---|
Format: | Project Report |
Language: | English |
Published: | Universiti Malaysia Sarawak (UNIMAS), 2015 |
Subjects: | T Technology (General) |
Online Access: | http://ir.unimas.my/12117/ http://ir.unimas.my/12117/1/Classifying%20good%20and%20bad%20websites%20%2824pgs%29.pdf http://ir.unimas.my/12117/2/Classifying%20good%20and%20bad%20websites%20%28fulltext%29.pdf |
id |
unimas-12117 |
---|---|
recordtype |
eprints |
spelling |
unimas-12117 2016-05-20T07:18:39Z http://ir.unimas.my/12117/ Classifying good and bad websites Koo, Ee Woon T Technology (General) Website classification has become a vital subject as websites are increasingly used as platforms for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning techniques to classify good and bad websites. The classification process is simplified by using a set of features generated from HTML code. The performance of the 21 features was evaluated using three machine learning techniques: support vector machine (SVM), naive Bayes, and nearest neighbor classifiers. The good and bad websites were distinguished by the set of features obtained by counting the HTML tags. A total of 200 websites were collected for the machine learning task. The results indicate that the features are useful for classification tasks, with an average accuracy of 80.50% for the SVM classifier, 77.00% for the naive Bayes classifier, and 72.50% for the nearest neighbor classifier. Hence, the SVM classifier achieved the highest accuracy of the three. This project illustrates that it is possible to classify websites as good or bad by using the underlying tags along with machine learning algorithms. Universiti Malaysia Sarawak, (UNIMAS) 2015 Project Report NonPeerReviewed text en http://ir.unimas.my/12117/1/Classifying%20good%20and%20bad%20websites%20%2824pgs%29.pdf text en http://ir.unimas.my/12117/2/Classifying%20good%20and%20bad%20websites%20%28fulltext%29.pdf Koo, Ee Woon (2015) Classifying good and bad websites. [Project Report] (Unpublished) |
repository_type |
Digital Repository |
institution_category |
Local University |
institution |
Universiti Malaysia Sarawak |
building |
UNIMAS Institutional Repository |
collection |
Online Access |
language |
English |
topic |
T Technology (General) |
description |
Website classification has become a vital subject as websites are increasingly used as platforms for various applications. These web pages often contain semi-structured content, which makes the classification process challenging. This paper addresses the use of machine learning techniques to classify good and bad websites. The classification process is simplified by using a set of features generated from HTML code. The performance of the 21 features was evaluated using three machine learning techniques: support vector machine (SVM), naive Bayes, and nearest neighbor classifiers. The good and bad websites were distinguished by the set of features obtained by counting the HTML tags. A total of 200 websites were collected for the machine learning task. The results indicate that the features are useful for classification tasks, with an average accuracy of 80.50% for the SVM classifier, 77.00% for the naive Bayes classifier, and 72.50% for the nearest neighbor classifier. Hence, the SVM classifier achieved the highest accuracy of the three. This project illustrates that it is possible to classify websites as good or bad by using the underlying tags along with machine learning algorithms. |
format |
Project Report |
author |
Koo, Ee Woon |
title |
Classifying good and bad websites |
publisher |
Universiti Malaysia Sarawak, (UNIMAS) |
publishDate |
2015 |
url |
http://ir.unimas.my/12117/ http://ir.unimas.my/12117/1/Classifying%20good%20and%20bad%20websites%20%2824pgs%29.pdf http://ir.unimas.my/12117/2/Classifying%20good%20and%20bad%20websites%20%28fulltext%29.pdf |
first_indexed |
2018-09-06T15:58:05Z |
last_indexed |
2018-09-06T15:58:05Z |
_version_ |
1610874340934418432 |
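The description in this record says the features were obtained by counting HTML tags and then fed to SVM, naive Bayes, and nearest neighbor classifiers. The following is a minimal sketch of that idea, not the report's implementation: the five-tag list `TAGS`, the helper names, and the toy pages are all hypothetical (the record does not list the 21 features), and a plain 1-nearest-neighbor rule is used only because it is the shortest of the three techniques to write with the standard library.

```python
from collections import Counter
from html.parser import HTMLParser
import math

# Hypothetical tag list; the report's actual 21 features are not given in this record.
TAGS = ["a", "img", "script", "table", "form"]


class TagCounter(HTMLParser):
    """Counts opening tags seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.counts[tag] += 1


def tag_features(html):
    """Return one count per tag in TAGS, in the same order."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return [parser.counts[t] for t in TAGS]


def nearest_neighbor(train, query):
    """Return the label of the training vector closest to query (Euclidean)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Invented toy pages, labeled by hand purely for illustration.
good_page = "<html><body><a href='/'>Home</a><img src='x.png'><table></table></body></html>"
bad_page = "<html><body><script></script><script></script><form></form></body></html>"
train = [(tag_features(good_page), "good"), (tag_features(bad_page), "bad")]

query_page = "<body><script>x()</script><script>y()</script><form><a href='#'>go</a></form></body>"
label = nearest_neighbor(train, tag_features(query_page))
print(tag_features(query_page), label)
```

With a real dataset, each page would contribute one feature vector and one label, and the same vectors could be passed to an SVM or naive Bayes implementation for comparison.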