An ensemble feature selection method to detect web spam


Bibliographic Details
Main Authors: Oskouei, Mahdieh Danandeh, Razavi, Seyed Naser
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2018
Online Access: http://journalarticle.ukm.my/17768/
http://journalarticle.ukm.my/17768/1/08.pdf
building UKM Institutional Repository
collection Online Access
description Feature selection is an important issue in data mining; it is used to reduce the dimensionality of the feature set. Web spam detection is one of the research fields of data mining. Given the growing amount of information available online and users' need to search it, search engines and the ranking algorithms they use play an important role. Web spam is an illegitimate method of falsely inflating the rank of web pages by deceiving search engine algorithms, so an efficient detection method is essential. Many methods have been proposed so far to combat web spam. This paper proposes an ensemble feature selection method to detect web spam. The content features of the standard WEBSPAM-UK2007 dataset are used for evaluation, with a Bayesian network classifier and a 70/30 training-testing split of the dataset. The results show that the Area Under the ROC Curve (AUC) of this method is higher than that of the other methods reported in this paper. Moreover, the proposed method attains the best evaluation-metric values among the methods reported in this paper. In addition, it improves the classification metrics compared with the basic feature selection methods.
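The record does not say how the ensemble combines its base selectors, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each base feature selector produces one relevance score per feature, aggregates those rankings by mean rank (a Borda-style aggregation chosen here purely for illustration), keeps the top k features, and then evaluates with a 70/30 hold-out split and AUC computed as the probability that a random spam page outscores a random normal page. The function names `ensemble_select`, `holdout_split`, and `auc` are hypothetical, and the Bayesian network classifier is omitted; any classifier that outputs a spam score could be plugged in.

```python
import random


def rank_features(scores):
    """Map feature index -> rank (0 = highest score). Ties keep index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {feat: r for r, feat in enumerate(order)}


def ensemble_select(score_lists, k):
    """Hypothetical ensemble: average each feature's rank across all base
    selectors (assumed Borda-style aggregation) and keep the k best features."""
    n = len(score_lists[0])
    ranks = [rank_features(s) for s in score_lists]
    avg_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in range(n)}
    return sorted(range(n), key=lambda f: (avg_rank[f], f))[:k]


def holdout_split(rows, labels, test_frac=0.3, seed=0):
    """Shuffle once and cut into training (70%) and testing (30%) parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(len(idx) * (1 - test_frac)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def auc(y_true, y_score):
    """AUC as P(score of random positive > score of random negative);
    ties count as 1/2 (the Mann-Whitney formulation)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up scores from three hypothetical base selectors over 5 features.
    scores = [
        [0.9, 0.1, 0.5, 0.3, 0.7],
        [0.8, 0.2, 0.6, 0.1, 0.4],
        [0.7, 0.3, 0.9, 0.2, 0.5],
    ]
    print(ensemble_select(scores, k=2))                 # [0, 2]
    print(auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))     # 0.75
```

Averaging ranks rather than raw scores is one common way to combine selectors whose scores live on different scales; other aggregation rules (voting on top-k membership, weighted ranks) would fit the same skeleton.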
first_indexed 2025-11-15T00:33:25Z
id oai:generic.eprints.org:17768
institution Universiti Kebangsaan Malaysia
institution_category Local University
last_indexed 2025-11-15T00:33:25Z
recordtype eprints
repository_type Digital Repository
citation Oskouei, Mahdieh Danandeh and Razavi, Seyed Naser (2018). An ensemble feature selection method to detect web spam. Asia-Pacific Journal of Information Technology and Multimedia, 7(2), pp. 99-113. ISSN 2289-2192. Peer reviewed; published December 2018 by Penerbit Universiti Kebangsaan Malaysia. Journal page: https://www.ukm.my/apjitm/articles-year.php