Improving multi-term topics focused crawling by introducing term frequency-information content (TF-IC) measure

Bibliographic Details
Main Authors: Pesaranghader, Ali; Pesaranghader, Ahmad; Mustapha, Norwati; Mohd Sharef, Nurfadhlina
Format: Conference or Workshop Item
Language: English
Published: IEEE 2013
Online Access:http://psasir.upm.edu.my/id/eprint/41317/
http://psasir.upm.edu.my/id/eprint/41317/1/Improving%20multi-term%20topics%20focused%20crawling%20by%20introducing%20term%20frequency-information%20content%20%28TF-IC%29%20measure.pdf

Abstract: With the rapid growth of the Internet, finding desired information has become a challenging and time-consuming task. Focused crawlers address this issue by mining the Web for pages closely relevant to the desired information, and a variety of methods have been devised and implemented for this purpose. However, most of these methods do not give greater weight to the more informative terms of a multi-term topic. In this paper, we propose a new measure, Term Frequency-Information Content (TF-IC), that prioritizes the terms of a multi-term topic according to how informative they are. In our experiments, we compare TF-IC against Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI) as applied in focused crawlers. The results indicate that TF-IC outperforms both TF-IDF and LSI, collecting more relevant web pages for both general and specialized multi-term topics.
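The abstract contrasts TF-IDF with a weighting based on information content, but this record does not give the TF-IC formula. The following is a minimal sketch, not the authors' method, of how a focused crawler might score a page against a multi-term topic: each topic term's normalised frequency in the page is multiplied by a per-term weight, and two interchangeable weights are shown, a smoothed IDF (the TF-IDF baseline) and the standard information content IC(t) = -log p(t) estimated from a background corpus. The names `TermWeights` and `topic_score`, the tokenizer, the smoothing choices, and the toy corpus are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TermWeights:
    """Background-corpus statistics used to weight the terms of a topic.

    Assumption (hypothetical): the corpus stands in for whatever reference
    collection a crawler would use, for example pages fetched so far.
    """

    def __init__(self, corpus: list[str]) -> None:
        docs = [tokenize(d) for d in corpus]
        self.n_docs = len(docs)
        # Document frequency: how many documents contain each term.
        self.df = Counter(t for doc in docs for t in set(doc))
        # Collection frequency: total occurrences of each term.
        self.cf = Counter(t for doc in docs for t in doc)
        self.total = sum(self.cf.values())
        self.vocab = len(self.cf)

    def idf(self, term: str) -> float:
        """Smoothed IDF, log((1 + N) / (1 + df)) + 1; defined even when df = 0."""
        return math.log((1 + self.n_docs) / (1 + self.df[term])) + 1

    def ic(self, term: str) -> float:
        """Information content -log p(t), with add-one (Laplace) smoothing of p(t)."""
        p = (self.cf[term] + 1) / (self.total + self.vocab)
        return -math.log(p)


def topic_score(page: str, topic: str, weight: Callable[[str], float]) -> float:
    """Score a page against a multi-term topic: sum over topic terms of tf(t, page) * weight(t)."""
    tokens = tokenize(page)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Normalised term frequency, so longer pages are not favoured just for length.
    return sum(counts[t] / len(tokens) * weight(t) for t in set(tokenize(topic)))


if __name__ == "__main__":
    # Toy background corpus and topic (hypothetical data, for illustration only).
    stats = TermWeights([
        "the web is large",
        "web pages link to web pages",
        "a crawler visits pages",
    ])
    topic = "web crawler"
    for page in ["web web news", "crawler crawler news"]:
        print(page, "idf:", round(topic_score(page, topic, stats.idf), 3),
              "ic:", round(topic_score(page, topic, stats.ic), 3))
```

In this toy corpus "crawler" is rarer than "web", so under either weight the page dominated by "crawler" scores higher than the page dominated by "web". The two weights count different things (IDF uses document frequency, IC uses term probability); how TF-IC actually combines term frequency and information content is defined in the paper itself and is not reproduced here.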

Institution: Universiti Putra Malaysia
Citation: Pesaranghader, Ali; Pesaranghader, Ahmad; Mustapha, Norwati; Mohd Sharef, Nurfadhlina (2013) Improving multi-term topics focused crawling by introducing term frequency-information content (TF-IC) measure. In: 3rd International Conference on Research and Innovation in Information Systems – 2013 (ICRIIS'13), 27-28 Nov. 2013, Kuala Lumpur, Malaysia, pp. 102-106. Peer reviewed. DOI: 10.1109/ICRIIS.2013.6716693