Term frequency-information content for focused crawling to predict relevant web pages.
With the rapid growth of the Web, finding desirable information on the Internet is a tedious and time-consuming task. Focused crawlers are a key to solving this problem by mining Web content. In this regard, a variety of methods have been devised and implemented. Many of these meth...
| Main Authors: | Pesaranghader, Ali; Mustapha, Norwati |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Advanced Institute of Convergence Information Technology, 2013 |
| Online Access: | http://psasir.upm.edu.my/id/eprint/30629/ http://psasir.upm.edu.my/id/eprint/30629/1/Term%20frequency.pdf |
| _version_ | 1848846732350193664 |
|---|---|
| author | Pesaranghader, Ali; Mustapha, Norwati |
| author_facet | Pesaranghader, Ali; Mustapha, Norwati |
| author_sort | Pesaranghader, Ali |
| building | UPM Institutional Repository |
| collection | Online Access |
| description | With the rapid growth of the Web, finding desirable information on the Internet is a tedious and time-consuming task. Focused crawlers are a key to solving this problem by mining Web content, and a variety of methods have been devised and implemented for this purpose. Many of these methods, which come from an information retrieval viewpoint, are not biased towards the more informative terms in multi-term topics (topics with more than one keyword). In this paper, by considering the information content of terms, we propose the Term Frequency-Information Content (TF-IC) method, which assigns an appropriate weight to each term in a multi-term topic. In our experiments, we compare our method with Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI). The results show that our method outperforms both by retrieving more relevant pages for multi-term topics. |
| first_indexed | 2025-11-15T09:07:23Z |
| format | Article |
| id | upm-30629 |
| institution | Universiti Putra Malaysia |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T09:07:23Z |
| publishDate | 2013 |
| publisher | Advanced Institute of Convergence Information Technology |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | upm-306292015-10-28T03:18:09Z http://psasir.upm.edu.my/id/eprint/30629/ Term frequency-information content for focused crawling to predict relevant web pages. Pesaranghader, Ali Mustapha, Norwati With the rapid growth of the Web, finding desirable information on the Internet is a tedious and time consuming task. Focused crawlers are the golden keys to solve this issue through mining of the Web content. In this regard, a variety of methods have been devised and implemented. Many of these methods coming from information retrieval viewpoint are not biased towards more informative terms in multi-term topics (topics with more than one keyword). In this paper, by considering terms’ information contents, we propose Term Frequency-Information Content (TF-IC) method which assigns appropriate weight to each term in a multi-term topic. Through the conducted experiments, we compare our method with other methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI). Experimental results show that our method outperforms those two methods by retrieving more relevant pages for multi-term topics. Advanced Institute of Convergence Information Technology 2013-08 Article PeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/30629/1/Term%20frequency.pdf Pesaranghader, Ali and Mustapha, Norwati (2013) Term frequency-information content for focused crawling to predict relevant web pages. International Journal of Digital Content Technology and its Applications, 7 (12). pp. 113-122. ISSN 1975-9339 English |
| spellingShingle | Pesaranghader, Ali Mustapha, Norwati Term frequency-information content for focused crawling to predict relevant web pages. |
| title | Term frequency-information content for focused crawling to predict relevant web pages. |
| url | http://psasir.upm.edu.my/id/eprint/30629/ http://psasir.upm.edu.my/id/eprint/30629/1/Term%20frequency.pdf |