A survey in semantic web technologies-inspired focused crawlers

Crawlers are software which can traverse the internet and retrieve webpages by hyperlinks. In the face of the inundant spam websites, traditional web crawlers cannot function well to solve this problem. Semantic focused crawlers utilize semantic web technologies to analyze the semantics of hyperlinks a...

Bibliographic Details
Main Authors: Dong, Hai, Hussain, Farookh Khadeer, Chang, Elizabeth
Other Authors: Shoniregun, C.A.
Format: Conference Paper
Published: Institute of Electrical and Electronics Engineers (IEEE) 2008
Online Access: http://hdl.handle.net/20.500.11937/7518
_version_ 1848745391270395904
author Dong, Hai
Hussain, Farookh Khadeer
Chang, Elizabeth
author2 Shoniregun, C.A.
author_facet Shoniregun, C.A.
Dong, Hai
Hussain, Farookh Khadeer
Chang, Elizabeth
author_sort Dong, Hai
building Curtin Institutional Repository
collection Online Access
description Crawlers are software which can traverse the internet and retrieve webpages by hyperlinks. In the face of the inundant spam websites, traditional web crawlers cannot function well to solve this problem. Semantic focused crawlers utilize semantic web technologies to analyze the semantics of hyperlinks and web documents. This paper briefly reviews the recent studies on one category of semantic focused crawlers: ontology-based focused crawlers, which are a series of crawlers that utilize ontologies to link the fetched web documents with the ontological concepts (topics). The purpose of this is to organize and categorize web documents, or to filter irrelevant webpages with regard to the topics. A brief comparison is made among these crawlers from six perspectives: domain, working environment, special functions, technologies utilized, evaluation metrics and evaluation results. The conclusion with respect to this comparison is made in the final section.
first_indexed 2025-11-14T06:16:37Z
format Conference Paper
id curtin-20.500.11937-7518
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T06:16:37Z
publishDate 2008
publisher Institute of Electrical and Electronics Engineers (IEEE)
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-7518 2017-09-13T15:54:28Z A survey in semantic web technologies-inspired focused crawlers Dong, Hai Hussain, Farookh Khadeer Chang, Elizabeth Shoniregun, C.A. Crawlers are software which can traverse the internet and retrieve webpages by hyperlinks. In the face of the inundant spam websites, traditional web crawlers cannot function well to solve this problem. Semantic focused crawlers utilize semantic web technologies to analyze the semantics of hyperlinks and web documents. This paper briefly reviews the recent studies on one category of semantic focused crawlers: ontology-based focused crawlers, which are a series of crawlers that utilize ontologies to link the fetched web documents with the ontological concepts (topics). The purpose of this is to organize and categorize web documents, or to filter irrelevant webpages with regard to the topics. A brief comparison is made among these crawlers from six perspectives: domain, working environment, special functions, technologies utilized, evaluation metrics and evaluation results. The conclusion with respect to this comparison is made in the final section. 2008 Conference Paper http://hdl.handle.net/20.500.11937/7518 10.1109/ICDIM.2008.4746736 Institute of Electrical and Electronics Engineers (IEEE) fulltext
spellingShingle Dong, Hai
Hussain, Farookh Khadeer
Chang, Elizabeth
A survey in semantic web technologies-inspired focused crawlers
title A survey in semantic web technologies-inspired focused crawlers
title_full A survey in semantic web technologies-inspired focused crawlers
title_fullStr A survey in semantic web technologies-inspired focused crawlers
title_full_unstemmed A survey in semantic web technologies-inspired focused crawlers
title_short A survey in semantic web technologies-inspired focused crawlers
title_sort survey in semantic web technologies-inspired focused crawlers
url http://hdl.handle.net/20.500.11937/7518