SOF: a semi-supervised ontology-learning-based focused crawler

Bibliographic Details
Main Authors: Dong, Hai, Hussain, Farookh
Format: Journal Article
Published: John Wiley & Sons Ltd 2013
Subjects:
Online Access: http://hdl.handle.net/20.500.11937/18523
author Dong, Hai
Hussain, Farookh
building Curtin Institutional Repository
collection Online Access
description The rapid increase in the volume of data available on the Internet makes it increasingly impractical for a crawler to index the whole Web. Instead, many intelligent crawlers, known as ontology-based semantic focused crawlers, have been designed to use Semantic Web technologies for topic-centered Web information crawling. Ontologies, however, have constraints of validity and time, which may affect the performance of these crawlers. Ontology-learning-based focused crawlers are therefore designed to evolve ontologies automatically by integrating ontology learning technologies. Nevertheless, surveys indicate that existing ontology-learning-based focused crawlers cannot automatically enrich the content of ontologies, which makes them unreliable in the open and heterogeneous Web environment. Hence, in this paper, we propose a framework for a novel semi-supervised ontology-learning-based focused (SOF) crawler, which embodies a series of schemas for ontology generation and Web information formatting, a semi-supervised ontology learning framework, and a hybrid Web page classification approach aggregated by a group of support vector machine models. A series of tests is carried out to evaluate the technical feasibility of the proposed framework. Conclusions and future work are summarized in the final section.
first_indexed 2025-11-14T07:26:11Z
format Journal Article
id curtin-20.500.11937-18523
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T07:26:11Z
publishDate 2013
publisher John Wiley & Sons Ltd
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-18523 2017-09-13T16:04:39Z SOF: a semi-supervised ontology-learning-based focused crawler Dong, Hai Hussain, Farookh 2013 Journal Article http://hdl.handle.net/20.500.11937/18523 10.1002/cpe.2980 John Wiley & Sons Ltd restricted
title SOF: a semi-supervised ontology-learning-based focused crawler
topic support vector machine
probabilistic model
ontology-learning-based focused crawler
semantic focused crawler
ontological term learning
semi-supervised ontology learning
semantic similarity model
url http://hdl.handle.net/20.500.11937/18523