A strategy for extracting information from semi-structured web pages.

Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from web tables of semi-structured web pages (WPs) by handling the synonym problem, which emerges as these WPs...

Bibliographic Details
Main Authors:	Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana
Format:	Article
Language:	English
Published:	2010
Subjects:	Information retrieval. Text processing (Computer science).
Online Access:	http://psasir.upm.edu.my/id/eprint/12868/

_version_	1848841953659060224
author	Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana
author_sort	Shaker, Mahmoud
building	UPM Institutional Repository
collection	Online Access
description	Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from web tables of semi-structured web pages (WPs) by handling the synonym problem, which emerges because these WPs have been designed and created without reference to any standards or guidelines. Findings – The paper finds that this strategy extracts information with high precision, and that it extracts the attributes as well as the sub-attributes that describe them and the values of those sub-attributes. Practical implications – An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a high precision of 98.98 percent. Originality/value – This paper contributes to the research on extracting information.
first_indexed	2025-11-15T07:51:26Z
format	Article
id	upm-12868
institution	Universiti Putra Malaysia
institution_category	Local University
language	English
last_indexed	2025-11-15T07:51:26Z
publishDate	2010
recordtype	eprints
repository_type	Digital Repository
spelling	upm-12868 2012-01-27T01:25:59Z http://psasir.upm.edu.my/id/eprint/12868/ Shaker, Mahmoud and Ibrahim, Hamidah and Mustapha, Aida and Abdullah, Lili Nurliyana (2010) A strategy for extracting information from semi-structured web pages. International Journal of Web Information Systems, 6 (4). pp. 304-318. ISSN 1744-0084. Article. PeerReviewed. DOI: 10.1108/17440081011090239
title	A strategy for extracting information from semi-structured web pages.
topic	Information retrieval. Text processing (Computer science).
url	http://psasir.upm.edu.my/id/eprint/12868/

Similar Items