A strategy for extracting information from semi-structured web pages.
Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from the web tables of semi-structured web pages (WPs) by handling the issue of synonymy, which emerges as these WPs...
| Main Authors: | Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | 2010 |
| Subjects: | Information retrieval. Text processing (Computer science). |
| Online Access: | http://psasir.upm.edu.my/id/eprint/12868/ |
| _version_ | 1848841953659060224 |
|---|---|
| author | Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana |
| author_facet | Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana |
| author_sort | Shaker, Mahmoud |
| building | UPM Institutional Repository |
| collection | Online Access |
| description | Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from the web tables of semi-structured web pages (WPs) by handling the issue of synonymy, which arises because these WPs were designed and created without reference to any standards or guidelines. Findings – The strategy extracts information with high precision; besides the attributes, it extracts the sub-attributes that describe them and the values of those sub-attributes. Practical implications – An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a precision of 98.98 percent. Originality/value – This paper contributes to the research on information extraction. |
| first_indexed | 2025-11-15T07:51:26Z |
| format | Article |
| id | upm-12868 |
| institution | Universiti Putra Malaysia |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T07:51:26Z |
| publishDate | 2010 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | upm-12868 2012-01-27T01:25:59Z http://psasir.upm.edu.my/id/eprint/12868/ A strategy for extracting information from semi-structured web pages. Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana. 2010 Article PeerReviewed Shaker, Mahmoud and Ibrahim, Hamidah and Mustapha, Aida and Abdullah, Lili Nurliyana (2010) A strategy for extracting information from semi-structured web pages. International Journal of Web Information Systems, 6 (4). pp. 304-318. ISSN 1744-0084 Information retrieval. Text processing (Computer science). 10.1108/17440081011090239 English |
| spellingShingle | Information retrieval. Text processing (Computer science). Shaker, Mahmoud; Ibrahim, Hamidah; Mustapha, Aida; Abdullah, Lili Nurliyana. A strategy for extracting information from semi-structured web pages. |
| title | A strategy for extracting information from semi-structured web pages. |
| title_full | A strategy for extracting information from semi-structured web pages. |
| title_fullStr | A strategy for extracting information from semi-structured web pages. |
| title_full_unstemmed | A strategy for extracting information from semi-structured web pages. |
| title_short | A strategy for extracting information from semi-structured web pages. |
| title_sort | strategy for extracting information from semi-structured web pages. |
| topic | Information retrieval. Text processing (Computer science). |
| url | http://psasir.upm.edu.my/id/eprint/12868/ |