Web Data Extraction Approach for Deep Web using WEIDJ

Bibliographic Details
Format: Restricted Document
_version_ 1860800055164796928
building INTELEK Repository
collection Online Access
collectionurl https://intelek.unisza.edu.my/intelek/pages/search.php?search=!collection407072
date 2020-04-15 03:41:34
eventvenue Effat University, Jeddah; Saudi Arabia
format Restricted Document
id 8465
institution UniSZA
originalfilename 1870-01-FH03-FIK-20-37010.pdf
recordtype oai_dc
resourceurl https://intelek.unisza.edu.my/intelek/pages/view.php?ref=8465
spelling 8465 https://intelek.unisza.edu.my/intelek/pages/view.php?ref=8465 https://intelek.unisza.edu.my/intelek/pages/search.php?search=!collection407072 Restricted Document Conference Conference Paper application/pdf 6 1.6 Adobe Acrobat Pro DC 20 Paper Capture Plug-in 2020-04-15 03:41:34 1870-01-FH03-FIK-20-37010.pdf UniSZA Private Access Web Data Extraction Approach for Deep Web using WEIDJ 16th International Learning and Technology Conference, L and T 2019 Effat University, Jeddah; Saudi Arabia
summary Data extraction is one of the most prominent areas of data mining analysis and has been extensively studied, especially in the field of data requirements and reservoir. The main aim of data extraction with regard to semi-structured data is to retrieve useful information from the World Wide Web. Data held in large web sources, also known as the deep web, is retrievable, but it requires a request through form submission because search engines cannot perform this retrieval. Data mining applications and automatic data extraction are very cumbersome due to the diverse structure of web pages. Most previous data extraction techniques dealt with various data types such as text, audio and video, but research focusing on images as data is still lacking. The Document Object Model (DOM) is an example of a state-of-the-art data extraction technique related to research on mining image data. DOM was the method used to solve semi-structured data extraction from the web. However, as HTML documents grow larger, the data extraction process has been found to suffer from lengthy processing time and noisy information. In this research work, we propose an improved model, namely Wrapper Extraction of Image using DOM and JSON (WEIDJ), in response to the promising results of mining a higher volume of web data from various image formats, taking into consideration web data extraction from the deep web. To observe the efficiency of the proposed model, we compare its data extraction performance at different levels of page extraction with existing methods such as VIBS, MDR, DEPTA and VIDE. It yielded the best results, with a Precision of 100, a Recall of 97.93103 and an F-measure of 98.9547.
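The reported F-measure can be checked against the reported Precision and Recall. The sketch below assumes the abstract uses the standard balanced F-measure (the harmonic mean of precision and recall) with all three values expressed as percentages; the record does not state the formula, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Both inputs must be on the same scale (here, percentages).
    Assumption: the abstract uses this standard definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the summary field of this record.
reported_precision = 100.0
reported_recall = 97.93103
reported_f_measure = 98.9547

f1 = f_measure(reported_precision, reported_recall)
print(f"computed F-measure: {f1:.4f}")
print(f"difference from reported value: {abs(f1 - reported_f_measure):.6f}")
```

Under that assumption, the computed value agrees with the reported F-measure to the four decimal places given, so the three figures in the abstract are mutually consistent.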
title Web Data Extraction Approach for Deep Web using WEIDJ