Reducing distributed URLS crawling time : A comparison of GUIDS and IDS

Bibliographic Details
Format:	Restricted Document

_version_	1860797080255070208
building	INTELEK Repository
collection	Online Access
collectionurl	https://intelek.unisza.edu.my/intelek/pages/search.php?search=!collection407072
date	2014-09-29 15:31:09
format	Restricted Document
id	11296
institution	UniSZA
originalfilename	5512-01-FH02-FIK-06-01524.jpg
person	UniSZA Unisza unisza
recordtype	oai_dc
resourceurl	https://intelek.unisza.edu.my/intelek/pages/view.php?ref=11296
spelling	11296 https://intelek.unisza.edu.my/intelek/pages/view.php?ref=11296 https://intelek.unisza.edu.my/intelek/pages/search.php?search=!collection407072 Restricted Document Article Journal UniSZA Unisza unisza image/jpeg inches 96 96 06 06 1425 781 2014-09-29 15:31:09 1425x781 5512-01-FH02-FIK-06-01524.jpg UniSZA Private Access Reducing distributed URLS crawling time : A comparison of GUIDS and IDS Journal of Theoretical and Applied Information Technology 67 1 121-128
spellingShingle	Reducing distributed URLS crawling time : A comparison of GUIDS and IDS
summary	A web crawler visits websites for the purpose of indexing. The dynamic nature of today’s web makes crawling harder than before, as web content is continuously updated. In addition, crawling speed is important given the tsunami of big data that competing search engines need to index. This research project aims to survey current problems in distributed web crawlers. It then investigates whether dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs) give the better crawling speed. Experiments are done by implementing Arachnot.net web crawlers to index up to 20000 locally generated URLs using both techniques. The results show that URL crawling time can be reduced by up to 7% by using GUIDs instead of IDs.
title	Reducing distributed URLS crawling time : A comparison of GUIDS and IDS
title_full	Reducing distributed URLS crawling time : A comparison of GUIDS and IDS
title_fullStr	Reducing distributed URLS crawling time : A comparison of GUIDS and IDS
title_full_unstemmed	Reducing distributed URLS crawling time : A comparison of GUIDS and IDS
title_short	Reducing distributed URLS crawling time : A comparison of GUIDS and IDS
title_sort	reducing distributed urls crawling time : a comparison of guids and ids

Similar Items