Random sampling method of large-scale graph data classification

Graph data appears in broad real-world applications in modelling complex objects in big data. Effective analysis of graph data provides a deeper understanding of the data in data mining tasks, including classification, clustering, prediction, and recommendation systems. Mining a large number of gr...


Bibliographic Details
Main Authors:	Rashed Mustafa, Mohammad Sultan Mahmud, Mahir Shadid
Format:	Article
Language:	English
Published:	Fakulti Kejuruteraan, UKM, Bangi. 2024
Online Access:	http://journalarticle.ukm.my/25235/ http://journalarticle.ukm.my/25235/1/kejut_14.pdf

_version_	1848816304404824064
author	Rashed Mustafa, Mohammad Sultan Mahmud, Mahir Shadid
author_facet	Rashed Mustafa, Mohammad Sultan Mahmud, Mahir Shadid
author_sort	Rashed Mustafa
building	UKM Institutional Repository
collection	Online Access
description	Graph data arises in many real-world applications as a way of modelling complex objects in big data. Effective analysis of graph data gives a deeper understanding of the data in mining tasks such as classification, clustering, prediction, and recommendation. Mining a large number of graphs is challenging because state-of-the-art methods do not scale beyond the memory limit. To address this issue, we propose a novel approximate random sampling method for large-scale graph data classification. In this approach, we apply a representation method that encodes each graph as a record of a vector string and a set of graphs as a set of N records in a file. We then partition the set of records into disjoint data blocks such that each data block is a random sample of the data file. Next, we randomly select a subset of these data blocks, each being a random sample of the graph dataset, and compute the distributions of different graph properties. Because the data blocks are much smaller than the entire data set, each can be analyzed efficiently on a small standalone machine, and multiple data blocks can be analyzed in parallel on the nodes of a cluster. Finally, we classify the graphs in the data blocks using the SVM algorithm. In experimental evaluation, the proposed method outperformed state-of-the-art graph kernels on graph classification datasets in terms of accuracy.
first_indexed	2025-11-15T01:03:45Z
format	Article
id	oai:generic.eprints.org:25235
institution	Universiti Kebangsaan Malaysia
institution_category	Local University
language	English
last_indexed	2025-11-15T01:03:45Z
publishDate	2024
publisher	Fakulti Kejuruteraan, UKM, Bangi.
recordtype	eprints
repository_type	Digital Repository
spelling	oai:generic.eprints.org:25235 2025-05-22T07:00:36Z http://journalarticle.ukm.my/25235/ Article PeerReviewed application/pdf en http://journalarticle.ukm.my/25235/1/kejut_14.pdf Rashed Mustafa, Mohammad Sultan Mahmud, and Mahir Shadid (2024) Random sampling method of large-scale graph data classification. Jurnal Kejuruteraan, 36 (2). pp. 525-532. ISSN 0128-0198 https://www.ukm.my/jkukm/volume-3602-2024/
spellingShingle	Rashed Mustafa, Mohammad Sultan Mahmud, Mahir Shadid, Random sampling method of large-scale graph data classification
title	Random sampling method of large-scale graph data classification
title_sort	random sampling method of large-scale graph data classification
url	http://journalarticle.ukm.my/25235/ http://journalarticle.ukm.my/25235/1/kejut_14.pdf
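
The partition-and-select procedure summarized in the description field can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (partition_into_random_blocks, select_blocks, label_distribution), the (vector_string, label) record layout, the use of class-label proportions as the example "property distribution", and the fixed seeds are all hypothetical. The graph encoding and SVM classification steps are omitted.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Assumed record layout: (encoded graph as a vector string, class label).
Record = Tuple[str, int]


def partition_into_random_blocks(
    records: Sequence[Record], num_blocks: int, seed: int = 0
) -> List[List[Record]]:
    """Split records into disjoint blocks, each a random sample of the file."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # a uniform shuffle makes every block a random sample
    # Deal the shuffled records round-robin so block sizes differ by at most 1.
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def select_blocks(
    blocks: List[List[Record]], k: int, seed: int = 0
) -> List[List[Record]]:
    """Randomly pick k of the blocks, without replacement."""
    rng = random.Random(seed)
    return rng.sample(blocks, k)


def label_distribution(block: Sequence[Record]) -> Dict[int, float]:
    """Proportion of each class label in a block (one example property)."""
    counts = Counter(label for _, label in block)
    total = len(block)
    return {label: n / total for label, n in counts.items()}
```

Under these assumptions, each selected block could then be passed to any classifier independently, for instance on separate cluster nodes, and comparing label_distribution of a block with that of the full file gives a quick check that the block resembles the whole data set.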


Similar Items