Random sampling method of large-scale graph data classification
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Fakulti Kejuruteraan, UKM, Bangi. 2024 |
| Online Access: | http://journalarticle.ukm.my/25235/ http://journalarticle.ukm.my/25235/1/kejut_14.pdf |
| Summary: | Graph data appears in a broad range of real-world applications for modelling complex objects in big data. Effective analysis of graph data provides a deeper understanding of the data in data mining tasks, including classification, clustering, prediction, and recommendation systems. Mining a large number of graphs is challenging because state-of-the-art methods do not scale beyond the available memory. To address this issue, we propose a novel approximate random sampling method for large-scale graph data classification. In this approach, we apply a representation method that encodes each graph as a record of a vector string, and a set of graphs as a set of N records in a file. We then partition the set of records into disjoint subsets called data blocks, such that each data block is a random sample of the data file. Next, we randomly select a subset of data blocks, each being a random sample of the graph dataset, and compute the different graph property distributions. Because the data blocks in this model are much smaller than the entire data set, each can be analyzed efficiently on a small standalone machine, and multiple data blocks can be analyzed in parallel on multiple nodes of a cluster. Finally, we classify the graphs in the data blocks using the SVM algorithm. In experimental evaluation, our proposed method outperformed state-of-the-art graph kernels on graph classification datasets in terms of accuracy. |
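
The partitioning and block-selection steps described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout (`graph_id`, feature vector, label), the toy features, and the function names `partition_into_blocks` and `sample_blocks` are all assumptions made for illustration, and the SVM classification step is left out.

```python
import random
from statistics import mean


def partition_into_blocks(records, num_blocks, seed=0):
    """Shuffle the records once, then deal them into disjoint blocks.

    Because the shuffle is uniform, each block is a random sample
    (without replacement) of the full record set.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Striding over the shuffled list yields disjoint, near-equal blocks.
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def sample_blocks(blocks, k, seed=0):
    """Pick k distinct blocks uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(blocks, k)


# Hypothetical toy records: (graph_id, [num_nodes, num_edges], label).
gen = random.Random(42)
records = [
    (i, [gen.randint(5, 50), gen.randint(5, 200)], i % 2)
    for i in range(1000)
]

blocks = partition_into_blocks(records, num_blocks=20, seed=1)
selected = sample_blocks(blocks, k=3, seed=2)

# Compare one graph property (node count) on the full set and per block.
full_mean = mean(r[1][0] for r in records)
block_means = [mean(r[1][0] for r in block) for block in selected]
print(f"full-data mean node count: {full_mean:.2f}")
print("per-block means:", [f"{m:.2f}" for m in block_means])
```

Each selected block could then be handed to a separate worker, or to a classifier such as an SVM, without loading the full file into memory; the per-block means give a rough check that a block reflects the property distribution of the whole dataset.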