Sparse subspace representation for spectral document clustering

We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by consider...

Bibliographic Details
Main Authors: Budhaditya, S., Phung, D., Pham, DucSon, Venkatesh, S.
Other Authors: M. Jaki
Format: Conference Paper
Published: IEEE 2012
Subjects: document clustering, sparse representation
Online Access: http://hdl.handle.net/20.500.11937/4290
building Curtin Institutional Repository
collection Online Access
description We present a novel method for document clustering that combines a sparse representation of documents with spectral clustering. An ℓ1-norm optimization problem is posed to learn the sparse representation of each document, which lets us characterize the affinity between documents using the overall information in the collection rather than traditional pairwise similarities. This document affinity is encoded in a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows a document to belong to a sub-group that shares a smaller set of similar vocabulary, which yields cleaner clusters. Extensive experimental evaluations on two real-world datasets from the Reuters-21578 and 20 Newsgroups corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Significantly, the performance improvement over other methods is prominent for these datasets.
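The pipeline in the description (ℓ1 sparse coding of each document, an affinity graph built from the codes, then spectral clustering) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not give the exact optimization problem, so the sketch assumes a Lasso-style objective solved by proximal gradient descent, an affinity of |C| + |C|ᵀ, and a normalized-Laplacian embedding followed by a small k-means. The function names, the penalty `lam`, and the toy term counts are all hypothetical.

```python
import numpy as np

def sparse_codes(X, lam=0.05, n_iter=500):
    """Assumed Lasso-style coding: each column of X (a unit-norm document)
    is approximated by a sparse combination of the other columns."""
    n = X.shape[1]
    G = X.T @ X                              # Gram matrix of the documents
    step = 1.0 / np.linalg.norm(G, 2)        # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        Z = C - step * (G @ C - G)           # gradient step on 0.5 * ||X - XC||_F^2
        C = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)  # soft threshold (l1 prox)
        np.fill_diagonal(C, 0.0)             # a document may not represent itself
    return C

def spectral_labels(W, k):
    """Spectral clustering on affinity W using the symmetric normalized Laplacian."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues come back in ascending order
    U = vecs[:, :k]                          # k smallest eigenvectors as the embedding
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Small k-means with deterministic farthest-point initialisation.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(20):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Toy term counts (rows = documents, columns = terms). Documents 0-2 use
# only terms 0-2 and documents 3-5 use only terms 3-5 (made-up data).
counts = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)
X = counts.T / np.linalg.norm(counts, axis=1)   # columns = unit-norm documents

C = sparse_codes(X)
W = np.abs(C) + np.abs(C).T                     # symmetric document affinity graph
labels = spectral_labels(W, k=2)
print(labels)
```

Because the two toy topics share no vocabulary, no document receives a coefficient on a document from the other topic, so the affinity graph splits into two components and the spectral step separates them. On a real corpus one would more likely start from tf-idf vectors and use a dedicated ℓ1 solver; those choices are outside what the record states.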
id curtin-20.500.11937-4290
institution Curtin University Malaysia
institution_category Local University
contributors M. Jaki, A. Siebes, J. Yu, B. Goethals, X. Wu
doi 10.1109/ICDM.2012.46
topic document clustering
sparse representation