Sparse subspace representation for spectral document clustering
We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by consider...
| Main Authors: | Budhaditya, S., Phung, D., Pham, DucSon, Venkatesh, S. |
|---|---|
| Other Authors: | M. Jaki |
| Format: | Conference Paper |
| Published: | IEEE 2012 |
| Subjects: | document clustering, sparse representation |
| Online Access: | http://hdl.handle.net/20.500.11937/4290 |
| _version_ | 1848744474235109376 |
|---|---|
| author | Budhaditya, S. Phung, D. Pham, DucSon Venkatesh, S. |
| author2 | M. Jaki |
| author_facet | M. Jaki Budhaditya, S. Phung, D. Pham, DucSon Venkatesh, S. |
| author_sort | Budhaditya, S. |
| building | Curtin Institutional Repository |
| collection | Online Access |
| description | We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on two real-world datasets from Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Significantly, the performance improvement over other methods is prominent for these datasets. |
| first_indexed | 2025-11-14T06:02:02Z |
| format | Conference Paper |
| id | curtin-20.500.11937-4290 |
| institution | Curtin University Malaysia |
| institution_category | Local University |
| last_indexed | 2025-11-14T06:02:02Z |
| publishDate | 2012 |
| publisher | IEEE |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | curtin-20.500.11937-4290 2023-02-02T07:57:36Z Sparse subspace representation for spectral document clustering Budhaditya, S. Phung, D. Pham, DucSon Venkatesh, S. M. Jaki A. Siebes J. Yu B. Goethals X. Wu document clustering sparse representation We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on two real-world datasets from Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Significantly, the performance improvement over other methods is prominent for these datasets. 2012 Conference Paper http://hdl.handle.net/20.500.11937/4290 10.1109/ICDM.2012.46 IEEE fulltext |
| spellingShingle | document clustering sparse representation Budhaditya, S. Phung, D. Pham, DucSon Venkatesh, S. Sparse subspace representation for spectral document clustering |
| title | Sparse subspace representation for spectral document clustering |
| title_full | Sparse subspace representation for spectral document clustering |
| title_fullStr | Sparse subspace representation for spectral document clustering |
| title_full_unstemmed | Sparse subspace representation for spectral document clustering |
| title_short | Sparse subspace representation for spectral document clustering |
| title_sort | sparse subspace representation for spectral document clustering |
| topic | document clustering sparse representation |
| url | http://hdl.handle.net/20.500.11937/4290 |
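
The description in this record outlines a pipeline: learn an ℓ1-sparse representation of each document in terms of the other documents, turn the coefficients into an affinity graph, and run spectral clustering on that graph. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy term-count vectors, the penalty weight `lam`, the coordinate-descent lasso solver, the symmetrization `|C| + |C|^T`, and the restriction to a two-way spectral cut computed by power iteration (a full method would typically use several eigenvectors followed by k-means).

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the l1 penalty)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def sparse_code(target, atoms, lam, sweeps=200):
    """Coordinate-descent lasso: min_c 0.5*||target - sum_j c[j]*atoms[j]||^2 + lam*||c||_1."""
    coeffs = [0.0] * len(atoms)
    residual = list(target)
    for _ in range(sweeps):
        for j, atom in enumerate(atoms):
            norm_sq = sum(a * a for a in atom)
            if norm_sq == 0.0:
                continue
            # Correlation of atom j with the residual, with atom j's own contribution added back.
            rho = sum(a * r for a, r in zip(atom, residual)) + coeffs[j] * norm_sq
            new_c = soft_threshold(rho, lam) / norm_sq
            delta = new_c - coeffs[j]
            if delta:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                coeffs[j] = new_c
    return coeffs


def affinity_matrix(docs, lam=0.1):
    """Express each document through the others, then symmetrize: W = |C| + |C|^T."""
    n = len(docs)
    coeff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        codes = sparse_code(docs[i], [docs[j] for j in others], lam)
        for j, value in zip(others, codes):
            coeff[i][j] = value
    return [[abs(coeff[i][j]) + abs(coeff[j][i]) for j in range(n)] for i in range(n)]


def spectral_bipartition(W, iterations=500, seed=0):
    """Two-way spectral cut from the second eigenvector of D^-1/2 W D^-1/2 (power iteration)."""
    n = len(W)
    d_sqrt = [math.sqrt(sum(row)) for row in W]  # assumes every node has at least one neighbour
    top_norm = math.sqrt(sum(s * s for s in d_sqrt))
    top = [s / top_norm for s in d_sqrt]  # known leading eigenvector (eigenvalue 1)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Deflate the leading eigenvector, then apply 0.5*(I + D^-1/2 W D^-1/2).
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        w = [0.5 * (v[i] + sum(W[i][j] * v[j] / (d_sqrt[i] * d_sqrt[j]) for j in range(n)))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign pattern of the second eigenvector gives the two groups.
    return [1 if x > 0 else 0 for x in v]


# Toy term-count vectors (hypothetical data): two topics with disjoint vocabularies.
docs = [
    [2, 1, 0, 0, 0, 0], [1, 2, 1, 0, 0, 0], [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 0], [0, 0, 0, 1, 2, 1], [0, 0, 0, 0, 1, 2],
]
labels = spectral_bipartition(affinity_matrix(docs))
print(labels)
```

In this toy setup the sparse codes of each document only draw on documents that share its vocabulary, so the affinity graph splits into two blocks and the first three documents receive one label while the last three receive the other. Which group is labelled 0 or 1 depends on the random start vector and carries no meaning.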