Graph-induced restricted Boltzmann machines for document modeling

© 2015 Elsevier Inc. All rights reserved. Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation - the restricted Boltzmann m...

Bibliographic Details
Main Authors: Nguyen, T., Tran, The Truyen, Phung, D., Venkatesh, S.
Format: Journal Article
Published: Elsevier Inc 2016
Online Access: http://hdl.handle.net/20.500.11937/45888
_version_ 1848757409319747584
author Nguyen, T.
Tran, The Truyen
Phung, D.
Venkatesh, S.
building Curtin Institutional Repository
collection Online Access
description © 2015 Elsevier Inc. All rights reserved. Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation, the restricted Boltzmann machine (RBM), whose underlying graphical model is an undirected bipartite graph. Inference is efficient: a document representation can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate under the bag-of-words assumption and ignore the inherent relational structures among words, which results in less coherent thematic grouping of words. We introduce graph-based regularization schemes that exploit these linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means of estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.
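The two ideas named in the description, inference as a single matrix projection and a graph-based penalty over related words, can be illustrated with a minimal sketch. This is not the authors' formulation: the record does not state the paper's regularizer, so the squared-difference penalty over linked word pairs is an assumption chosen for illustration, and all names (`hidden_representation`, `graph_penalty`, the toy weights and edges) are hypothetical. The hidden-unit formula is the standard conditional for an RBM with binary hidden units.

```python
import math


def sigmoid(x):
    """Logistic function used for binary hidden units."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_representation(counts, weights, hidden_bias):
    """Project a word-count vector onto the hidden units.

    Computes P(h_k = 1 | v) = sigmoid(b_k + sum_i v_i * W[i][k]),
    i.e. one matrix-vector product followed by a sigmoid.
    """
    n_hidden = len(hidden_bias)
    return [
        sigmoid(hidden_bias[k] + sum(v * row[k] for v, row in zip(counts, weights)))
        for k in range(n_hidden)
    ]


def graph_penalty(weights, edges):
    """Assumed regularizer (illustrative only, not taken from the paper).

    Sums strength * ||W[i] - W[j]||^2 over linked word pairs (i, j, strength),
    so words connected in the graph are pushed toward similar weight rows.
    """
    total = 0.0
    for i, j, strength in edges:
        total += strength * sum((a - b) ** 2 for a, b in zip(weights[i], weights[j]))
    return total


# Toy setup: 3 words, 2 hidden units; words 0 and 1 are linked in the word graph.
weights = [[0.5, -0.2], [0.4, -0.1], [-0.3, 0.6]]
hidden_bias = [0.0, 0.0]
counts = [2, 1, 0]          # word counts for one document
edges = [(0, 1, 1.0)]       # (word i, word j, link strength)

representation = hidden_representation(counts, weights, hidden_bias)
penalty = graph_penalty(weights, edges)
```

In a training loop, a penalty of this kind would typically be added, with a tunable coefficient, to the negative log-likelihood objective. How the word graph is built (from corpus statistics or from domain knowledge, as the description mentions) is left open here.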
first_indexed 2025-11-14T09:27:38Z
format Journal Article
id curtin-20.500.11937-45888
institution Curtin University Malaysia
institution_category Local University
last_indexed 2025-11-14T09:27:38Z
publishDate 2016
publisher Elsevier Inc
recordtype eprints
repository_type Digital Repository
spelling curtin-20.500.11937-45888 2017-09-13T14:26:12Z Graph-induced restricted Boltzmann machines for document modeling Nguyen, T. Tran, The Truyen Phung, D. Venkatesh, S. 2016 Journal Article http://hdl.handle.net/20.500.11937/45888 10.1016/j.ins.2015.08.023 Elsevier Inc restricted
title Graph-induced restricted Boltzmann machines for document modeling