Context-aware sentence categorisation: word mover’s distance and character-level convolutional recurrent neural network

Supervised k nearest neighbour and unsupervised hierarchical agglomerative clustering algorithms can be enhanced through a word mover’s distance-based sentence distance metric to offer superior context-aware sentence categorisation performance. An advanced neural network-oriented classifier is able to ach...

Bibliographic Details
Main Author:	Fu, Xinyu
Format:	Thesis (University of Nottingham only)
Language:	English
Published:	2018
Subjects:	Sentence Categorisation; Word Mover's Distance; Convolutional Neural Network; Recurrent Neural Network; Sentence Similarity; Sentence Distance
Online Access:	https://eprints.nottingham.ac.uk/52054/

_version_	1848798637321093120
author	Fu, Xinyu
building	Nottingham Research Data Repository
collection	Online Access
description	Supervised k nearest neighbour and unsupervised hierarchical agglomerative clustering algorithms can be enhanced through a word mover’s distance-based sentence distance metric to offer superior context-aware sentence categorisation performance. An advanced neural network-oriented classifier is able to achieve competitive results on the benchmark streams via an aggregated recurrent unit combined with a sophisticated convolving layer.
The continually increasing number of textual snippets produced each year necessitates ever-improving information processing methods for searching, retrieving, and organising text. Central to these methods are sentence classification and clustering, which have become important applications of natural language processing and information retrieval. The present work proposes three novel sentence categorisation frameworks, namely hierarchical agglomerative clustering-word mover’s distance, k nearest neighbour-word mover’s distance, and convolutional recurrent neural network.
Hierarchical agglomerative clustering-word mover’s distance employs a word mover’s distance distortion function to cluster unlabelled sentences around nearby centroids. K nearest neighbour-word mover’s distance classifies test textual snippets through word mover’s distance-based sentence similarity. Both models belong to the count-based family of frameworks, since they apply term frequency statistics when building the vector space matrix. Experimental evaluation on two unsupervised learning data-sets shows better performance of hierarchical agglomerative clustering-word mover’s distance over its competitors on mean squared error, completeness score, homogeneity score, and v-measure value. For k nearest neighbour-word mover’s distance, experiments on two benchmark textual streams verify its superior classification performance against comparison algorithms on precision, recall, and F1 score. The performance comparison is statistically validated via the Mann-Whitney U test. Through extensive experiments and analysis of the results, each research hypothesis is confirmed.
Unlike traditional singleton neural networks, the convolutional recurrent neural network model combines a character-level convolutional network with a character-aware recurrent neural network to form a single framework. The proposed model benefits from the character-aware convolutional neural network in that only salient features are selected and fed into the integrated character-aware recurrent neural network, which learns long-sequence semantics via a sophisticated update mechanism. The experiment presented in the thesis compares the convolutional recurrent neural network framework against state-of-the-art text classification algorithms on four popular benchmark corpora. The work also analyses the impact of three different recurrent cells on performance and runtime efficiency. It is observed that the minimal gated unit achieves the best runtime with performance comparable to the gated recurrent unit and long short-term memory. For term frequency-inverse document frequency-based algorithms, the experiment examines word2vec, global vectors for word representation, and sent2vec embeddings and reports their performance differences. The performance comparison is statistically validated through the Mann-Whitney U test, and the corresponding hypotheses are confirmed by the reported statistical analysis.
first_indexed	2025-11-14T20:22:56Z
format	Thesis (University of Nottingham only)
id	nottingham-52054
institution	University of Nottingham Malaysia Campus
institution_category	Local University
language	English
last_indexed	2025-11-14T20:22:56Z
publishDate	2018
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-52054 2025-02-28T14:08:17Z https://eprints.nottingham.ac.uk/52054/ 2018-07 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/52054/1/thesis_final.pdf Fu, Xinyu (2018) Context-aware sentence categorisation: word mover’s distance and character-level convolutional recurrent neural network. PhD thesis, University of Nottingham.
title	Context-aware sentence categorisation: word mover’s distance and character-level convolutional recurrent neural network
topic	Sentence Categorisation; Word Mover's Distance; Convolutional Neural Network; Recurrent Neural Network; Sentence Similarity; Sentence Distance
url	https://eprints.nottingham.ac.uk/52054/
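
The description above refers to k nearest neighbour classification driven by a word mover’s distance-based sentence distance. The record does not show the thesis's implementation, so what follows is only a minimal sketch under stated assumptions. It substitutes the relaxed lower bound of word mover’s distance (each word's weight is moved entirely to its nearest word in the other sentence) for the exact optimal-transport solution. The 2-D embeddings, the training sentences, and the names `nbow`, `one_sided`, `relaxed_wmd`, and `knn_predict` are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

# Hypothetical 2-D word embeddings, invented purely for illustration.
EMBEDDINGS = {
    "goal": (1.0, 0.1), "match": (0.9, 0.2), "team": (0.8, 0.0),
    "striker": (1.0, 0.3),
    "vote": (0.0, 1.0), "election": (0.1, 0.9), "senate": (0.2, 1.0),
    "ballot": (0.0, 0.8),
}


def nbow(tokens):
    """Normalised bag-of-words weights for tokens that have an embedding."""
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def one_sided(src, dst):
    """Cost of moving each source word's whole weight to its nearest target word."""
    return sum(
        weight * min(math.dist(EMBEDDINGS[w], EMBEDDINGS[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed word mover's distance: the larger of the two one-sided costs.

    This is a lower bound on exact WMD, not the exact distance (assumption:
    the bound is good enough for a toy demonstration).
    """
    a, b = nbow(tokens_a), nbow(tokens_b)
    if not a or not b:
        raise ValueError("each sentence needs at least one known word")
    return max(one_sided(a, b), one_sided(b, a))


def knn_predict(query, training, k=3):
    """Majority label among the k training sentences closest to the query."""
    ranked = sorted(training, key=lambda item: relaxed_wmd(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled sentences (already tokenised).
TRAIN = [
    (["team", "goal"], "sport"),
    (["striker", "match"], "sport"),
    (["match", "team"], "sport"),
    (["vote", "election"], "politics"),
    (["senate", "ballot"], "politics"),
    (["election", "senate"], "politics"),
]

if __name__ == "__main__":
    print(knn_predict(["striker", "goal"], TRAIN))
```

In this toy set the query `["striker", "goal"]` shares no sentence with the training data, yet its three nearest neighbours are all sport sentences because the distance is computed between word embeddings rather than exact word matches. An exact word mover’s distance would replace `relaxed_wmd` with a transport-problem solver.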

Similar Items