Text Augmentation For Emotion Classification In Microblog Text Using Similarity Scoring Based On Neural Embedding Models

Bibliographic Details
Main Author:	Yong, Kuan Shyang
Format:	Thesis
Language:	English
Published:	2022
Subjects:	QA76.6 Electronic digital computers > Programming
Online Access:	http://eprints.usm.my/59117/ http://eprints.usm.my/59117/1/YONG%20KUAN%20SHYANG%20-%20TESIS.pdf

_version_	1848884089732464640
author	Yong, Kuan Shyang
author_facet	Yong, Kuan Shyang
author_sort	Yong, Kuan Shyang
building	USM Institutional Repository
collection	Online Access
description	Emotion classification can benefit from a larger pool of training data, but manually expanding an emotion corpus is labour-intensive and time-consuming. Distant supervision can be used to collect large amounts of training data in a short period of time using emotion word hashtags, but the collected data may contain excessive noise. In this research, we proposed a text augmentation strategy to efficiently expand the number of positive examples for six emotion categories (happiness, anger, excitement, desperation, boredom and indifference) in EmoTweet-28 by exploiting tweets collected through distant supervision (DS) that are similar to the seed examples in EmoTweet-28 (ET-seed). A similarity scoring approach was used to compute cosine similarity scores between each DS tweet and all ET-seed tweets under the same emotion category. Seven vector representations (USE, InferSent GloVe, InferSent fastText, Word2Vec, fastText, GloVe, and Bag-of-Words) were tested for representing the tweets in the similarity scoring approach. DS tweets with high similarity scores were selected as augmented instances and annotated with emotion labels. Two selection schemes were used for the DS tweets: threshold-based selection and fixed increment selection. In addition, we modified the proposed text augmentation strategy by altering the seed sets used for similarity scoring, using clustering-based and misclassification-based strategies. Each augmented set was evaluated by training a separate deep neural network classifier to distinguish between the presence and absence of a specific emotion in tweets from the test set.
first_indexed	2025-11-15T19:01:10Z
format	Thesis
id	usm-59117
institution	Universiti Sains Malaysia
institution_category	Local University
language	English
last_indexed	2025-11-15T19:01:10Z
publishDate	2022
recordtype	eprints
repository_type	Digital Repository
spelling	usm-59117 2023-08-14T06:38:11Z http://eprints.usm.my/59117/ Text Augmentation For Emotion Classification In Microblog Text Using Similarity Scoring Based On Neural Embedding Models Yong, Kuan Shyang QA76.6 Electronic digital computers -- Programming 2022-08 Thesis NonPeerReviewed application/pdf en http://eprints.usm.my/59117/1/YONG%20KUAN%20SHYANG%20-%20TESIS.pdf Yong, Kuan Shyang (2022) Text Augmentation For Emotion Classification In Microblog Text Using Similarity Scoring Based On Neural Embedding Models. Masters thesis, Universiti Sains Malaysia.
title	Text Augmentation For Emotion Classification In Microblog Text Using Similarity Scoring Based On Neural Embedding Models
topic	QA76.6 Electronic digital computers -- Programming
url	http://eprints.usm.my/59117/ http://eprints.usm.my/59117/1/YONG%20KUAN%20SHYANG%20-%20TESIS.pdf
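
The similarity scoring and the two selection schemes named in the description can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the toy 2-dimensional vectors, and in particular the choice to average a DS tweet's cosine similarities over all seed tweets are assumptions made here for illustration, since the record does not say how the per-seed scores are combined. In practice the vectors would come from one of the listed representations (for example USE or Word2Vec).

```python
import numpy as np

def cosine_similarity_matrix(ds_vecs, seed_vecs):
    """Cosine similarity between every DS vector (rows) and every seed vector (columns)."""
    ds = np.asarray(ds_vecs, dtype=float)
    seed = np.asarray(seed_vecs, dtype=float)
    # Normalise rows to unit length; the clip avoids division by zero for all-zero vectors.
    ds_unit = ds / np.clip(np.linalg.norm(ds, axis=1, keepdims=True), 1e-12, None)
    seed_unit = seed / np.clip(np.linalg.norm(seed, axis=1, keepdims=True), 1e-12, None)
    return ds_unit @ seed_unit.T

def score_ds_tweets(ds_vecs, seed_vecs):
    """One score per DS tweet. ASSUMPTION: mean over all seed tweets of the same emotion."""
    return cosine_similarity_matrix(ds_vecs, seed_vecs).mean(axis=1)

def select_by_threshold(scores, threshold):
    """Threshold-based selection: indices of DS tweets scoring at or above the threshold."""
    return np.flatnonzero(np.asarray(scores) >= threshold)

def select_fixed_increment(scores, k):
    """Fixed increment selection: indices of the k highest-scoring DS tweets, best first."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    return order[:k]

# Hypothetical toy vectors standing in for tweet embeddings of one emotion category.
seed_vectors = [[1.0, 0.0], [0.8, 0.6]]
ds_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]

scores = score_ds_tweets(ds_vectors, seed_vectors)
above_threshold = select_by_threshold(scores, threshold=0.5)
top_three = select_fixed_increment(scores, k=3)
```

Under these assumptions, the selected indices would identify the DS tweets to add to the training set as positive examples for that emotion; the threshold and increment values shown are arbitrary.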

Similar Items