Exploring COVID-19 vaccine sentiment: a Twitter-based analysis of text processing and machine learning approaches

Bibliographic Details
Main Authors: Khalaf, Ban Safir, Hamdan, Hazlina, Manshor, Noridayu
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2024
Online Access: http://psasir.upm.edu.my/id/eprint/116987/
http://psasir.upm.edu.my/id/eprint/116987/1/116987.pdf
author Khalaf, Ban Safir
Hamdan, Hazlina
Manshor, Noridayu
building UPM Institutional Repository
collection Online Access
description In the wake of the coronavirus disease 2019 (COVID-19) pandemic, the swift development and deployment of vaccines marked a critical juncture, making an understanding of public sentiment necessary for effective health communication and policymaking. Social media platforms, especially Twitter, have emerged as rich sources for gauging public opinion. This study applies natural language processing (NLP) and machine learning (ML) to examine the sentiments and trends surrounding COVID-19 vaccination, using a comprehensive Twitter dataset. Prior research has focused primarily on ML algorithms; this study instead highlights the underused potential of NLP in data preprocessing. Using term frequency-inverse document frequency (TF-IDF) for text processing, the research evaluates six ML techniques: K-nearest neighbors (KNN), decision trees (DT), random forest (RF), artificial neural networks (ANN), support vector machines (SVM), and long short-term memory (LSTM). The findings indicate that LSTM, particularly when combined with tweet text tokenization, is the most effective approach. The study also highlights the role of feature selection, showing that TF-IDF features significantly improve the performance of SVM and LSTM, with accuracy exceeding 98%. These results underscore the potential of advanced NLP applications in real-world settings for nuanced analysis of public health discourse on social media.
first_indexed 2025-11-15T14:31:42Z
format Article
id upm-116987
institution Universiti Putra Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T14:31:42Z
publishDate 2024
publisher Institute of Advanced Engineering and Science
recordtype eprints
repository_type Digital Repository
spelling upm-116987 2025-04-22T04:04:20Z http://psasir.upm.edu.my/id/eprint/116987/ Exploring COVID-19 vaccine sentiment: a Twitter-based analysis of text processing and machine learning approaches Khalaf, Ban Safir Hamdan, Hazlina Manshor, Noridayu
Institute of Advanced Engineering and Science 2024 Article PeerReviewed text en cc_by_sa_4 http://psasir.upm.edu.my/id/eprint/116987/1/116987.pdf Khalaf, Ban Safir and Hamdan, Hazlina and Manshor, Noridayu (2024) Exploring COVID-19 vaccine sentiment: a Twitter-based analysis of text processing and machine learning approaches. Bulletin of Electrical Engineering and Informatics, 13 (6). pp. 4522-4531. ISSN 2089-3191; eISSN: 2302-9285 https://beei.org/index.php/EEI/article/view/7855 10.11591/eei.v13i6.7855
title Exploring COVID-19 vaccine sentiment: a Twitter-based analysis of text processing and machine learning approaches
url http://psasir.upm.edu.my/id/eprint/116987/
http://psasir.upm.edu.my/id/eprint/116987/1/116987.pdf
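
The description in this record names TF-IDF features and K-nearest neighbors among the techniques evaluated. As a rough illustration of how those two pieces fit together, the following is a minimal standard-library sketch, not the authors' implementation. The toy tweets, labels, whitespace tokenizer, smoothed IDF variant, choice of cosine similarity, and all function names are assumptions made for this example; the record does not specify any of them.

```python
import math
from collections import Counter

# Hypothetical toy tweets and labels, invented for illustration only;
# they are not taken from the study's dataset.
TWEETS = [
    ("got my vaccine today feeling great", "positive"),
    ("vaccine rollout is great news", "positive"),
    ("worried about vaccine side effects", "negative"),
    ("side effects made me feel awful", "negative"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed IDF, ln((1 + N) / (1 + df)) + 1, for each term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Map a tokenized document to an L2-normalized {term: tf * idf} dict."""
    tf = Counter(term for term in doc if term in idf)  # unseen terms are ignored
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine similarity)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def predict(text, train_vecs, labels, idf):
    """Return the label of the most similar training tweet (1-nearest neighbor)."""
    query = tfidf_vector(tokenize(text), idf)
    best = max(range(len(train_vecs)), key=lambda i: cosine(query, train_vecs[i]))
    return labels[best]


docs = [tokenize(text) for text, _ in TWEETS]
labels = [label for _, label in TWEETS]
idf = fit_idf(docs)
train_vecs = [tfidf_vector(doc, idf) for doc in docs]

print(predict("great vaccine news", train_vecs, labels, idf))
print(predict("awful side effects", train_vecs, labels, idf))
```

In this sketch a term such as "vaccine", which appears in three of the four toy tweets, receives a lower IDF weight than a term that appears only once, so rarer words contribute more to the similarity score. A real pipeline would use a much larger labeled corpus, a proper tokenizer, and a held-out test set.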