Optimizing sentiment analysis of Indonesian texts: Enhancing deep learning models with genetic algorithm-based feature selection


Bibliographic Details
Main Authors: Siti, Mujilahwati; Noor Zuraidin, Mohd Safar; Ku Muhammad Naim, Ku Khalif; Nasyitah, Ghazalli
Format: Article
Language: English
Published: Penerbit UTHM 2024
Subjects: Q Science (General); QA Mathematics
Online Access: http://umpir.ump.edu.my/id/eprint/43886/
http://umpir.ump.edu.my/id/eprint/43886/1/Optimizing%20sentiment%20analysis%20of%20indonesian%20texts.pdf
_version_ 1848826983230734336
author Siti, Mujilahwati
Noor Zuraidin, Mohd Safar
Ku Muhammad Naim, Ku Khalif
Nasyitah, Ghazalli
author_sort Siti, Mujilahwati
building UMP Institutional Repository
collection Online Access
description Automatic text classification is used in many real-world applications, including filtering unsolicited messages, sentiment analysis, and news categorization. The main challenge in text representation is high dimensionality, which can increase model complexity and the risk of overfitting. To address this, feature selection (FS) is performed during data pre-processing to improve the model's learning accuracy and efficiency. This study examines the optimization of Indonesian text sentiment analysis by combining genetic algorithm (GA) feature selection with deep learning models. Using the GA with an SVM-based fitness evaluation reduced the feature space from 41,140 to 20,769 features and increased accuracy by 8.10% for SVM, 36.1% for Naïve Bayes, 7.82% for LSTM, 5.47% for DNN, and 6.25% for CNN. Of the three deep learning models, LSTM achieved the highest accuracy, 91.41%, while also showing a reduction in computation time approaching 50%.
first_indexed 2025-11-15T03:53:29Z
format Article
id ump-43886
institution Universiti Malaysia Pahang
institution_category Local University
language English
last_indexed 2025-11-15T03:53:29Z
publishDate 2024
publisher Penerbit UTHM
recordtype eprints
repository_type Digital Repository
spelling ump-43886
2025-02-20T08:53:48Z
http://umpir.ump.edu.my/id/eprint/43886/
Penerbit UTHM 2024-12-18 Article PeerReviewed pdf en cc_by_nc_sa_4
Siti, Mujilahwati and Noor Zuraidin, Mohd Safar and Ku Muhammad Naim, Ku Khalif and Nasyitah, Ghazalli (2024) Optimizing sentiment analysis of Indonesian texts: Enhancing deep learning models with genetic algorithm-based feature selection. Journal of Soft Computing and Data Mining, 5 (2). pp. 208-222. ISSN 2716-621X. (Published)
https://doi.org/10.30880/jscdm.2024.05.02.016
title Optimizing sentiment analysis of Indonesian texts: Enhancing deep learning models with genetic algorithm-based feature selection
topic Q Science (General)
QA Mathematics
url http://umpir.ump.edu.my/id/eprint/43886/
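
The abstract in this record describes feature selection with a genetic algorithm (GA) whose fitness is evaluated with an SVM. The record gives no implementation details, so the following is only a minimal sketch of the general idea: a GA that evolves binary feature masks and scores each mask with a classifier. Everything in it is an assumption made for illustration: the function names (make_toy_data, centroid_accuracy, ga_select), the GA settings, the toy data, and above all the fitness function, which is a simple nearest-centroid classifier used as a stand-in so the example needs only the Python standard library. It is not the SVM-based fitness the authors report, and it does not reproduce their results.

```python
import random


def make_toy_data(n_samples=60, n_features=12, seed=1):
    """Hypothetical toy data: columns 0 and 1 carry the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Stand-in fitness (NOT an SVM): training accuracy of a nearest-centroid
    classifier that only sees the columns where mask is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y)


def ga_select(X, y, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, penalty=0.01, seed=0):
    """Evolve a binary mask over the columns of X; returns the best mask found."""
    rng = random.Random(seed)
    n = len(X[0])  # one-point crossover below needs at least 2 features

    def score(mask):
        # Reward accuracy, lightly penalise the fraction of features kept.
        return fitness(X, y, mask) - penalty * sum(mask) / n

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=score)
    return best


X, y = make_toy_data()
best_mask = ga_select(X, y, centroid_accuracy)
print("kept features:", [j for j, keep in enumerate(best_mask) if keep])
print("accuracy on kept features:", centroid_accuracy(X, y, best_mask))
```

Because ga_select takes the fitness as a plain callable, the stand-in could be swapped for a cross-validated SVM score from a machine learning library. That is closer to what the abstract describes, but it remains an assumption about the setup rather than the authors' code.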