The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification
| Main Author: | Nur Syafiqah, Mohd Nafis |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2022 |
| Subjects: | Q Science (General); QA75 Electronic computers. Computer science |
| Online Access: | http://umpir.ump.edu.my/id/eprint/37676/ http://umpir.ump.edu.my/id/eprint/37676/1/ir.The%20hybrid%20feature%20selection%20technique%20using%20term%20frequency-inverse%20document%20frequency%20and%20support%20vector%20machine-recursive%20feature%20elimination%20for%20sentiment%20classification.pdf |
| author | Nur Syafiqah, Mohd Nafis |
|---|---|
| building | UMP Institutional Repository |
| collection | Online Access |
| description | Sentiment classification is increasingly used to automatically identify positive or negative sentiment in opinionated text documents, such as customer feedback or reviews. Feature selection has long been a critical and challenging problem in machine learning-based sentiment classification. Hybrid feature selection is an efficient technique for sentiment classification, but existing hybrid techniques have shortcomings, notably in identifying feature importance and in reducing the number of features taken from opinionated text documents. Failing to address these shortcomings results in poor classification performance. This research therefore aims to improve classification performance by proposing term frequency-inverse document frequency (TF-IDF) combined with support vector machine-recursive feature elimination (SVM-RFE) as a hybrid feature selection technique. TF-IDF evaluates feature importance, and a standard deviation-based threshold is applied to reduce the features; this step is intended to improve on the conventional approach of reducing features from the feature matrix. SVM-RFE then re-evaluates and ranks the features remaining in the TF-IDF-based feature matrix, and only the top-k group of SVM-RFE-ranked features is used for sentiment classification. Finally, a support vector machine (SVM) classifier is applied to English customer review datasets, namely the opinion-labelled and large IMDb datasets. Performance was measured using accuracy, precision, recall, F-measure, and feature size reduction. The experimental results are promising, reaching up to 95.06% on the performance measures, particularly on the large IMDb dataset and on an additional hotel review dataset. In addition, the proposed technique reduced the number of features used during classification by 31.80% to 64.00%. This reduction makes better use of computational resources while preserving classification performance. |
| first_indexed | 2025-11-15T03:26:59Z |
| format | Thesis |
| id | ump-37676 |
| institution | Universiti Malaysia Pahang |
| institution_category | Local University |
| language | English |
| last_indexed | 2025-11-15T03:26:59Z |
| publishDate | 2022 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | ump-37676 2023-09-19T01:09:17Z; 2022-10; Thesis; NonPeerReviewed; pdf; en; Nur Syafiqah, Mohd Nafis (2022) The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification. PhD thesis, Universiti Malaysia Pahang (Contributors, Thesis advisor: Suryanti, Awang). |
| title | The hybrid feature selection technique using term frequency-inverse document frequency and support vector machine-recursive feature elimination for sentiment classification |
| topic | Q Science (General) QA75 Electronic computers. Computer science |
| url | http://umpir.ump.edu.my/id/eprint/37676/ http://umpir.ump.edu.my/id/eprint/37676/1/ir.The%20hybrid%20feature%20selection%20technique%20using%20term%20frequency-inverse%20document%20frequency%20and%20support%20vector%20machine-recursive%20feature%20elimination%20for%20sentiment%20classification.pdf |
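
The pipeline summarised in the description (TF-IDF scoring, a standard deviation-based cut, SVM-RFE ranking, top-k selection, then an SVM classifier) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the record does not give the exact threshold rule or solver, so the smoothed IDF, the rule "keep a term if its summed TF-IDF score is at least the mean minus one standard deviation", the subgradient-descent linear SVM, and removing one feature per RFE round are all assumptions, and every function name (`tfidf_matrix`, `std_threshold`, `train_linear_svm`, `svm_rfe`, `hybrid_select`, `predict`) is hypothetical.

```python
import math
import statistics


def tfidf_matrix(docs):
    """TF-IDF with tf = count / doc length and smoothed idf = ln((1 + N) / (1 + df)) + 1.

    Assumes every document is non-empty after whitespace tokenisation.
    """
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    n = len(tokens)
    idf = {t: math.log((1 + n) / (1 + sum(t in doc for doc in tokens))) + 1 for t in vocab}
    matrix = [[doc.count(t) / len(doc) * idf[t] for t in vocab] for doc in tokens]
    return vocab, matrix


def std_threshold(matrix):
    """Return indices of terms whose summed TF-IDF >= mean - 1 std (assumed rule)."""
    scores = [sum(col) for col in zip(*matrix)]
    cut = statistics.mean(scores) - statistics.pstdev(scores)
    return [j for j, s in enumerate(scores) if s >= cut]


def train_linear_svm(X, y, lam=0.01, lr=0.5, epochs=300):
    """Linear SVM via full-batch subgradient descent on the L2-regularised hinge loss.

    Labels y must be -1 or +1. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the hinge-loss subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, k):
    """Retrain and drop the feature with the smallest squared weight until k remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > k:
        w, _ = train_linear_svm([[row[j] for j in remaining] for row in X], y)
        weakest = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        del remaining[weakest]
    return remaining  # column indices of X, in their original order


def hybrid_select(docs, labels, k):
    """TF-IDF + std threshold, then SVM-RFE; returns (top-k terms, reduced matrix)."""
    vocab, matrix = tfidf_matrix(docs)
    kept = std_threshold(matrix)
    reduced = [[row[j] for j in kept] for row in matrix]
    top = svm_rfe(reduced, labels, k)
    terms = [vocab[kept[i]] for i in top]
    final = [[row[i] for i in top] for row in reduced]
    return terms, final


def predict(w, b, X):
    """Sign of the decision function w.x + b, mapped to -1 / +1."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


if __name__ == "__main__":
    # Toy review corpus (illustrative only, not a dataset from the thesis).
    docs = ["good plot and good acting", "great film good fun",
            "bad plot and bad acting", "dull film bad fun"]
    labels = [1, 1, -1, -1]
    terms, X = hybrid_select(docs, labels, k=3)
    w, b = train_linear_svm(X, labels)
    print("selected terms:", terms)
    print("predictions:", predict(w, b, X))
```

Removing one feature per round is the plainest form of the RFE loop; on real review corpora with tens of thousands of terms, a library SVM solver and dropping a fraction of the features per round would be far more practical.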