Optimization of k-Nearest Neighbour to categorize Indonesian’s news articles
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Penerbit Universiti Kebangsaan Malaysia 2021 |
| Online Access: | http://journalarticle.ukm.my/16843/ http://journalarticle.ukm.my/16843/1/04.pdf |
| Summary: | Text classification is the process of grouping documents into categories based on their similarity. Two of the obstacles in text classification are the large number of words that appear in the text and the words that occur only rarely (sparse words). The way to solve this problem is to conduct a feature selection process. There are several filter-based feature selection methods, including Chi-Square, Information Gain, Genetic Algorithm, and Particle Swarm Optimization (PSO). Aghdam's research shows that PSO is the best among those methods. This study examined PSO as a way to optimize the performance of the k-Nearest Neighbour (k-NN) algorithm in categorizing news articles. k-NN is a simple algorithm that is easy to implement, and it is reliable when given appropriate features. The PSO algorithm is used to select keywords (term features), and the documents are then classified using k-NN. The testing process consists of three stages: tuning the parameter of k-NN, tuning the parameter of PSO, and measuring the testing performance. The parameter tuning process aims to determine the number of neighbours used in k-NN and the number of PSO particles. The performance testing, in turn, compares the performance of k-NN with and without PSO. The optimal number of neighbours is 9, and the optimal number of particles is 50. The testing showed that k-NN with PSO and a 50% reduction in terms gives 20 per cent better accuracy than k-NN without PSO. Although the PSO process did not always find the optimal conditions, the k-NN method can still produce better accuracy with it. In this way, the k-NN method can work better in grouping news articles, especially Indonesian-language news articles. |
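The pipeline described in the summary (PSO selects term features, k-NN then classifies the documents) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary PSO variant with a sigmoid transfer function, leave-one-out k-NN accuracy as the fitness, and a tiny invented term-count matrix. The names `knn_predict`, `loo_accuracy`, and `pso_select` are hypothetical, as are the inertia and acceleration values. The toy data is too small for the reported settings of 9 neighbours and 50 particles, so smaller values are used, and the 50% term reduction is not enforced.

```python
import math
import random


def knn_predict(train_X, train_y, x, k, mask):
    """Majority vote among the k nearest training documents,
    measuring squared Euclidean distance on the selected terms only."""
    idx = [j for j, keep in enumerate(mask) if keep]
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in idx), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(votes), key=lambda lab: votes[lab])


def loo_accuracy(X, y, k, mask):
    """Leave-one-out accuracy of k-NN restricted to the selected terms."""
    if not any(mask):
        return 0.0  # an empty term set cannot classify anything
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k, mask) == y[i]
    return hits / len(X)


def pso_select(X, y, k=3, n_particles=10, n_iter=20,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a keep/drop mask over the terms."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pos = [[rng.random() < 0.5 for _ in range(n_feat)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [loo_accuracy(X, y, k, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                # Velocity update: inertia + pull towards personal and global best.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid transfer: velocity becomes the probability of keeping term j.
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            fit = loo_accuracy(X, y, k, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Invented toy data: rows are documents, columns are term counts.
X = [
    [3, 2, 0, 0, 1, 4], [2, 3, 0, 1, 4, 0], [4, 1, 0, 0, 0, 3], [3, 3, 1, 0, 5, 1],
    [0, 0, 3, 2, 4, 1], [0, 1, 2, 3, 0, 4], [1, 0, 4, 2, 5, 0], [0, 0, 3, 4, 1, 3],
]
y = ["sport"] * 4 + ["economy"] * 4

mask, acc = pso_select(X, y)
print("selected terms:", [j for j, keep in enumerate(mask) if keep])
print("leave-one-out accuracy with PSO:", acc)
print("leave-one-out accuracy without PSO:", loo_accuracy(X, y, 3, [True] * 6))
```

The last two printed values mirror the comparison made in the summary (k-NN with and without PSO), although on data this small the difference is not meaningful. A real experiment would use a held-out test set and a term-weighting scheme rather than raw counts.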