Chinese character recognition using non-negative matrix factorization

Non-negative matrix factorization (NMF), introduced by Paatero and Tapper in 1994, is a general way of reducing the dimension of a matrix with non-negative entries. NMF is useful in many data analysis applications such as character recognition, text min...

Bibliographic Details
Main Authors: Chen, Huey Voon, Tang, Ker Shin, Ng, Wei Shean
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2024
Online Access: http://journalarticle.ukm.my/25254/
http://journalarticle.ukm.my/25254/1/kejut_24.pdf
_version_ 1848816308601225216
author Chen, Huey Voon
Tang, Ker Shin
Ng, Wei Shean
building UKM Institutional Repository
collection Online Access
description Non-negative matrix factorization (NMF), introduced by Paatero and Tapper in 1994, is a general way of reducing the dimension of a matrix with non-negative entries. NMF is useful in many data analysis applications, such as character recognition and text mining. This paper studies the application of NMF to Chinese character recognition. Python was used to carry out the LU factorization and the NMF of Chinese characters represented as Boolean matrices. A preliminary analysis was used to choose the data size and the settings for the NMF of the Boolean matrix. One hundred printed Chinese characters were selected and grouped into ten categories according to their number of strokes. The Euclidean distance between the Boolean matrix of each Chinese character and the matrix obtained after LU factorization and after NMF was calculated for further analysis. A paired t-test confirmed that factorizing the Boolean matrices of Chinese characters with NMF is better than with LU factorization. Finally, ten handwritten Chinese characters were selected to test whether the program can match handwritten characters to printed ones. Experimental results showed that 70% of the characters were recognized via the least Euclidean distance. NMF is suitable for Chinese character recognition because it reduces the dimension of the image while the error between the original Boolean matrix and the matrix after NMF is less than 5%.
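The pipeline summarized in the description (Boolean matrix, NMF, Euclidean distance, least-distance match) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the 5x5 toy glyphs, the rank of 2, the labels `shi` and `kou`, the helper names `nmf`, `relative_error` and `recognize`, and the use of Lee-Seung multiplicative updates, since the record does not say which NMF algorithm or matrix sizes the paper used.

```python
import numpy as np

def nmf(V, rank, n_iter=1000, seed=0, eps=1e-9):
    """Approximate V by W @ H with non-negative factors.

    Uses Lee-Seung multiplicative updates for the Frobenius norm;
    this choice of algorithm is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(V, W, H):
    """Euclidean (Frobenius) distance between V and W @ H, relative to the norm of V."""
    return float(np.linalg.norm(V - W @ H) / np.linalg.norm(V))

def recognize(query, reconstructions):
    """Return the label whose reconstruction is closest to the query in Euclidean distance."""
    return min(reconstructions,
               key=lambda label: np.linalg.norm(query - reconstructions[label]))

# Hypothetical 5x5 Boolean "printed" glyphs (toy stand-ins for real character images).
printed = {
    "shi": np.array([[0, 0, 1, 0, 0],   # a cross shape, like the character for "ten"
                     [0, 0, 1, 0, 0],
                     [1, 1, 1, 1, 1],
                     [0, 0, 1, 0, 0],
                     [0, 0, 1, 0, 0]], dtype=float),
    "kou": np.array([[1, 1, 1, 1, 1],   # a box shape, like the character for "mouth"
                     [1, 0, 0, 0, 1],
                     [1, 0, 0, 0, 1],
                     [1, 0, 0, 0, 1],
                     [1, 1, 1, 1, 1]], dtype=float),
}

# Factorize every printed glyph and keep its low-rank reconstruction.
factors = {label: nmf(V, rank=2) for label, V in printed.items()}
reconstructions = {label: W @ H for label, (W, H) in factors.items()}

# A hypothetical "handwritten" sample: the cross with one stray pixel.
handwritten = printed["shi"].copy()
handwritten[0, 0] = 1.0

guess = recognize(handwritten, reconstructions)
errors = {label: relative_error(printed[label], *factors[label]) for label in printed}
print(guess, errors)
```

In this sketch each 5x5 glyph is stored as a 5x2 and a 2x5 factor, which is where the dimension reduction comes from; on real character images the rank and the image size would have to be chosen by the kind of preliminary analysis the description mentions.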
first_indexed 2025-11-15T01:03:49Z
format Article
id oai:generic.eprints.org:25254
institution Universiti Kebangsaan Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T01:03:49Z
publishDate 2024
publisher Penerbit Universiti Kebangsaan Malaysia
recordtype eprints
repository_type Digital Repository
spelling oai:generic.eprints.org:25254 2025-05-23T13:30:59Z http://journalarticle.ukm.my/25254/ Chen, Huey Voon and Tang, Ker Shin and Ng, Wei Shean (2024) Chinese character recognition using non-negative matrix factorization. Jurnal Kejuruteraan, 36 (2). pp. 653-660. ISSN 0128-0198. Penerbit Universiti Kebangsaan Malaysia. Article, PeerReviewed, application/pdf, en. http://journalarticle.ukm.my/25254/1/kejut_24.pdf https://www.ukm.my/jkukm/volume-3602-2024/
title Chinese character recognition using non-negative matrix factorization
url http://journalarticle.ukm.my/25254/
http://journalarticle.ukm.my/25254/1/kejut_24.pdf