Analysis Of Failure In Offline English Alphabet Recognition With Data Mining Approach
| Main Author: | |
|---|---|
| Format: | Monograph |
| Language: | English |
| Published: | Universiti Sains Malaysia 2019 |
| Subjects: | |
| Online Access: | http://eprints.usm.my/58271/ http://eprints.usm.my/58271/1/Analysis%20Of%20Failure%20In%20Offline%20English%20Alphabet%20Recognition%20With%20Data%20Mining%20Approach.pdf |
| Summary: | Offline handwriting recognition is a long-established approach to identifying handwritten phrases, letters, or digits. Earlier studies in the handwriting recognition field mostly focused on recognizing characters using Neural Network Language Model (NNLM) classifiers, Hidden Markov Models (HMM), and Support Vector Machines (SVM) with segmentation techniques, the Hough Transform method, and structural features. However, these approaches involve complex algorithms and require voluminous datasets for training. Therefore, this study applies a data mining approach to the analysis of failure in offline English alphabet recognition. The objectives of the study are to improve the pattern recognition approach for classifying English alphabets and to determine the root causes of classification failure in handwritten English alphabets. Handwritten capital letters of the English alphabet written by 50 Universiti Sains Malaysia students were used in the experiment. The data were pre-processed to remove outliers prior to classification analysis with the aid of the Waikato Environment for Knowledge Analysis (WEKA) tool. Classification analysis was initially performed with all seven classification algorithms using 10-fold cross-validation. In phase one, the Stroke and Curve attributes were added to the dataset and classified in turn. In phase two, the Sharp Vertex, Closed Region, and Points attributes were added to the dataset. The top three classification algorithms, IBk, LMT, and Random Committee, were selected for further classification. The classification results were then analyzed to identify the root causes of classification errors. On the raw dataset, classification accuracy was low at 25%. As the attributes were added to the raw dataset in turn, classification accuracy increased to 89%. In conclusion, classification accuracy depends on the added attributes that distinguish the characteristics of the alphabets. |
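
The evaluation scheme described in the summary (10-fold cross-validation, repeated as structural attributes are added) can be illustrated with a minimal sketch. This is not the study's WEKA workflow or its data: the letter attribute values, the `LETTERS` table, and the helper names `nearest_label` and `cross_val_accuracy` are all assumptions made up for illustration, and a plain 1-nearest-neighbour rule stands in for WEKA's IBk. The toy accuracies it prints are unrelated to the 25% and 89% figures reported above.

```python
# Hypothetical attribute values per capital letter (illustrative only):
# (strokes, curves, sharp_vertices, closed_regions)
LETTERS = {
    "A": (3, 0, 1, 1),
    "B": (3, 2, 0, 2),
    "C": (1, 1, 0, 0),
    "D": (2, 1, 0, 1),
    "L": (2, 0, 1, 0),
    "O": (1, 1, 0, 1),
}

# Ten identical toy samples per letter, stored letter by letter.
SAMPLES = [(LETTERS[name], name) for name in LETTERS for _ in range(10)]


def nearest_label(train, x):
    """1-nearest neighbour with squared Euclidean distance.

    min() returns the first minimal element, so ties go to the
    earliest training row.
    """
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return best[1]


def cross_val_accuracy(samples, columns, k=10):
    """k-fold cross-validation accuracy using only the given attribute columns."""
    # Project every sample onto the selected attribute columns.
    projected = [(tuple(f[c] for c in columns), y) for f, y in samples]
    correct = 0
    for fold in range(k):
        # Sample i belongs to fold i % k; the rest form the training set.
        train = [row for i, row in enumerate(projected) if i % k != fold]
        test = [row for i, row in enumerate(projected) if i % k == fold]
        correct += sum(nearest_label(train, x) == y for x, y in test)
    return correct / len(samples)


# Add attributes step by step and re-evaluate each time.
for label, cols in [
    ("stroke only", (0,)),
    ("stroke + curve", (0, 1)),
    ("all four attributes", (0, 1, 2, 3)),
]:
    print(f"{label}: {cross_val_accuracy(SAMPLES, cols):.2f}")
```

In this toy setting, letters that share the same values on the selected attributes cannot be told apart, so accuracy rises only when an added attribute separates a previously confused pair. That mirrors the summary's conclusion in spirit, not in its numbers.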