Ensemble learning for multidimensional poverty classification

The poverty rate in Malaysia is determined through financial or income indices and measurements. As such, periodic measurements are conducted through Household Expenditure and Income Survey (HEIS) twice every five years, and subsequently used to generate a Poverty Line Income (PLI) to determine pove...

Bibliographic Details
Main Authors: Azuraliza Abu Bakar, Rusnita Hamdan, Nor Samsiah Sani
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2020
Online Access: http://journalarticle.ukm.my/14778/
http://journalarticle.ukm.my/14778/1/ARTIKEL%2024.pdf
author Azuraliza Abu Bakar,
Rusnita Hamdan,
Nor Samsiah Sani,
building UKM Institutional Repository
collection Online Access
description The poverty rate in Malaysia is determined through financial or income-based indices and measurements. Periodic measurements are conducted through the Household Expenditure and Income Survey (HEIS) twice every five years and are subsequently used to generate a Poverty Line Income (PLI), from which poverty levels are determined through statistical methods. Such a uni-dimensional measurement, however, cannot portray overall deprivation conditions, especially as experienced by the urban population. In addition, the United Nations Development Programme (UNDP) has introduced a set of multidimensional poverty measurements, but these have yet to be applied in the case of Malaysia. Machine Learning (ML) approaches that can produce new poverty measurement methods are therefore of interest, and they depend on the existence of a rich database on poverty, such as the eKasih database maintained by the Malaysian Government. The goal of this study was to determine whether an ensemble learning method (random forest) can classify poverty, and hence produce a multidimensional poverty indicator, when compared with base learner methods on the eKasih dataset. The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was used to ensure that the data mining and ML processes were conducted properly. Besides random forest, we also examined decision tree and general linear methods to benchmark their performance and determine the method with the highest accuracy. Fifteen variables were then ranked using the varImp method to identify the important variables. The analysis showed that Per Capita Income, State, Ethnic, Strata, Religion, Occupation and Education were the most important variables in the classification of poverty, with 99% accuracy using the Random Forest algorithm.
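The comparison described in the abstract (an ensemble against a single base learner, followed by a ranking of variables) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic data with invented feature names and an invented labelling rule, it uses bagged one-split decision stumps with random feature subsets as a simplified stand-in for random forest, and it counts how often each feature is selected as a crude substitute for the varImp ranking. No eKasih data is involved.

```python
import random
from collections import Counter

# Hypothetical feature names; the real eKasih variables are not reproduced here.
NAMES = ["per_capita_income", "education", "household_size", "noise"]

def make_synthetic_households(n, rng):
    """Generate made-up (features, label) rows with an invented poverty rule."""
    rows = []
    for _ in range(n):
        income = rng.uniform(100, 3000)
        education = rng.randint(0, 3)
        size = rng.randint(1, 10)
        noise = rng.random()
        poor = int(income < 900 or (income < 1400 and education == 0))
        rows.append(([income, education, size, noise], poor))
    return rows

def fit_stump(rows, features):
    """Base learner: the single threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, label_below)
    for f in features:
        for threshold in sorted({x[f] for x, _ in rows}):
            for label_below in (0, 1):
                errors = sum(
                    (label_below if x[f] <= threshold else 1 - label_below) != y
                    for x, y in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, label_below)
    return best[1:]

def predict_stump(stump, x):
    f, threshold, label_below = stump
    return label_below if x[f] <= threshold else 1 - label_below

def fit_ensemble(rows, n_learners, n_features_per_learner, rng):
    """Train each stump on a bootstrap sample and a random subset of features."""
    n_features = len(rows[0][0])
    ensemble = []
    for _ in range(n_learners):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        feats = rng.sample(range(n_features), n_features_per_learner)
        ensemble.append(fit_stump(sample, feats))
    return ensemble

def predict_ensemble(ensemble, x):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(predict_stump(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]

def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

def feature_usage(ensemble, names):
    """Crude importance proxy: how many stumps split on each feature."""
    return Counter(names[f] for f, _, _ in ensemble).most_common()

rng = random.Random(0)
data = make_synthetic_households(300, rng)
train, test = data[:200], data[200:]

single = fit_stump(train, range(len(NAMES)))
forest = fit_ensemble(train, n_learners=15, n_features_per_learner=2, rng=rng)

single_acc = accuracy(lambda x: predict_stump(single, x), test)
ensemble_acc = accuracy(lambda x: predict_ensemble(forest, x), test)
ranking = feature_usage(forest, NAMES)

print(f"single stump accuracy: {single_acc:.3f}")
print(f"ensemble accuracy:     {ensemble_acc:.3f}")
print("feature usage ranking:", ranking)
```

The accuracies printed here describe only the synthetic data and say nothing about the results reported in the article. An odd number of learners is used so that the binary majority vote cannot tie.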
first_indexed 2025-11-15T00:21:24Z
format Article
id oai:generic.eprints.org:14778
institution Universiti Kebangsaan Malaysia
institution_category Local University
language English
last_indexed 2025-11-15T00:21:24Z
publishDate 2020
publisher Penerbit Universiti Kebangsaan Malaysia
recordtype eprints
repository_type Digital Repository
spelling oai:generic.eprints.org:14778 2020-06-23T01:15:29Z http://journalarticle.ukm.my/14778/ Penerbit Universiti Kebangsaan Malaysia 2020-02 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/14778/1/ARTIKEL%2024.pdf Azuraliza Abu Bakar, Rusnita Hamdan and Nor Samsiah Sani (2020) Ensemble learning for multidimensional poverty classification. Sains Malaysiana, 49 (2). pp. 447-459. ISSN 0126-6039 http://www.ukm.my/jsm/malay_journals/jilid49bil2_2020/KandunganJilid49Bil2_2020.html
title Ensemble learning for multidimensional poverty classification
url http://journalarticle.ukm.my/14778/
http://journalarticle.ukm.my/14778/1/ARTIKEL%2024.pdf