Clustering mixed-type data via Dirichlet process mixture model with cluster-specific covariance matrices

Bibliographic Details
Main Authors: Burhanuddin, Nurul Afiqah, Ibrahim, Kamarulzaman, Zulkafli, Hani Syahida, Mustapha, Norwati
Format: Article
Language: English
Published: Multidisciplinary Digital Publishing Institute (MDPI) 2024
Online Access: http://psasir.upm.edu.my/id/eprint/113587/
http://psasir.upm.edu.my/id/eprint/113587/1/113587.pdf
Description: Many studies have shown successful applications of the Dirichlet process mixture model (DPMM) for clustering continuous data. Beyond continuous data, in practice, one can expect to see different data types, including ordinal and nominal data. Existing DPMMs for clustering mixed-type data assume a strict covariance matrix structure, resulting in an overfit model. This article explores a DPMM for mixed-type data that allows the covariance matrix to differ from one cluster to another. We assume an underlying latent variable framework for ordinal and nominal data, which is then modeled jointly with the continuous data. The identifiability issue on the covariance matrix poses computational challenges, thus requiring a nonstandard inferential algorithm. The applicability and flexibility of the proposed model are illustrated through simulation examples and real data applications.
Citation: Burhanuddin, Nurul Afiqah and Ibrahim, Kamarulzaman and Zulkafli, Hani Syahida and Mustapha, Norwati (2024) Clustering mixed-type data via Dirichlet process mixture model with cluster-specific covariance matrices. Symmetry, 16 (6). art. no. 712. ISSN 2073-8994; eISSN: 2073-8994
Publisher's Version: https://www.mdpi.com/2073-8994/16/6/712
DOI: 10.3390/sym16060712
Status: Peer reviewed
License: CC BY 4.0
Record ID: upm-113587
Record Type: eprints
Institution: Universiti Putra Malaysia (Local University)
Building: UPM Institutional Repository
Collection: Online Access
Repository Type: Digital Repository
Indexed: 2025-11-15T14:17:54Z