Clustering breast cancer data by consensus of different validity indices

Bibliographic Details
Main Authors: Soria, Daniele, Garibaldi, Jonathan M., Ambrogi, Federico, Lisboa, Paulo J.G., Boracchi, Patrizia, Biganzoli, Elia M.
Format: Conference or Workshop Item
Published: IET Digital Library 2008
Online Access: https://eprints.nottingham.ac.uk/28148/
Description
Summary: Clustering algorithms generally either partition a given data set into a pre-specified number of clusters or produce a hierarchy of clusters. In this paper we analyse several different clustering techniques and apply them to a particular breast cancer data set. When the best number of groups is not known a priori, we use a range of different validity indices to assess the quality of the clustering results and to determine the best number of clusters. While for the K-means method there is no absolute agreement among the indices as to the best number of clusters, for the PAM algorithm all the indices indicate 4 as the best number of clusters.
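The idea of choosing the number of clusters by consensus of several validity indices can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation, and it makes several assumptions: the three indices used here (silhouette, Calinski-Harabasz, Davies-Bouldin) are a common choice picked for illustration and are not necessarily the indices used in the paper; K-means uses a deterministic farthest-first initialisation for reproducibility; the toy data is hypothetical and is not the breast cancer data set; and "consensus" is taken to mean a simple majority vote over the indices. A K-medoids (PAM) routine returning labels could be swapped in for `kmeans`.

```python
import math
from collections import Counter


def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def groups(data, labels):
    """Map each cluster label to the list of its member points."""
    out = {}
    for p, lab in zip(data, labels):
        out.setdefault(lab, []).append(p)
    return out


def kmeans(data, k, iters=100):
    """Lloyd's K-means with farthest-first initialisation (assumed, for determinism)."""
    centres = [data[0]]
    while len(centres) < k:
        centres.append(max(data, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in data]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = mean(members)
    return labels


def silhouette(data, labels):
    """Mean silhouette width (higher is better); singletons contribute 0."""
    g = groups(data, labels)
    total = 0.0
    for p, lab in zip(data, labels):
        if len(g[lab]) == 1:
            continue
        a = sum(math.dist(p, q) for q in g[lab]) / (len(g[lab]) - 1)
        b = min(sum(math.dist(p, q) for q in g[m]) / len(g[m]) for m in g if m != lab)
        total += (b - a) / max(a, b)
    return total / len(data)


def calinski_harabasz(data, labels):
    """Between/within dispersion ratio (higher is better)."""
    g = groups(data, labels)
    n, k, centre = len(data), len(g), mean(data)
    between = sum(len(m) * math.dist(mean(m), centre) ** 2 for m in g.values())
    within = sum(math.dist(p, mean(m)) ** 2 for m in g.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(data, labels):
    """Average worst-case cluster similarity ratio (lower is better)."""
    g = list(groups(data, labels).values())
    cents = [mean(m) for m in g]
    spread = [sum(math.dist(p, c) for p in m) / len(m) for m, c in zip(g, cents)]
    k = len(g)
    return sum(
        max((spread[i] + spread[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ) / k


def consensus_k(data, ks=range(2, 6)):
    """Each index votes for its preferred k; the majority vote is returned."""
    fits = {k: kmeans(data, k) for k in ks}
    votes = {
        "silhouette": max(ks, key=lambda k: silhouette(data, fits[k])),
        "calinski_harabasz": max(ks, key=lambda k: calinski_harabasz(data, fits[k])),
        "davies_bouldin": min(ks, key=lambda k: davies_bouldin(data, fits[k])),
    }
    best, _ = Counter(votes.values()).most_common(1)[0]
    return best, votes


# Hypothetical toy data: three tight 3x3 grids around well-separated centres.
data = [(cx + dx, cy + dy)
        for cx, cy in [(0, 0), (10, 0), (5, 8)]
        for dx in (-0.5, 0, 0.5)
        for dy in (-0.5, 0, 0.5)]
best, votes = consensus_k(data)
print(f"consensus k = {best}; votes: {votes}")
```

On clearly separated data like this the indices tend to agree; on real data they may disagree, which mirrors the situation the summary reports for K-means, and a majority vote is only one of several ways to reconcile them.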