A novel framework to elucidate core classes in a dataset


Bibliographic Details
Main Authors: Soria, Daniele, Garibaldi, Jonathan M.
Format: Conference or Workshop Item
Published: 2010
Online Access: https://eprints.nottingham.ac.uk/28139/
Building: Nottingham Research Data Repository
Collection: Online Access
Description: In this paper we present an original framework to extract representative groups from a dataset, and we validate it over a novel case study. The framework specifies the application of different clustering algorithms, then several statistical and visualisation techniques are used to characterise the results, and core classes are defined by consensus clustering. Classes may be verified using supervised classification algorithms to obtain a set of rules which may be useful for new data points in the future. This framework is validated over a novel set of histone markers for breast cancer patients. From a technical perspective, the resultant classes are well separated and characterised by low, medium and high levels of biological markers. Clinically, the groups appear to distinguish patients with poor overall survival from those with low grading score and better survival. Overall, this framework offers a promising methodology for elucidating core consensus groups from data.
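The consensus step mentioned in the description can be illustrated with a minimal sketch. This is not the authors' procedure, which the record does not spell out. It assumes one simple reading of "core classes": groups of points that every clustering algorithm places together, with points lacking full agreement left unassigned. The function name `core_classes`, the `min_size` parameter, and the three label lists are all hypothetical.

```python
from collections import defaultdict


def core_classes(labelings, min_size=2):
    """Group points that every clustering places together (assumed rule).

    labelings: list of label lists, one per clustering algorithm,
               each holding one label per data point.
    Returns (classes, unassigned): classes is a sorted list of index
    lists; unassigned holds points whose agreement group is too small.
    """
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all labelings must cover the same points")

    groups = defaultdict(list)
    for i in range(n):
        # A point's signature is its tuple of labels across algorithms.
        # Two points share a signature only if every algorithm
        # co-clusters them, so no label alignment is needed.
        signature = tuple(labels[i] for labels in labelings)
        groups[signature].append(i)

    classes = sorted(m for m in groups.values() if len(m) >= min_size)
    unassigned = sorted(
        i for m in groups.values() if len(m) < min_size for i in m
    )
    return classes, unassigned


# Hypothetical outputs of three clustering runs on eight data points.
run_a = [0, 0, 0, 1, 1, 1, 2, 2]
run_b = ["a", "a", "a", "b", "b", "c", "c", "c"]
run_c = [1, 1, 1, 2, 2, 2, 3, 3]

classes, unassigned = core_classes([run_a, run_b, run_c])
print(classes)     # [[0, 1, 2], [3, 4], [6, 7]]
print(unassigned)  # [5]
```

Point 5 is left out because the second run separates it from points 3 and 4. Under this assumed rule, such disputed points stay outside the core classes rather than being forced into one.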
Institution: University of Nottingham Malaysia Campus
Institution Category: Local University
Citation: Soria, Daniele and Garibaldi, Jonathan M. (2010) A novel framework to elucidate core classes in a dataset. In: IEEE Congress on Evolutionary Computation (CEC) 2010, 18-23 July 2010, Barcelona, Spain. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5586331
Review Status: Peer reviewed