A first attempt on global evolutionary undersampling for imbalanced big data

Bibliographic Details
Main Authors:	Triguero, Isaac, Galar, M., Bustince, H., Herrera, Francisco
Format:	Conference or Workshop Item
Published:	2017
Online Access:	https://eprints.nottingham.ac.uk/44071/

building	Nottingham Research Data Repository
description	The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are involved as a key step. Existing solutions typically follow a divide-and-conquer approach in which the data is split into several chunks that are addressed individually. The partial knowledge acquired from each chunk is then aggregated in various ways to solve the entire problem. However, these approaches lack a global view of the data as a whole, which may result in less accurate models. In this work we make a first attempt at designing a global evolutionary undersampling model for imbalanced classification problems. Such problems are characterised by a highly skewed class distribution, which evolutionary models can balance by selecting only the most relevant data. Using Apache Spark as the big data technology, we have introduced a number of variations to the well-known CHC algorithm so that it can work with very large chromosomes and reduce the costs associated with fitness evaluation. We discuss some preliminary results, which show the great potential of this new kind of evolutionary big data model.
id	nottingham-44071
institution	University of Nottingham Malaysia Campus
institution_category	Local University
citation	Triguero, Isaac, Galar, M., Bustince, H. and Herrera, Francisco (2017) A first attempt on global evolutionary undersampling for imbalanced big data. In: IEEE Congress on Evolutionary Computation (CEC 2017), 5-8 Jun 2017, San Sebastian, Spain. Peer reviewed. http://ieeexplore.ieee.org/document/7969553/
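The evolutionary undersampling idea summarised in the abstract can be illustrated with a minimal, single-machine Python sketch of CHC-style undersampling. This is not the authors' Spark implementation and does not reproduce their large-chromosome or fitness-evaluation variations. Every name and parameter below (chc_undersample, fitness, penalty=0.2, pop_size, the 0.35 restart rate) is a hypothetical choice for illustration. Assumptions: each gene marks whether one majority-class instance (label 0) is kept, minority instances (label 1) are always kept, and fitness is the geometric mean of per-class leave-one-out 1-NN accuracy minus a penalty for moving away from a 1:1 class ratio. The search uses the classic CHC ingredients: HUX crossover, incest prevention through a Hamming-distance threshold, elitist survivor selection, and a restart from the best chromosome once the threshold drops below zero.

```python
# Hypothetical sketch of CHC-style evolutionary undersampling (not the authors' Spark method).
import math
import random


def one_nn_predict(x, reference):
    """Return the label of the nearest (squared Euclidean) point in reference."""
    best_label, best_dist = None, math.inf
    for features, label in reference:
        dist = sum((a - b) ** 2 for a, b in zip(x, features))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def fitness(mask, minority, majority, penalty=0.2):
    """Assumed fitness: GM of per-class leave-one-out 1-NN accuracy minus a balance penalty."""
    points = minority + majority
    m = len(minority)
    # Reference set: all minority instances plus the majority instances kept by the mask.
    ref_idx = list(range(m)) + [m + i for i, keep in enumerate(mask) if keep]
    n_selected = len(ref_idx) - m
    if n_selected == 0:
        return 0.0
    hits = {0: 0, 1: 0}  # correct predictions per class (0 = majority, 1 = minority)
    for j, (x, label) in enumerate(points):
        # Leave-one-out: a point is never classified by itself.
        ref = [points[k] for k in ref_idx if k != j]
        if one_nn_predict(x, ref) == label:
            hits[label] += 1
    gm = math.sqrt((hits[1] / m) * (hits[0] / len(majority)))
    # Hypothetical penalty for drifting away from a 1:1 class ratio.
    return gm - penalty * abs(1 - n_selected / m)


def hux(a, b, rng):
    """Half-uniform crossover: swap exactly half of the genes where a and b differ."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    rng.shuffle(diff)
    c1, c2 = list(a), list(b)
    for i in diff[: len(diff) // 2]:
        c1[i], c2[i] = b[i], a[i]
    return c1, c2


def chc_undersample(minority, majority, pop_size=10, generations=20, seed=0):
    """Return (best mask over majority instances, its fitness)."""
    rng = random.Random(seed)
    n_genes = len(majority)

    def evaluate(masks):
        return [(fitness(mk, minority, majority), mk) for mk in masks]

    def by_score(entries):
        return sorted(entries, key=lambda t: t[0], reverse=True)

    scored = by_score(evaluate(
        [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)]))
    threshold = n_genes // 4  # incest prevention: mate only if Hamming/2 > threshold
    for _ in range(generations):
        parents = [mk for _, mk in scored]
        rng.shuffle(parents)
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            if sum(x != y for x, y in zip(a, b)) // 2 > threshold:
                offspring.extend(hux(a, b, rng))
        children = evaluate(offspring)
        child_ids = {id(mk) for _, mk in children}
        # Elitist survivor selection over parents + children.
        scored = by_score(scored + children)[:pop_size]
        if not any(id(mk) in child_ids for _, mk in scored):
            threshold -= 1  # no child survived: relax incest prevention
        if threshold < 0:
            # Restart: keep the best mask; the rest flip each of its genes with probability 0.35.
            best = scored[0]
            mutants = [[(not g) if rng.random() < 0.35 else g for g in best[1]]
                       for _ in range(pop_size - 1)]
            scored = by_score([best] + evaluate(mutants))
            threshold = n_genes // 4
    return scored[0][1], scored[0][0]


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic two-class toy data: label 1 = minority, label 0 = majority.
    minority = [((data_rng.gauss(2, 1), data_rng.gauss(2, 1)), 1) for _ in range(8)]
    majority = [((data_rng.gauss(0, 1), data_rng.gauss(0, 1)), 0) for _ in range(40)]
    mask, score = chc_undersample(minority, majority)
    print(f"kept {sum(mask)} of {len(majority)} majority instances, fitness {score:.3f}")
```

Because the chromosome has one gene per majority instance, its length grows with the data set, and every fitness evaluation runs a nearest-neighbour pass over the whole training set. Even in this toy version those two costs dominate, which illustrates why the abstract singles out very large chromosomes and fitness evaluation as the parts of CHC that need adapting for big data.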
