MRPR: a MapReduce solution for prototype reduction in big data classification

In the era of big data, analyzing and extracting knowledge from large-scale data sets is an interesting and challenging task. Applying standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embraces the huge storage and pr...


Bibliographic Details
Main Authors:	Triguero, Isaac; Peralta, Daniel; Bacardit, Jaume; García, Salvador; Herrera, Francisco
Format:	Article
Published:	Elsevier 2015
Subjects:	Big data; Mahout; Hadoop; Prototype reduction; Prototype generation; Nearest neighbor classification
Online Access:	https://eprints.nottingham.ac.uk/45415/

_version_	1848797127028768768
author	Triguero, Isaac; Peralta, Daniel; Bacardit, Jaume; García, Salvador; Herrera, Francisco
author_facet	Triguero, Isaac; Peralta, Daniel; Bacardit, Jaume; García, Salvador; Herrera, Francisco
author_sort	Triguero, Isaac
building	Nottingham Research Data Repository
collection	Online Access
description	In the era of big data, analyzing and extracting knowledge from large-scale data sets is an interesting and challenging task. Applying standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embraces the huge storage and processing capacity of cloud platforms is required. In this work, we propose a novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification. These methods aim to represent original training data sets with a reduced number of instances. Their main purposes are to speed up the classification process and to reduce the storage requirements and sensitivity to noise of the nearest neighbor rule. However, the standard prototype reduction methods cannot cope with very large data sets. To overcome this limitation, we develop a MapReduce-based framework that distributes the execution of these algorithms across a cluster of computing elements, and we propose several algorithmic strategies to integrate multiple partial solutions (reduced sets of prototypes) into a single one. The proposed model enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss. We test the speed-up capabilities of our model with data sets of up to 5.7 million instances. The results show that this model is a suitable tool to enhance the performance of the nearest neighbor classifier with big data.
first_indexed	2025-11-14T19:58:56Z
format	Article
id	nottingham-45415
institution	University of Nottingham Malaysia Campus
institution_category	Local University
last_indexed	2025-11-14T19:58:56Z
publishDate	2015
publisher	Elsevier
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-45415 2020-05-04T17:02:22Z https://eprints.nottingham.ac.uk/45415/ Elsevier 2015-02-20 Article PeerReviewed Triguero, Isaac, Peralta, Daniel, Bacardit, Jaume, García, Salvador and Herrera, Francisco (2015) MRPR: a MapReduce solution for prototype reduction in big data classification. Neurocomputing, 150 (A). pp. 331-345. ISSN 0925-2312 http://www.sciencedirect.com/science/article/pii/S0925231214013009?via%3Dihub doi:10.1016/j.neucom.2014.04.078
title	MRPR: a MapReduce solution for prototype reduction in big data classification
title_full	MRPR: a MapReduce solution for prototype reduction in big data classification
title_fullStr	MRPR: a MapReduce solution for prototype reduction in big data classification
title_full_unstemmed	MRPR: a MapReduce solution for prototype reduction in big data classification
title_short	MRPR: a MapReduce solution for prototype reduction in big data classification
title_sort	mrpr: a mapreduce solution for prototype reduction in big data classification
topic	Big data; Mahout; Hadoop; Prototype reduction; Prototype generation; Nearest neighbor classification
url	https://eprints.nottingham.ac.uk/45415/
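
The description field above outlines a partition, reduce, and integrate scheme: the training set is split across map tasks, each task runs a prototype reduction method on its own partition, and a reduce step merges the partial prototype sets into one. The following is a minimal single-process sketch of that flow, not the authors' implementation. All function names are hypothetical, the per-class centroid is a deliberately simple stand-in for a real prototype reduction algorithm, and plain concatenation is assumed as the integration strategy; no Hadoop or Mahout API is used.

```python
import random
from collections import defaultdict
from functools import reduce


def make_toy_data(n_per_class=100, seed=1):
    """Hypothetical toy data: two well-separated 2-D Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for label, center in (("a", 0.0), ("b", 10.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((point, label))
    return data


def split_partitions(data, n_parts, seed=0):
    """Shuffle the training set and split it into disjoint partitions."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def map_step(partition):
    """Map: reduce one partition to a few prototypes.

    Stand-in method (an assumption): one centroid per class present
    in the partition.
    """
    by_class = defaultdict(list)
    for features, label in partition:
        by_class[label].append(features)
    prototypes = []
    for label in sorted(by_class):
        rows = by_class[label]
        centroid = tuple(sum(col) / len(rows) for col in zip(*rows))
        prototypes.append((centroid, label))
    return prototypes


def join_step(accumulated, partial):
    """Reduce: integrate partial solutions by simple concatenation."""
    return accumulated + partial


def reduce_prototypes(data, n_parts):
    """Run the map step on every partition, then merge the results."""
    partials = map(map_step, split_partitions(data, n_parts))
    return reduce(join_step, partials, [])


def classify_1nn(prototypes, x):
    """Label x with the class of its nearest prototype (squared Euclidean)."""
    def sq_dist(proto):
        return sum((a - b) ** 2 for a, b in zip(proto[0], x))
    return min(prototypes, key=sq_dist)[1]


if __name__ == "__main__":
    training = make_toy_data()
    reduced = reduce_prototypes(training, n_parts=4)
    print(len(training), "instances reduced to", len(reduced), "prototypes")
    print(classify_1nn(reduced, (0.5, -0.2)), classify_1nn(reduced, (9.0, 11.0)))
```

In this sketch the nearest neighbor classifier only ever compares a query against the merged prototype set, which is where the savings in classification time and storage come from. A real deployment would swap `map_step` for an actual prototype reduction algorithm and run the map and reduce steps on a cluster.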

