From Big data to Smart Data with the K-Nearest Neighbours algorithm

The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when dealing with big datasets containing potentially noisy and missing information, this technique becomes ineffective and inefficient. Due to its drawba...

Bibliographic Details
Main Authors:	Triguero, Isaac; Maillo, Jesus; Luengo, Julian; García, Salvador; Herrera, Francisco
Format:	Conference or Workshop Item
Published:	2016
Subjects:	k-Nearest Neighbours; Prototype Reduction; Data Preprocessing; Smart Data; Big Data
Online Access:	https://eprints.nottingham.ac.uk/42475/

building	Nottingham Research Data Repository
collection	Online Access
description	The k-nearest neighbours algorithm is one of the most widely used data mining models because of its simplicity and accurate results. However, when dealing with big datasets containing potentially noisy and missing information, this technique becomes ineffective and inefficient. Because of its drawbacks in tackling large amounts of imperfect data, plenty of research has aimed at improving this algorithm by means of data preprocessing techniques. These weaknesses have turned into strengths, and the k-nearest neighbours rule has become a core model for detecting and correcting imperfect data: eliminating noisy and redundant data, as well as imputing missing values. In this work, we delve into the role of the k-nearest neighbours algorithm in obtaining smart data from big datasets. We analyse how this model is affected by the big data problem and, at the same time, how it can be used to transform raw data into useful data. Specifically, we discuss the benefits of recent big data technologies (Hadoop and Spark) in enabling this model to address large amounts of data, as well as the usefulness of prototype reduction and missing-value imputation techniques based on it. As a result, guidelines on the use of the k-nearest neighbours algorithm to obtain smart data are provided and potential new research trends are outlined.
first_indexed	2025-11-14T19:48:53Z
format	Conference or Workshop Item
id	nottingham-42475
institution	University of Nottingham Malaysia Campus
institution_category	Local University
last_indexed	2025-11-14T19:48:53Z
publishDate	2016
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-42475 2020-05-04T18:25:33Z https://eprints.nottingham.ac.uk/42475/ 2016-12-16 Conference or Workshop Item PeerReviewed Triguero, Isaac, Maillo, Jesus, Luengo, Julian, García, Salvador and Herrera, Francisco (2016) From Big data to Smart Data with the K-Nearest Neighbours algorithm. In: IEEE International Conference on Smart Data (Smart Data 2016), 16-19 December 2016, Chengdu, China.
title	From Big data to Smart Data with the K-Nearest Neighbours algorithm
topic	k-Nearest Neighbours; Prototype Reduction; Data Preprocessing; Smart Data; Big Data
url	https://eprints.nottingham.ac.uk/42475/

Similar Items