Exact fuzzy k-Nearest neighbor classification for big datasets

The k-Nearest Neighbors (kNN) classifier is one of the most effective methods for supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. However, it gives every labeled sample the same importance when classifying. There are several approaches to...

Bibliographic Details
Main Authors:	Maillo, Jesus; Luengo, Julian; García, Salvador; Herrera, Francisco; Triguero, Isaac
Format:	Conference or Workshop Item
Published:	2017
Online Access:	https://eprints.nottingham.ac.uk/44937/

author	Maillo, Jesus; Luengo, Julian; García, Salvador; Herrera, Francisco; Triguero, Isaac
author_sort	Maillo, Jesus
building	Nottingham Research Data Repository
collection	Online Access
description	The k-Nearest Neighbors (kNN) classifier is one of the most effective methods for supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. However, it gives every labeled sample the same importance when classifying. There are several approaches to enhance its precision, with the Fuzzy k Nearest Neighbors (Fuzzy-kNN) classifier being among the most successful ones. Fuzzy-kNN computes a fuzzy degree of membership of each instance to the classes of the problem. As a result, it generates smoother borders between classes. Although a kNN approach for handling big datasets exists, there is no fuzzy variant that can manage that volume of data. Moreover, calculating the class memberships adds extra computational cost, which makes the method even less scalable to large datasets because of its memory needs and high runtime. In this work, we present an exact and distributed approach, based on Spark, to run the Fuzzy-kNN classifier on big datasets; it provides the same precision as the original algorithm. It consists of two separate stages. The first stage transforms the training set by adding the class membership degrees. The second stage classifies the test set with the kNN algorithm, using the class memberships computed previously. In our experiments, we study the scaling-up capabilities of the proposed approach with datasets of up to 11 million instances, showing promising results.
first_indexed	2025-11-14T19:57:25Z
format	Conference or Workshop Item
id	nottingham-44937
institution	University of Nottingham Malaysia Campus
institution_category	Local University
last_indexed	2025-11-14T19:57:25Z
publishDate	2017
recordtype	eprints
repository_type	Digital Repository
spelling	nottingham-44937 2020-05-04T18:55:06Z https://eprints.nottingham.ac.uk/44937/ Exact fuzzy k-Nearest neighbor classification for big datasets. Maillo, Jesus; Luengo, Julian; García, Salvador; Herrera, Francisco; Triguero, Isaac. 2017-07-10. Conference or Workshop Item. PeerReviewed. Maillo, Jesus, Luengo, Julian, García, Salvador, Herrera, Francisco and Triguero, Isaac (2017) Exact fuzzy k-Nearest neighbor classification for big datasets. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2017), 9-12 Jul 2017, Naples, Italy.
title	Exact fuzzy k-Nearest neighbor classification for big datasets
url	https://eprints.nottingham.ac.uk/44937/
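
The two stages named in the description (computing class memberships for the training set, then classifying with kNN using those memberships) can be illustrated with a minimal single-machine sketch. This is not the authors' distributed Spark implementation. The membership and voting formulas are an assumption: they follow a commonly used fuzzy kNN formulation (Keller et al.), which the record itself does not spell out. The function names, the fuzziness parameter `m`, and the small `eps` guard against zero distances are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def k_nearest(query, points, k, skip=None):
    """Return (distance, index) pairs for the k points closest to query."""
    dists = [(math.dist(query, p), i) for i, p in enumerate(points) if i != skip]
    return sorted(dists)[:k]


def compute_memberships(X, y, classes, k):
    """Stage 1: attach a class membership vector to every training instance.

    Assumed rule: an instance keeps 0.51 for its own label, and the
    remaining 0.49 is shared according to the labels of its k neighbors.
    """
    memberships = []
    for i, x in enumerate(X):
        neighbors = k_nearest(x, X, k, skip=i)  # leave the instance itself out
        counts = Counter(y[j] for _, j in neighbors)
        row = {}
        for c in classes:
            share = 0.49 * counts[c] / k
            row[c] = share + 0.51 if c == y[i] else share
        memberships.append(row)
    return memberships


def classify(query, X, memberships, classes, k, m=2.0):
    """Stage 2: predict from distance-weighted memberships of the k neighbors."""
    eps = 1e-12  # hypothetical guard so a zero distance does not divide by zero
    neighbors = k_nearest(query, X, k)
    weights = [(1.0 / (d ** (2.0 / (m - 1.0)) + eps), j) for d, j in neighbors]
    total = sum(w for w, _ in weights)
    scores = {
        c: sum(w * memberships[j][c] for w, j in weights) / total
        for c in classes
    }
    return max(scores, key=scores.get), scores
```

A call such as `classify(q, X, compute_memberships(X, y, classes, k), classes, k)` returns the predicted label together with the per-class scores. The brute-force neighbor search here is quadratic in the training set size, which is the scalability problem the description says the distributed approach is meant to address.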

Similar Items