Impacts of sample design for validation data on the accuracy of feedforward neural network classification

Bibliographic Details
Main Author: Foody, Giles
Format: Article
Published: MDPI 2017
Subjects:
Online Access:https://eprints.nottingham.ac.uk/45703/
building Nottingham Research Data Repository
description Validation data are often used to evaluate the performance of a trained neural network and to select the network deemed optimal for the task at hand. Optimality is commonly assessed with a measure such as overall classification accuracy, which is often calculated directly from a confusion matrix showing the counts of cases in the validation set with particular labelling properties. The sample design used to form the validation set can, however, influence the estimated magnitude of the accuracy. Commonly, the validation set is formed with a stratified sample to give balanced classes, but it may also be formed via simple random sampling, which reflects class abundance. It is suggested that if the ultimate aim is to accurately classify a dataset in which the classes vary in abundance, a validation set formed via random, rather than stratified, sampling is preferred. This is illustrated with the classification of simulated and remotely sensed datasets. With both datasets, statistically significant differences in the accuracy with which the data could be classified arose from the use of validation sets formed via random and stratified sampling (z = 2.7 and 1.9 for the simulated and real datasets respectively, both p < 0.05). The accuracy of the classifications that used a stratified sample in validation was smaller, a result of cases of an abundant class being commissioned into a rarer class. Simple means to address the issue are suggested.
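The contrast between the two sample designs can be sketched in a short Python toy example (not the paper's experiment: the 90/10 class split, the 10% commission-error rate, and the sample sizes are all hypothetical choices for illustration). It shows how a stratified validation set, by over-weighting the rarer class, yields a different estimate of overall accuracy than a simple random sample drawn from the same imbalanced population:

```python
import random

random.seed(42)

# Hypothetical imbalanced population: 90% class "A", 10% class "B".
population = ["A"] * 9000 + ["B"] * 1000
random.shuffle(population)

# Simple random sample: class proportions mirror the population.
random_sample = random.sample(population, 200)

# Stratified sample: equal numbers drawn from each class (balanced).
a_cases = [c for c in population if c == "A"]
b_cases = [c for c in population if c == "B"]
stratified_sample = random.sample(a_cases, 100) + random.sample(b_cases, 100)

def classify(label):
    # Toy classifier that commits ~10% of the abundant class "A"
    # into class "B", and labels class "B" correctly.
    if label == "A" and random.random() < 0.10:
        return "B"
    return label

def overall_accuracy(sample):
    # Overall accuracy = correctly labelled cases / total cases,
    # i.e. the trace of the confusion matrix over its grand total.
    correct = sum(1 for truth in sample if classify(truth) == truth)
    return correct / len(sample)

print(overall_accuracy(random_sample))      # expectation 0.9*0.9 + 0.1 = 0.91
print(overall_accuracy(stratified_sample))  # expectation 0.5*0.9 + 0.5 = 0.95
```

Because the classifier's errors fall only on the abundant class, the balanced (stratified) validation set under-weights those errors and reports a different accuracy than a random sample reflecting the class abundances the classifier will actually face; this is the estimation effect of sample design that the abstract describes.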
id nottingham-45703
institution University of Nottingham Malaysia Campus
Citation: Foody, Giles (2017) Impacts of sample design for validation data on the accuracy of feedforward neural network classification. Applied Sciences, 7 (9). 888/1-888/15. MDPI. Published 2017-08-30. Peer reviewed.
ISSN: 2076-3417
Keywords: cross-validation; multi-layer perceptron; remote sensing; classification error; sample design; machine learning
Publisher URL: http://www.mdpi.com/2076-3417/7/9/888
DOI: 10.3390/app7090888