Novel automated classification approaches for citizen science

Bibliographic Details
Main Author: Jiménez Morales, Manuel Alejandro
Format: Thesis (University of Nottingham only)
Language: English
Published: 2020
Subjects: automated classification; computer learning; citizen science; data processing
Online Access: https://eprints.nottingham.ac.uk/63085/
author Jiménez Morales, Manuel Alejandro
building Nottingham Research Data Repository
collection Online Access
description Citizen science, traditionally understood as the engagement of amateur participants in research, is showing great potential for large-scale data processing. In areas such as astronomy, ecology, and the geosciences, where emerging technologies generate huge volumes of data, citizen science projects enable image classification at a rate that experts alone could not achieve. Using the power of the web, virtual communities of volunteers sharing a common goal are able to coordinate the classification of hundreds of thousands of images in a reasonable amount of time. However, expert evaluations usually reveal biases and uncertainty in the results, since the participants are typically inexperienced in the task and vary in skills and background. Consequently, the research community tends to distrust citizen science outcomes, citing a general lack of accuracy and validation, and most of the resulting data is left unused once the projects end. Citizen science also offers a large amount of labelled data at reduced cost for training machine learning classifiers. Nonetheless, current efforts to exploit citizen science outcomes with machine learning tools have ignored the inherent uncertainty in the results, as well as the potential of expert classifications to ameliorate this issue. The goal has mainly been to replicate the amateur classifications, thus propagating their biases and limitations into the automated classification. Similarly, the potential of learning from unlabelled data to alleviate this uncertainty has been disregarded. This situation calls for a solution that can take advantage of all levels of knowledge: expert classifications, citizen science data, and unlabelled data. However, the synergy between these sources of data remains unexplored, awaiting new methodologies that may lead to enhanced automated classification.
This thesis focuses on the development of automated approaches for classification problems aided by citizen science projects on the web, aiming to leverage the inherent uncertainty in the results and all levels of knowledge available about the problem. As a case study, we select the longest-running implementation of a scientific problem aided by modern citizen science: the classification of galaxies from images. We exploit the results of the first edition of Galaxy Zoo, a citizen science project that today represents the largest manually annotated galaxy image database. The research proceeds through three progressive stages. First, we introduce a novel multi-stage approach to handle the uncertainty in data labelled in the course of citizen science projects. Our method proposes a set of transformations that leverage the uncertainty in amateur classifications, in conjunction with a hybridisation strategy that provides the best aggregation of the transformed data, to improve the quality of and confidence in the results. The second stage comprises a thorough study of machine learning methods for image classification, introducing the use of autoencoders to learn from unlabelled data, and exploring learning from amateur and expert classifications through pre-training and fine-tuning of convolutional neural networks. Finally, in the third stage, the previous findings are combined to propose a solution to the novel learning paradigm defined here, one that is able to exploit data labelled by experts, data labelled by amateurs in the course of citizen science projects, and unlabelled data. In summary, the research introduces a set of novel mechanisms towards improved automated classification based on citizen science data, expert classifications, and raw data.
As a result, the proposed method for handling uncertainty boosts accuracy and is able to classify a higher number of images than previous approaches. This is accomplished by taking advantage of the uncertainty measured by the participants themselves. The use of autoencoders greatly speeds up feature extraction with respect to state-of-the-art methods, and also reveals the potential of exploiting amateur and expert classifications with deep learning-based classifiers. Finally, a novel approach leverages all the insights previously found and presents an innovative setting for learning from expert classifications, amateur classifications, and unlabelled data that surpasses the performance obtained using those label sets separately or jointly. These results also constitute a global study of the automated classification of galaxy images that, starting from state-of-the-art approaches, has contributed new methods at the boundary between the fields of citizen science, astroinformatics, and machine learning.
first_indexed 2025-11-14T20:44:30Z
format Thesis (University of Nottingham only)
id nottingham-63085
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:44:30Z
publishDate 2020
recordtype eprints
repository_type Digital Repository
spelling nottingham-63085 2025-02-28T15:03:52Z https://eprints.nottingham.ac.uk/63085/ Novel automated classification approaches for citizen science Jiménez Morales, Manuel Alejandro 2020-12-11 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en arr https://eprints.nottingham.ac.uk/63085/1/M_Jimenez_Thesis-%28final%29.pdf Jiménez Morales, Manuel Alejandro (2020) Novel automated classification approaches for citizen science. PhD thesis, University of Nottingham. automated classification computer learning citizen science data processing
title Novel automated classification approaches for citizen science
topic automated classification
computer learning
citizen science
data processing
url https://eprints.nottingham.ac.uk/63085/