Examining the impact of stratified sampling on model performance in automated image caption: a topic modelling approach

Deep learning's recent rapid development has prompted scientists to investigate a wide range of complex data problems. Among them, automated image captioning has increasingly drawn the attention of many researchers due to its challenging but intriguing architecture that involves a combination o...

Bibliographic Details
Main Author: Nguyen, Anh
Format: Dissertation (University of Nottingham only)
Language: English
Published: 2020
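Subjects: automated image caption; deep learning; topic modelling; stratified sampling; convolutional neural network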
Online Access: https://eprints.nottingham.ac.uk/66357/
author Nguyen, Anh
building Nottingham Research Data Repository
collection Online Access
description The recent rapid development of deep learning has prompted scientists to investigate a wide range of complex data problems. Among them, automated image captioning has increasingly drawn researchers' attention because of its challenging but intriguing architecture, which combines image processing and text analytics. It has also attracted business investment thanks to several practical applications, including image retrieval, support for visually impaired users, product tagging and autonomous driving. This dissertation investigates the application of a stratified sample split in evaluating the performance of an automated captioning model. Despite its popularity in machine learning, stratification has not yet been directly applied in previous work on image captioning, because (1) researchers often use the pre-defined sample splits supplied by data providers, and (2) images and annotations are unstructured data, which require more novel clustering methodology than structured data. By applying topic modelling to the images' annotations, this dissertation validated the positive impact of stratified sampling on prediction results compared with a simple random split. The findings also identified the range of sample sizes in which this strategy delivered the best performance and explained the reason behind this phenomenon. Finally, the study provided a more comprehensive understanding of the problem, with insights into the behaviour of different supporting techniques in topic modelling and image encoding.
first_indexed 2025-11-14T20:49:41Z
format Dissertation (University of Nottingham only)
id nottingham-66357
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T20:49:41Z
publishDate 2020
recordtype eprints
repository_type Digital Repository
spelling nottingham-66357 2023-04-20T08:42:40Z https://eprints.nottingham.ac.uk/66357/ 2020-12-01 Dissertation (University of Nottingham only) NonPeerReviewed application/pdf en https://eprints.nottingham.ac.uk/66357/1/20243144_BUSI4374_Dissertation.pdf Nguyen, Anh (2020) Examining the impact of stratified sampling on model performance in automated image caption: a topic modelling approach. [Dissertation (University of Nottingham only)]
title Examining the impact of stratified sampling on model performance in automated image caption: a topic modelling approach
topic automated image caption
deep learning
topic modelling
stratified sampling
convolutional neural network
url https://eprints.nottingham.ac.uk/66357/