Visual Semantic Context-aware Attention-based Dialog Model


Bibliographic Details
Main Author: Eugene, Tan Boon Hong
Format: Thesis (PhD)
Language: English
Published: 2024
Subjects: T1-995 Technology(General)
Online Access:http://eprints.usm.my/61954/
http://eprints.usm.my/61954/1/EUGENE%20TAN%20BOON%20HONG%20-%20TESIS24.pdf
author Eugene, Tan Boon Hong
building USM Institutional Repository
description The visual dialogue dataset VisDial v1.0 includes a wide range of Microsoft Common Objects in Context (MSCOCO) image content, with questions collected via a crowdsourcing marketplace platform (Amazon Mechanical Turk). Relying on the existing question history and images alone no longer contributes to a better understanding of the image context, as they do not cover the entire semantic context of the image. This research proposes the DsDial dataset, a context-aware visual dialogue dataset that groups all relevant dialogue histories extracted according to their respective MSCOCO image categories. It also exploits the overlapping visual context between images via adaptive selection of relevant dialogue histories during dataset generation, drawing on these groups; the resulting dataset contains half of the 2.6 million question-answer pairs. In addition, this research proposes Diverse History-Dialog (DS-Dialog) to recover the missing visual semantic information for each image via context-aware visual attention, which comprises question-guided and relevant-dialogue-history-guided visual attention modules that extract the relevant visual context once both modules reach high confidence. Qualitative and quantitative experimental results on the VisDial v1.0 and DsDial datasets demonstrate that the proposed DS-Dialog not only outperforms existing methods but also achieves competitive results by contributing to better visual semantic extraction. The DsDial dataset has proven its significance on the LF (Late Fusion) model compared with VisDial v1.0. Overall, the quantitative results show that DS-Dialog with the DsDial dataset achieves the best test scores for recall@1, recall@5, recall@10, mean rank, MRR, and NDCG.
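
The core modelling idea described above is a context-aware visual attention that combines a question-guided branch with a relevant-dialogue-history-guided branch and uses the attended visual context when both branches are confident. The following is a minimal, hypothetical PyTorch sketch of how such dual guided attention over image region features could be wired together; the class names, the peak-attention-weight confidence proxy, and the sigmoid gate are illustrative assumptions, not the thesis's actual implementation.

# Minimal sketch (not the thesis's actual code) of a context-aware visual
# attention block with question-guided and dialogue-history-guided branches.
# Dimensions, module names, and the confidence-gated fusion are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedVisualAttention(nn.Module):
    """Attends over image region features using a single guiding vector."""
    def __init__(self, vis_dim: int, guide_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.proj_v = nn.Linear(vis_dim, hidden_dim)
        self.proj_g = nn.Linear(guide_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, guide):
        # regions: (B, R, vis_dim), guide: (B, guide_dim)
        joint = torch.tanh(self.proj_v(regions) + self.proj_g(guide).unsqueeze(1))
        logits = self.score(joint).squeeze(-1)              # (B, R)
        weights = F.softmax(logits, dim=-1)                 # attention over regions
        attended = torch.bmm(weights.unsqueeze(1), regions).squeeze(1)
        confidence = weights.max(dim=-1).values             # attention peakiness as a crude confidence proxy
        return attended, confidence

class ContextAwareVisualAttention(nn.Module):
    """Fuses question-guided and relevant-dialogue-history-guided visual context."""
    def __init__(self, vis_dim: int, q_dim: int, h_dim: int):
        super().__init__()
        self.q_att = GuidedVisualAttention(vis_dim, q_dim)
        self.h_att = GuidedVisualAttention(vis_dim, h_dim)
        self.gate = nn.Linear(2, 1)  # learned mixing weight from the two confidences

    def forward(self, regions, question_vec, history_vec):
        v_q, c_q = self.q_att(regions, question_vec)
        v_h, c_h = self.h_att(regions, history_vec)
        # A gate computed from both branches' confidences decides how to mix
        # the question-guided and history-guided visual contexts.
        g = torch.sigmoid(self.gate(torch.stack([c_q, c_h], dim=-1)))
        return g * v_q + (1.0 - g) * v_h

if __name__ == "__main__":
    # Toy shapes: 36 Faster R-CNN style region features of dimension 2048 per image.
    regions = torch.randn(2, 36, 2048)
    q_vec, h_vec = torch.randn(2, 512), torch.randn(2, 512)
    fused = ContextAwareVisualAttention(2048, 512, 512)(regions, q_vec, h_vec)
    print(fused.shape)  # torch.Size([2, 2048])

In this sketch the gate simply mixes the two attended visual vectors in proportion to a learned function of their confidences; the thesis may fuse the two branches differently.
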
format Thesis
id usm-61954
institution Universiti Sains Malaysia
institution_category Local University
language English
publishDate 2024
recordtype eprints
repository_type Digital Repository
title Visual Semantic Context-aware Attention-based Dialog Model
topic T1-995 Technology(General)
url http://eprints.usm.my/61954/
http://eprints.usm.my/61954/1/EUGENE%20TAN%20BOON%20HONG%20-%20TESIS24.pdf