Learning to rank salient objects using transformers and graph reasoning

This thesis explores salient object detection (SOD), which aims to find the most visually important objects in an image. Many current approaches focus on datasets in which many images contain only a single salient object located towards the center. We focus here on the more...

Bibliographic Details
Main Author: Bowen, Deng
Format: Thesis (University of Nottingham only)
Language: English
Published: 2025
Subjects:
Online Access: https://eprints.nottingham.ac.uk/77925/
author Bowen, Deng
building Nottingham Research Data Repository
collection Online Access
description This thesis explores salient object detection (SOD), which aims to find the most visually important objects in an image. Many current approaches focus on datasets in which many images contain only a single salient object located towards the center. We focus here on the more complex case of images containing multiple objects, where the relative saliency between objects must also be evaluated. A novel multiple salient object detection framework is proposed, using both spatial and channel-wise non-local blocks within a convolutional network. The experiments compare the approach against 14 state-of-the-art methods on five widely used SOD benchmarks and a newly curated multi-object dataset. The proposed method outperforms all previous state-of-the-art approaches on three evaluation metrics and shows a further performance gain over competing techniques on the proposed dataset. We then build upon this work to investigate the multiple salient object detection task in greater depth, exploring the problem of instance-level relative saliency ranking. Because this is an emerging field that lacks appropriate datasets, we produce a large-scale instance-level relative saliency ranking dataset using real human fixations. To the best of our knowledge, this is the first and largest dataset created from real human fixations for relative saliency ranking. A novel framework is then introduced that models multi-scale ranking-aware information cues in a nested-style graph, drawing features from a query-based transformer. Experiments demonstrate the effectiveness of the proposed method, which outperforms all previous state-of-the-art approaches by a large margin on three evaluation metrics. The model and full dataset will be released to the community.
first_indexed 2025-11-14T21:03:53Z
format Thesis (University of Nottingham only)
id nottingham-77925
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T21:03:53Z
publishDate 2025
recordtype eprints
repository_type Digital Repository
spelling nottingham-77925 2025-07-24T04:30:10Z https://eprints.nottingham.ac.uk/77925/ Learning to rank salient objects using transformers and graph reasoning Bowen, Deng
2025-07-23 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en cc_by https://eprints.nottingham.ac.uk/77925/1/Thesis_corrected_final.pdf Bowen, Deng (2025) Learning to rank salient objects using transformers and graph reasoning. PhD thesis, University of Nottingham. saliency salient object detection saliency ranking multiple salient object detection transformers graph neural networks
title Learning to rank salient objects using transformers and graph reasoning
topic saliency
salient object detection
saliency ranking
multiple salient object detection
transformers
graph neural networks
url https://eprints.nottingham.ac.uk/77925/