Multivariate Image Processing in Minerals Engineering with Vision Transformers

Bibliographic Details
Main Authors: Liu, Xiu; Aldrich, Chris
Format: Journal Article
Published: Elsevier 2024
Online Access: http://purl.org/au-research/grants/arc/CE200100009
http://hdl.handle.net/20.500.11937/94374
DOI: 10.1016/j.mineng.2024.108599
Licence: http://creativecommons.org/licenses/by/4.0/
Institution: Curtin University Malaysia
Repository: Curtin Institutional Repository

Description

Vision transformers (ViTs) are a new class of deep learning algorithms that have recently emerged as a competitive alternative to convolutional neural networks. In this investigation, their application to two operations previously studied in the mineral processing industry is considered: image recognition of fines in coal particles on conveyor belts, and characterisation of the particle size in the underflow of a hydrocyclone. Vision transformers gave promising results, performing as well as, or better than, convolutional neural networks on both image recognition problems. In addition, features extracted from the best ViT model could be used to visualise its performance, and these features could also serve as a basis for nonlinear process monitoring models. Furthermore, explainability techniques such as attention maps were implemented to better understand the ViT models, analogous to the occlusion sensitivity maps used with convolutional neural networks.