Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling


Bibliographic Details
Main Author: Shahid, Muhammad Suhaib
Format: Thesis (University of Nottingham only)
Language:English
Published: 2025
Subjects:
Online Access:https://eprints.nottingham.ac.uk/81139/
_version_ 1848801296677601280
author Shahid, Muhammad Suhaib
author_facet Shahid, Muhammad Suhaib
author_sort Shahid, Muhammad Suhaib
building Nottingham Research Data Repository
collection Online Access
description Oral health underpins everyday functions such as speech, mastication and swallowing, yet acquiring detailed kinematic data on the vocal tract remains technically and financially demanding. Ultrasound and electromagnetic articulography offer only partial coverage, while Real Time Magnetic Resonance Imaging (RtMRI) data delivers richer information but requires expensive scanners and bespoke acquisition protocols. These constraints limit large-scale studies and the routine use of dynamic vocal-tract models in both research and clinical practice. Motivated by the need for an affordable, non-invasive alternative, this thesis introduces External to Internal Correlation Modelling (E2ICM), a novel framework that learns correlations between external facial signals and internal articulator motion, enabling vocal-tract modelling without direct imaging. The work pursues four objectives: (i) advanced segmentation of RtMRI sequences, (ii) quantification of articulator interdependencies, (iii) prediction of internal motion from purely external observations, and (iv) ethical evaluation of AI-driven approaches in oral healthcare. Both static and temporal segmentation pipelines are developed for RtMRI data. Generative adversarial networks and diffusion models are then employed to synthesise internal views from facial video, addressing data scarcity through tailored augmentation strategies. A thematic analysis of professional interviews highlights concerns around privacy, security and algorithmic bias, informing an ethical framework for clinical deployment. A key contribution is a dual-view dataset comprising synchronised high-resolution RtMRI and external video captured during controlled speech and chewing tasks. Experimental results demonstrate that E2ICM can predict vocal-tract configurations with promising accuracy while reducing reliance on costly imaging.
Improved segmentation techniques and a deeper understanding of articulator dynamics further advance the state of the art in non-invasive oral-movement modelling.
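The core idea described above, learning a mapping from external facial signals to internal articulator motion, can be illustrated with a deliberately simple sketch. Everything here is hypothetical (the signal names, the toy values, and the scalar linear model), standing in for the thesis's far richer GAN/diffusion pipeline on video and RtMRI frames:

```python
# Hypothetical external signal: lip aperture per video frame (arbitrary units).
lip_aperture = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical internal signal: tongue-tip height from RtMRI, assumed correlated.
tongue_height = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Learn the external-to-internal correlation from paired observations.
a, b = fit_linear(lip_aperture, tongue_height)

# Predict the internal articulator position for a new external measurement,
# i.e. infer vocal-tract state without direct imaging.
predicted = a * 6.0 + b
```

In the thesis itself this scalar regression is replaced by deep generative models (GANs and diffusion models) that synthesise full internal views from facial video, but the sketch captures the premise: once the external-internal correlation is learned from synchronised dual-view data, only the cheap external signal is needed at prediction time.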
first_indexed 2025-11-14T21:05:12Z
format Thesis (University of Nottingham only)
id nottingham-81139
institution University of Nottingham Malaysia Campus
institution_category Local University
language English
last_indexed 2025-11-14T21:05:12Z
publishDate 2025
recordtype eprints
repository_type Digital Repository
spelling nottingham-81139 2025-07-30T04:40:21Z https://eprints.nottingham.ac.uk/81139/ Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling Shahid, Muhammad Suhaib 2025-07-30 Thesis (University of Nottingham only) NonPeerReviewed application/pdf en cc_by https://eprints.nottingham.ac.uk/81139/1/University_of_Nottingham_PhD_Thesis_Suhaib__Final_correction.pdf Shahid, Muhammad Suhaib (2025) Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling. PhD thesis, University of Nottingham. artificial intelligence ai oral health vocal-tract modelling
spellingShingle artificial intelligence
ai
oral health
vocal-tract modelling
Shahid, Muhammad Suhaib
Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling
title Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling
title_full Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling
title_fullStr Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling
title_full_unstemmed Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling
title_short Towards vocal tract MRI synthesis from facial signals using external to internal correlation modelling
title_sort towards vocal tract mri synthesis from facial signals using external to internal correlation modelling
topic artificial intelligence
ai
oral health
vocal-tract modelling
url https://eprints.nottingham.ac.uk/81139/