Combining residual networks with LSTMs for lipreading

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging da...

Bibliographic Details
Main Authors:	Stafylakis, Themos; Tzimiropoulos, Georgios
Format:	Conference or Workshop Item
Published:	2017
Subjects:	visual speech recognition; lipreading; deep learning
Online Access:	https://eprints.nottingham.ac.uk/44756/

Similar Items

Deep word embeddings for visual speech recognition
by: Stafylakis, Themos, et al.
Published: (2018)

Speaker discriminability for visual speech modes
by: Kim, J., et al.
Published: (2009)

A new multi-purpose audio-visual UNMC-VIER database with multiple variabilities
by: Wong, Y.W., et al.
Published: (2011)

Motion Path Generation Using A Modified 6th Order Polynomial Function for Visual Speech Synthesis
by: Salleh, Siti Salwa
Published: (2008)

Figurative language detection using deep and contextual features
by: Razali, Md Saifullah
Published: (2023)

A new penalty term for the BIC with respect to speaker diarization
by: Stafylakis, Themos, et al.
Published: (2010)

A semi-automatic methodology for facial landmark annotation
by: Sagonas, Christos, et al.
Published: (2013)

A multi-filter system for speech enhancement under low signal-to-noise ratios
by: Yiu, Ka Fai, et al.
Published: (2009)

Web application for lipreading
by: Lau, Yee Lin
Published: (2024)

Self-supervised learning for automatic speech recognition in low-resource environments
by: Fatehi, Kavan
Published: (2024)

Machine learning for neural coding of sound envelopes: slithering from sinusoids to speech
by: Levy, Alban Hugo
Published: (2018)

Multichannel filters for speech recognition using a particle swarm optimization
by: Chan, Kit Yan, et al.
Published: (2012)

Development of an Isolated Digit Speech Recognition Based on Multilayer Perceptron Model
by: Mohamad Hussin, Ummu Salmah
Published: (2004)

Deep Learning Using Tiny Domain-Specific Datasets with Sparse Labels
by: Smith, Thomas J
Published: (2021)

Automatic speech recognition predicts speech intelligibility and comprehension for listeners with simulated age-related hearing loss
by: Fontan, Lionel, et al.
Published: (2017)

An investigation of deep learning for image processing applications
by: Hou, Xianxu
Published: (2018)

Speech recognition enhancement using beamforming and a genetic algorithm
by: Chan, Kit Yan, et al.
Published: (2009)

Deep learning approach with image noise reduction to determine planting density and defected paddy seedlings
by: Mohamed Anuar, Mohamed Marzhar
Published: (2022)

Deep tissue analysis: advancing optical techniques with interpretable deep learning and aberration correction
by: Kok, Yong En
Published: (2025)

Improving understanding of EEG measurements using transparent machine learning models
by: Roadknight, Chris, et al.
Published: (2019)

Deep machine learning provides state-of-the-art performance in image-based plant phenotyping
by: Pound, Michael P., et al.
Published: (2017)

A hybrid noise suppression filter for accuracy enhancement of commercial speech recognizers in varying noisy conditions
by: Chan, Kit, et al.
Published: (2014)

Enhancement of speech recognitions for control automation using an intelligent particle swarm optimization
by: Chan, Kit Yan, et al.
Published: (2012)

EEG activity evoked in preparation for multi-talker listening by adults and children
by: Holmes, Emma, et al.
Published: (2016)

Sensor fusion of motion-based sign language interpretation with deep learning
by: Lee, Boon Giin, et al.
Published: (2020)

The Cosmic Evolution of Galaxy Structure and Morphology at 0.5 < z < 8
by: Ferreira, Leonardo
Published: (2023)

End-to-end DVB-S2X system design with deep learning-based channel estimation over satellite fading channels
by: Mfarej, Sumaya Dhari Awad
Published: (2021)

Efficient online subspace learning with an indefinite kernel for visual tracking and recognition
by: Liwicki, Stephan, et al.
Published: (2012)

A hybrid medical text classification framework: integrating attentive rule construction and neural network
by: Li, Xiang, et al.
Published: (2021)

Principal component analysis of image gradient orientations for face recognition
by: Tzimiropoulos, Georgios, et al.
Published: (2011)

An affine invariant function using PCA bases with an application to within-class object recognition
by: Tzimiropoulos, Georgios, et al.
Published: (2007)

Instance segmentation of front doors in mobile mapping system images
by: Klimkowska, Anna Maria
Published: (2020)

An investigation into image-based indoor localization using deep learning
by: Li, Qing
Published: (2020)

Synthetic data driven deep learning for plant phenotyping
by: Hartley, Zane K.J.
Published: (2024)

Multitasking deep neural network models for Arabic dialect sentiment analysis
by: Alali, Muath Mohammad Oqlah
Published: (2022)

Input matters: speed of word recognition in 2-year-olds exposed to multiple accents
by: Buckler, Helen, et al.
Published: (2017)

Deep learning models of biological visual information processing
by: Turcsány, Diána
Published: (2016)

Large-scale detection, mapping, and initial health assessment of date palm trees using multiplatform remotely-sensed data and deep learning techniques
by: Gibril, Mohamed Barakat Abdelfatah
Published: (2023)

Portable form filling assistant for the visually impaired
by: Peng, En, et al.
Published: (2010)

Robust recognition of planar shapes under affine transforms using principal component analysis
by: Tzimiropoulos, Georgios, et al.
Published: (2007)