End-to-end audiovisual speech recognition

Several end-to-end deep learning approaches have been recently presented which extract either audio or visual features from the input images or audio signals and perform speech recognition. However, research on end-to-end audiovisual models is very limited. In this work, we present an end-to-end aud...

Bibliographic Details
Main Authors:	Petridis, Stavros; Stafylakis, Themos; Ma, Pingchuan; Cai, Feipeng; Tzimiropoulos, Georgios; Pantic, Maja
Format:	Conference or Workshop Item
Language:	English
Published:	2018
Online Access:	https://eprints.nottingham.ac.uk/51132/

Similar Items

Deep word embeddings for visual speech recognition
by: Stafylakis, Themos, et al.
Published: (2018)

Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks
by: Eyben, F., et al.
Published: (2011)

Combining residual networks with LSTMs for lipreading
by: Stafylakis, Themos, et al.
Published: (2017)

Principal component analysis of image gradient orientations for face recognition
by: Tzimiropoulos, Georgios, et al.
Published: (2011)

Sparse representations of image gradient orientations for visual recognition and tracking
by: Tzimiropoulos, Georgios, et al.
Published: (2011)

A new penalty term for the BIC with respect to speaker diarization
by: Stafylakis, Themos, et al.
Published: (2010)

Efficient online subspace learning with an indefinite kernel for visual tracking and recognition
by: Liwicki, Stephan, et al.
Published: (2012)

Optimization problems for fast AAM fitting in-the-wild
by: Tzimiropoulos, Georgios, et al.
Published: (2013)

Gauss-Newton Deformable Part Models for face alignment in-the-wild
by: Tzimiropoulos, Georgios, et al.
Published: (2014)

Fast algorithms for fitting active appearance models to unconstrained images
by: Tzimiropoulos, Georgios, et al.
Published: (2016)

Subspace learning from image gradient orientations
by: Tzimiropoulos, Georgios, et al.
Published: (2012)

Fast and exact bi-directional fitting of active appearance models
by: Kossaifi, Jean, et al.
Published: (2015)

Robust and efficient parametric face alignment
by: Tzimiropoulos, Georgios, et al.
Published: (2011)

Subspace analysis of arbitrarily many linear filter responses with an application to face tracking
by: Zafeiriou, Stefanos, et al.
Published: (2011)

Fast and exact Newton and bidirectional fitting of Active Appearance Models
by: Kossaifi, Jean, et al.
Published: (2016)

Effect of attentional load on audiovisual speech perception: evidence from ERPs
by: Alsius, Agnes, et al.
Published: (2014)

Comprehensive design and development of time efficiency speaker recognition model from front end to back end
by: Ahmad, Abdul Manan, et al.
Published: (2008)

A semi-automatic methodology for facial landmark annotation
by: Sagonas, Christos, et al.
Published: (2013)

Online learning and fusion of orientation appearance models for robust rigid object tracking
by: Marras, Ioannis, et al.
Published: (2014)

Fast and robust appearance-based tracking
by: Liwicki, Stephan, et al.
Published: (2011)

Euler principal component analysis
by: Liwicki, Stephan, et al.
Published: (2013)

300 Faces in-the-Wild Challenge: the first facial landmark localization challenge
by: Sagonas, Christos, et al.
Published: (2013)

End-to-end object detection with transformers
by: Lai, Eddy Thin Jun
Published: (2024)

Active orientation models for face alignment in-the-wild
by: Tzimiropoulos, Georgios, et al.
Published: (2014)

From pixels to response maps: discriminative image filtering for face alignment in the wild
by: Asthana, Akshay, et al.
Published: (2014)

Generic active appearance models revisited
by: Tzimiropoulos, Georgios, et al.
Published: (2013)

300 faces in-the-wild challenge: database and results
by: Sagonas, Christos, et al.
Published: (2016)

Analysis of end-to-end delay characteristics for various packets in IEC 61850 substation communications system
by: Das, N., et al.
Published: (2015)

The end of the world
Published: (2010)

The end of confrontation
by: Mazlan Nordin
Published: (2005)

Beginning of the end?
by: Abd Razak, Dzulkifli
Published: (2009)

An end in sight
by: Menon, Sandhya
Published: (2021)

A Cdk1 phosphomimic mutant of MCAK impairs microtubule end recognition
by: Belsham, Hannah R., et al.
Published: (2017)

Online learning and fusion of orientation appearance models for robust rigid object tracking
by: Marras, Ioannis, et al.
Published: (2013)

Comparison Study of Various Factors Affecting End-to-End Delay in IEC 61850 Substation Communications Using OPNET
by: Das, Narottam, et al.
Published: (2012)

CFD analysis on mismatched end-to-end internal diameter of RSVG models
by: Yahya, Muhd Nur Rahman, et al.
Published: (2014)

Writing Cinematic Sound: Audiovisual Phrasing and Screenwriting Practice
by: Strand, Joachim Wichman
Published: (2023)

End of semester examination
by: Ismail, Yusof
Published: (2011)
