End-to-end audiovisual speech recognition
Several end-to-end deep learning approaches have been recently presented which extract either audio or visual features from the input images or audio signals and perform speech recognition. However, research on end-to-end audiovisual models is very limited. In this work, we present an end-to-end aud...
| Main Authors: | Petridis, Stavros; Stafylakis, Themos; Ma, Pingchuan; Cai, Feipeng; Tzimiropoulos, Georgios; Pantic, Maja |
|---|---|
| Format: | Conference or Workshop Item |
| Language: | English |
| Published: | 2018 |
| Online Access: | https://eprints.nottingham.ac.uk/51132/ |
Similar Items
Deep word embeddings for visual speech recognition
by: Stafylakis, Themos, et al.
Published: (2018)
Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks
by: Eyben, F., et al.
Published: (2011)
Combining residual networks with LSTMs for lipreading
by: Stafylakis, Themos, et al.
Published: (2017)
Principal component analysis of image gradient orientations for face recognition
by: Tzimiropoulos, Georgios, et al.
Published: (2011)
Sparse representations of image gradient orientations for visual recognition and tracking
by: Tzimiropoulos, Georgios, et al.
Published: (2011)
A new penalty term for the BIC with respect to speaker diarization
by: Stafylakis, Themos, et al.
Published: (2010)
Efficient online subspace learning with an indefinite kernel for visual tracking and recognition
by: Liwicki, Stephan, et al.
Published: (2012)
Optimization problems for fast AAM fitting in-the-wild
by: Tzimiropoulos, Georgios, et al.
Published: (2013)
Gauss-Newton Deformable Part Models for face alignment in-the-wild
by: Tzimiropoulos, Georgios, et al.
Published: (2014)
Fast algorithms for fitting active appearance models to unconstrained images
by: Tzimiropoulos, Georgios, et al.
Published: (2016)
Subspace learning from image gradient orientations
by: Tzimiropoulos, Georgios, et al.
Published: (2012)
Fast and exact bi-directional fitting of active appearance models
by: Kossaifi, Jean, et al.
Published: (2015)
Robust and efficient parametric face alignment
by: Tzimiropoulos, Georgios, et al.
Published: (2011)
Subspace analysis of arbitrarily many linear filter responses with an application to face tracking
by: Zafeiriou, Stefanos, et al.
Published: (2011)
Subspace learning from image gradient orientations
by: Tzimiropoulos, Georgios, et al.
Published: (2012)
Fast and exact Newton and bidirectional fitting of Active Appearance Models
by: Kossaifi, Jean, et al.
Published: (2016)
Effect of attentional load on audiovisual speech perception: evidence from ERPs
by: Alsius, Agnes, et al.
Published: (2014)
A semi-automatic methodology for facial landmark annotation
by: Sagonas, Christos, et al.
Published: (2013)
Online learning and fusion of orientation appearance models for robust rigid object tracking
by: Marras, Ioannis, et al.
Published: (2014)
Fast and robust appearance-based tracking
by: Liwicki, Stephan, et al.
Published: (2011)
Euler principal component analysis
by: Liwicki, Stephan, et al.
Published: (2013)
300 Faces in-the-Wild Challenge: the first facial landmark localization challenge
by: Sagonas, Christos, et al.
Published: (2013)
Comprehensive design and development of time efficiency speaker recognition model from front end to back end
by: Ahmad, Abdul Manan, et al.
Published: (2008)
End-to-end object detection with transformers
by: Lai, Eddy Thin Jun
Published: (2024)
Active orientation models for face alignment in-the-wild
by: Tzimiropoulos, Georgios, et al.
Published: (2014)
From pixels to response maps: discriminative image filtering for face alignment in the wild
by: Asthana, Akshay, et al.
Published: (2014)
Generic active appearance models revisited
by: Tzimiropoulos, Georgios, et al.
Published: (2013)
300 faces in-the-wild challenge: database and results
by: Sagonas, Christos, et al.
Published: (2016)
Analysis of end-To-end delay characteristics for various packets in IEC 61850 substation communications system
by: Das, N., et al.
Published: (2015)
The end of the world
Published: (2010)
The end of confrontation
by: Mazlan Nordin
Published: (2005)
Beginning of the end?
by: Abd Razak, Dzulkifli
Published: (2009)
An end in sight
by: Menon, Sandhya
Published: (2021)
A Cdk1 phosphomimic mutant of MCAK impairs microtubule end recognition
by: Belsham, Hannah R., et al.
Published: (2017)
Online learning and fusion of orientation appearance models for robust rigid object tracking
by: Marras, Ioannis, et al.
Published: (2013)
Comparison Study of Various Factors Affecting End-to-End Delay in IEC 61850 Substation Communications Using OPNET
by: Das, Narottam, et al.
Published: (2012)
CFD analysis on mismatched end-to-end internal diameter of RSVG models
by: Yahya, Muhd Nur Rahman, et al.
Published: (2014)
Writing Cinematic Sound: Audiovisual Phrasing and Screenwriting Practice
by: Strand, Joachim Wichman
Published: (2023)
End of semester examination
by: Ismail, Yusof
Published: (2011)
End of semester examination
by: Ismail, Yusof
Published: (2011)