Combining residual networks with LSTMs for lipreading

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging da...

Bibliographic Details
Main Authors:	Stafylakis, Themos; Tzimiropoulos, Georgios
Format:	Conference or Workshop Item
Published:	2017
Subjects:	visual speech recognition; lipreading; deep learning
Online Access:	https://eprints.nottingham.ac.uk/44756/
