Combining residual networks with LSTMs for lipreading
We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system is a combination of spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging da...
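The abstract names three building blocks in sequence: a spatiotemporal (3D) convolutional front-end, a residual network, and a bidirectional LSTM. The sketch below traces tensor shapes through such a pipeline to show how the pieces connect. It is not the authors' implementation. Every number in it is a hypothetical choice made for illustration: 29-frame grayscale 112x112 mouth crops, a single 5x7x7 3D convolution with 64 filters, a four-stage per-frame residual network, 256 hidden units per LSTM direction and 500 word classes. The helper names `conv_out` and `trace_shapes` are also hypothetical.

```python
# Shape-tracing sketch of a "3D conv -> per-frame ResNet -> BiLSTM -> classifier"
# word-level lipreading model. All hyperparameters are hypothetical choices for
# illustration, not values taken from the paper.


def conv_out(n: int, kernel: int, stride: int, pad: int) -> int:
    """Output length of a convolution or pooling along one axis."""
    return (n + 2 * pad - kernel) // stride + 1


def trace_shapes(frames=29, height=112, width=112, num_classes=500,
                 lstm_hidden=256):
    shapes = {}
    # Input clip: (time, height, width, channels); grayscale crops assumed.
    t, h, w, c = frames, height, width, 1
    shapes["input"] = (t, h, w, c)

    # Spatiotemporal front-end: one 3D conv, kernel (5, 7, 7), stride (1, 2, 2),
    # padding (2, 3, 3), 64 filters. Temporal stride 1 keeps every frame.
    t = conv_out(t, 5, 1, 2)
    h, w = conv_out(h, 7, 2, 3), conv_out(w, 7, 2, 3)
    c = 64
    shapes["conv3d"] = (t, h, w, c)

    # Spatial-only 3D max pooling: kernel (1, 3, 3), stride (1, 2, 2), padding (0, 1, 1).
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    shapes["maxpool"] = (t, h, w, c)

    # Residual network applied to each frame independently (2D convs over H, W).
    # Hypothetical four stages; stages 2-4 halve the resolution with a
    # stride-2 3x3 conv (padding 1) and double the channel count.
    for stride, channels in [(1, 64), (2, 128), (2, 256), (2, 512)]:
        h, w = conv_out(h, 3, stride, 1), conv_out(w, 3, stride, 1)
        c = channels
    shapes["resnet"] = (t, h, w, c)

    # Global average pooling over H and W leaves one feature vector per frame.
    shapes["per_frame_features"] = (t, c)

    # Bidirectional LSTM: forward and backward hidden states are concatenated.
    shapes["bilstm"] = (t, 2 * lstm_hidden)

    # Word classifier (assumed): average over time, then a linear layer + softmax.
    shapes["logits"] = (num_classes,)
    return shapes


for name, shape in trace_shapes().items():
    print(f"{name:>20}: {shape}")
```

With these assumed settings the `conv3d` stage reports `(29, 56, 56, 64)` and the BiLSTM stage reports `(29, 512)`. The design point the sketch illustrates is that the front-end convolution mixes neighbouring frames without downsampling in time, so the residual network can process each frame as an image while the BiLSTM still sees one feature vector per frame and models the temporal dynamics.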
| Field | Value |
|---|---|
| Format | Conference or Workshop Item |
| Published | 2017 |
| Online Access | https://eprints.nottingham.ac.uk/44756/ |