End-to-end audiovisual speech recognition
Several end-to-end deep learning approaches have recently been presented that extract either audio or visual features from the input images or audio signals and perform speech recognition. However, research on end-to-end audiovisual models is very limited. In this work, we present an end-to-end aud...
Main Authors: | , , , , , |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2018 |
Online Access: | http://eprints.nottingham.ac.uk/51132/ http://eprints.nottingham.ac.uk/51132/1/av_speech1.pdf |