Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks
We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is this year's Paralinguistic Challenge's Audiovisual...
| Main Authors: | Eyben, F.; Petridis, S.; Schuller, Björn; Tzimiropoulos, Georgios; Zafeiriou, Stefanos; Pantic, Maja |
|---|---|
| Format: | Conference or Workshop Item |
| Published: | 2011 |
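| Subjects: | Audio Signal Processing; Audio-Visual Systems; Recurrent Neural Nets; Support Vector Machines; Video Signal Processing |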
| Online Access: | https://eprints.nottingham.ac.uk/31428/ |
| _version_ | 1848794199196958720 |
|---|---|
| author | Eyben, F.; Petridis, S.; Schuller, Björn; Tzimiropoulos, Georgios; Zafeiriou, Stefanos; Pantic, Maja |
| author_sort | Eyben, F. |
| building | Nottingham Research Data Repository |
| collection | Online Access |
| description | We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is this year's Paralinguistic Challenge's Audiovisual Interest Corpus of natural human-to-human conversation. For video-based analysis, we compare shape-based and appearance-based features. These are combined with typical audio descriptors by early fusion. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features. |
| first_indexed | 2025-11-14T19:12:24Z |
| format | Conference or Workshop Item |
| id | nottingham-31428 |
| institution | University of Nottingham Malaysia Campus |
| institution_category | Local University |
| last_indexed | 2025-11-14T19:12:24Z |
| publishDate | 2011 |
| recordtype | eprints |
| repository_type | Digital Repository |
| spelling | nottingham-31428 2020-05-04T20:23:51Z https://eprints.nottingham.ac.uk/31428/ PeerReviewed. Eyben, F., Petridis, S., Schuller, Björn, Tzimiropoulos, Georgios, Zafeiriou, Stefanos and Pantic, Maja (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 22-27 May 2011, Prague, Czech Republic. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5947690 |
| title | Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks |
| topic | Audio Signal Processing; Audio-Visual Systems; Recurrent Neural Nets; Support Vector Machines; Video Signal Processing |
| url | https://eprints.nottingham.ac.uk/31428/ |