Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks

We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach that uses Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is this year's Paralinguistic Challenge's Audiovisual Interest Corpus of human-to-human natural conversation. For the video-based analysis we compare shape-based and appearance-based features, which are fused at an early stage with typical audio descriptors. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
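
To make the pipeline concrete, here is a minimal sketch, in PyTorch, of the kind of approach the abstract describes: frame-level audio descriptors and visual shape features are concatenated per frame (early fusion) and fed to a bidirectional LSTM that assigns one label per sequence. The feature dimensions, class count, and model details below are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

N_AUDIO = 39    # frame-level audio descriptors (dimension assumed for illustration)
N_SHAPE = 40    # facial shape features per frame (dimension assumed for illustration)
N_CLASSES = 4   # number of vocalisation classes (assumed, not the paper's setup)

class EarlyFusionLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # Early fusion: audio and visual features are concatenated frame by frame.
        self.lstm = nn.LSTM(N_AUDIO + N_SHAPE, hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, N_CLASSES)

    def forward(self, audio, shape):
        # audio: (batch, frames, N_AUDIO), shape: (batch, frames, N_SHAPE)
        fused = torch.cat([audio, shape], dim=-1)  # frame-level early fusion
        seq, _ = self.lstm(fused)                  # dynamic sequence modelling
        return self.out(seq[:, -1])                # one prediction per sequence

# Toy forward pass on random features: 8 sequences of 100 frames each.
model = EarlyFusionLSTM()
audio = torch.randn(8, 100, N_AUDIO)
shape = torch.randn(8, 100, N_SHAPE)
print(model(audio, shape).shape)  # torch.Size([8, 4])

The static Support Vector Machine baseline mentioned in the abstract would instead collapse each sequence into a single fixed-length vector (for example, statistical functionals of the frame-level features) before classification, which is the contrast the dynamic LSTM modelling is evaluated against.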

Bibliographic Details
Main Authors: Eyben, F., Petridis, S., Schuller, Björn, Tzimiropoulos, Georgios, Zafeiriou, Stefanos, Pantic, Maja
Format: Conference or Workshop Item
Published: 2011
Subjects: Audio Signal Processing; Audio-Visual Systems; Recurrent Neural Nets; Support Vector Machines; Video Signal Processing
Online Access: https://eprints.nottingham.ac.uk/31428/
Citation: Eyben, F., Petridis, S., Schuller, Björn, Tzimiropoulos, Georgios, Zafeiriou, Stefanos and Pantic, Maja (2011) Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 22-27 May 2011, Prague, Czech Republic. (Peer reviewed)
Publisher record: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5947690