Modeling sub-event dynamics in first-person action recognition

First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to form a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure. The Fourier coefficients of the segments define the overall video representation. We perform experiments on two existing benchmark first-person video datasets, both captured in controlled environments. To address this gap, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7% respectively on the three datasets.
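The abstract describes a two-stage pipeline: per-frame features are pooled over sub-intervals with a temporal pooling function, and the resulting sub-event series is summarised by recursively applying the FFT over a pyramidal temporal structure, keeping Fourier coefficients as the video representation. Below is a minimal Python sketch of one plausible reading of that pipeline, not the authors' implementation; the pooling function (max), the number of sub-events, the pyramid depth and the number of retained Fourier coefficients are all illustrative assumptions.

import numpy as np

def pool_subevents(frame_feats, num_subevents=8, pool=np.max):
    # frame_feats: (T, D) per-frame descriptors; split the timeline into equal sub-intervals
    chunks = np.array_split(frame_feats, num_subevents, axis=0)
    # summarise each sub-interval with a temporal pooling function (max here, an assumption)
    return np.stack([pool(c, axis=0) for c in chunks])      # (num_subevents, D)

def pyramidal_fft(series, levels=3, keep=4):
    # apply the FFT over a temporal pyramid and keep low-frequency coefficients per segment
    feats = []
    for level in range(levels):
        for seg in np.array_split(series, 2 ** level, axis=0):
            coeffs = np.fft.fft(seg, axis=0)[:keep]          # low-frequency coefficients
            feats.append(np.abs(coeffs).ravel())             # magnitude spectrum as descriptor
    return np.concatenate(feats)                             # final fixed-length video descriptor

# toy usage: 120 frames of 64-dimensional per-frame features
video_feats = np.random.rand(120, 64)
representation = pyramidal_fft(pool_subevents(video_feats))
print(representation.shape)

The sketch only illustrates how pooled sub-event descriptors and the retained Fourier coefficients from every pyramid segment are concatenated into one fixed-length video representation; the choice of per-frame features, pooling function and pyramid configuration is where the paper's actual design lies.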


Bibliographic Details
Main Authors: Mohd Zaki, Hasan Firdaus; Shafait, Faisal; Mian, Ajmal S.
Format: Proceeding Paper
Language: English
Published: IEEE, 2017
Conference: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21st-26th July 2017, Honolulu, USA
DOI: 10.1109/CVPR.2017.176
Subjects: TK7885 Computer engineering
Repository: IIUM Repository, International Islamic University Malaysia
Online Access: http://irep.iium.edu.my/64353/
https://ieeexplore.ieee.org/document/8099659/
http://irep.iium.edu.my/64353/8/64353%20Modeling%20Sub-Event%20Dynamics%20in%20First-Person%20Action%20Recognition.pdf
http://irep.iium.edu.my/64353/7/64353%20Modeling%20sub-event%20dynamics%20in%20first-person%20action%20recognition%20SCOPUS.pdf