New hybrid deep learning method to recognize human action from video

Bibliographic Details
Main Authors: Md Shofiqul Islam, Sunjida Sultana, Md Jabbarul Islam
Format: Article
Language: English
Published: Universitas Ahmad Dahlan 2021
Subjects:
Online Access:http://umpir.ump.edu.my/id/eprint/33225/
http://umpir.ump.edu.my/id/eprint/33225/1/New%20hybrid%20deep%20learning%20method%20to%20recognize_FULL.pdf
Description
Summary: There has been a tremendous increase in internet users and available bandwidth in recent years. Because Internet connectivity is so inexpensive, sharing information (text, audio, and video) has become more popular and faster. This video content must be examined in order to classify it for users' different purposes. Several machine learning approaches for video classification have been developed to save users time and energy. The use of deep neural networks to recognize human behavior has become a popular research topic in recent years. Although significant progress has been made in video recognition, numerous challenges remain. Convolutional neural networks (CNNs) are well known for requiring a fixed-size image input, which limits the network topology and reduces identification accuracy. Although this problem has been solved for still images, it has yet to be solved for video. We present a ten-layer stacked three-dimensional (3D) convolutional network with spatial-pyramid-based pooling to handle the fixed-size input problem for video frames in video recognition. The network structure is made up of three sections, as the name suggests: a ten-layer stacked 3DCNN, DenseNet, and SPPNet. The KTH dataset was used to test our algorithms. The experimental findings showed that our model outperformed existing models in video-based behavior identification by a 2% accuracy margin.
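The key mechanism in the abstract is spatial pyramid pooling extended to 3D feature volumes, which maps clips of varying length and resolution to a fixed-length vector so that fully connected layers can follow. The sketch below is a minimal NumPy illustration of that idea, not the paper's implementation: the pyramid levels (1, 2, 4), the max-pooling operator, and the channel count are all assumptions chosen for demonstration.

```python
import numpy as np

def spatial_pyramid_pool_3d(features, levels=(1, 2, 4)):
    """Pool a variable-size 3D feature volume (C, T, H, W) into a
    fixed-length vector by max-pooling over a pyramid of grids.
    Assumes every spatial/temporal dimension is >= max(levels)."""
    c, t, h, w = features.shape
    pooled = []
    for n in levels:
        # Bin edges that cover the whole volume even when a
        # dimension is not divisible by n.
        t_edges = np.linspace(0, t, n + 1).astype(int)
        h_edges = np.linspace(0, h, n + 1).astype(int)
        w_edges = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    bin_ = features[:,
                                    t_edges[i]:t_edges[i + 1],
                                    h_edges[j]:h_edges[j + 1],
                                    w_edges[k]:w_edges[k + 1]]
                    # One max per channel per pyramid bin.
                    pooled.append(bin_.max(axis=(1, 2, 3)))
    return np.concatenate(pooled)

# Two clips with different frame counts and resolutions map to the
# same output length: C * (1^3 + 2^3 + 4^3) = 64 * 73 = 4672.
a = spatial_pyramid_pool_3d(np.random.rand(64, 16, 30, 40))
b = spatial_pyramid_pool_3d(np.random.rand(64, 9, 17, 25))
assert a.shape == b.shape == (4672,)
```

Because the output length depends only on the channel count and the pyramid levels, a classifier head can be trained once and applied to clips of any size, which is exactly the fixed-input limitation the abstract says the SPPNet component addresses.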