Video anomaly detection with U-Net temporal modelling and contrastive regularization

Bibliographic Details
Main Author: Gan, Kian Yu
Format: Final Year Project / Dissertation / Thesis
Published: 2023
Subjects:
Online Access:http://eprints.utar.edu.my/5786/
http://eprints.utar.edu.my/5786/1/fyp_CS_2023_GKY.pdf
_version_ 1848886504672198656
author Gan, Kian Yu
author_facet Gan, Kian Yu
author_sort Gan, Kian Yu
building UTAR Institutional Repository
collection Online Access
description Video anomaly detection (VAD), which automatically identifies where an anomalous event occurs in a video, is one of the current hot topics in deep learning research. Because frame-level annotation of video samples is expensive, most VAD models are trained with a weakly supervised method, in which labels are provided only at the video level. VAD remains an open and challenging task because the model is trained on limited samples with weak video-level labels. In this project, we aim to improve the VAD network in two different aspects. First, we explore a technique to model local and global temporal dependencies, which are critical for detecting anomalous events. Previous methods such as stacked RNNs, temporal consistency and ConvLSTM can capture only short-range dependencies. GCN-based methods can model long-range dependencies, but they are slower and more difficult to train. RTFM captures both short- and long-range temporal dependencies using two parallel structures, one for each type; however, the two dependencies are considered separately, neglecting the close relationship between them. In this aspect, we propose a U-Net-like structure to model both local and global dependencies for specialized feature generation. Second, we explore a new regularization technique for the weakly supervised setting to reduce overfitting. Insufficient training samples easily lead to overfitting. Generally, overfitting can be mitigated by reducing the complexity of the network, augmenting the data, injecting noise into the network or applying dropout regularization. For VAD, previous works have applied special heuristics such as sparsity constraints and temporal smoothness to regulate the output of the model. However, no existing work has taken a feature-based approach to regularization, in which the strategy is to learn more discriminative features.
In this project, we extend contrastive regularization to the weakly supervised setting as a new regularization technique that reduces overfitting by learning more discriminative features and enhancing the separability of features from different classes. We evaluated our model and compared its AUC with that of other state-of-the-art methods. Experimental results show that our model achieves the second-highest AUC among all published work on the UCF-Crime benchmark dataset, using the same pre-trained features.
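The U-Net-style temporal modelling described above can be illustrated with a toy NumPy pass over per-snippet features: an encoder pools the temporal axis so deeper levels see increasingly global context, and a decoder upsamples while fusing skip connections that retain local detail. This is only a minimal stand-in under stated assumptions (moving-average filters in place of learned convolutions; `unet_temporal` and its helpers are hypothetical names), not the network from the thesis.

```python
import numpy as np

def conv1d_same(x, k=3):
    """Depthwise moving-average filter over the temporal axis
    (a stand-in for a learned Conv1d; keeps length via edge padding)."""
    T, _ = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[t:t + k].mean(axis=0) for t in range(T)])

def down(x):
    """Halve the temporal resolution by average pooling (stride 2)."""
    T = x.shape[0] - x.shape[0] % 2
    return x[:T].reshape(T // 2, 2, -1).mean(axis=1)

def up(x, T):
    """Nearest-neighbour upsampling back to T snippets."""
    return np.repeat(x, 2, axis=0)[:T]

def unet_temporal(feats, depth=2):
    """U-Net-like pass over snippet features of shape (T, D):
    pooled encoder levels see increasingly global context, while
    skip connections reinject local detail on the way back up."""
    skips, x = [], feats
    for _ in range(depth):            # encoder: local -> global
        x = conv1d_same(x)
        skips.append(x)
        x = down(x)
    x = conv1d_same(x)                # bottleneck: most global view
    for skip in reversed(skips):      # decoder: fuse global and local
        x = up(x, skip.shape[0])
        x = conv1d_same((x + skip) / 2.0)
    return x
```

The key property mirrored here is that every output snippet mixes information from coarse (global) and fine (local) temporal scales, rather than handling the two in separate parallel branches as in RTFM.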
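Likewise, the contrastive-regularization idea, pulling same-class features together while pushing different-class features apart, can be sketched as a supervised-contrastive-style loss. This is a generic formulation under assumed details (cosine similarity, temperature `tau`, the hypothetical name `contrastive_reg`), not the exact loss from the thesis.

```python
import numpy as np

def contrastive_reg(feats, labels, tau=0.1):
    """Supervised-contrastive-style regularizer (generic sketch):
    for each L2-normalised feature, treat same-label features as
    positives and all other features as the contrastive denominator.
    Lower loss means better class separability in feature space."""
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = z @ z.T / tau                  # temperature-scaled cosine sims
    n = len(labels)
    loss, terms = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue                     # no positive pair for this anchor
        logits = np.delete(sim[i], i)    # exclude self-similarity
        log_den = np.log(np.exp(logits).sum())
        for j in positives:              # -log softmax prob of each positive
            loss += -(sim[i, j] - log_den)
            terms += 1
    return loss / max(terms, 1)
```

Used as an auxiliary term alongside the main (weakly supervised) objective, such a loss penalises feature layouts in which normal and anomalous snippets overlap, which is the discriminative-feature rationale given in the abstract.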
first_indexed 2025-11-15T19:39:33Z
format Final Year Project / Dissertation / Thesis
id utar-5786
institution Universiti Tunku Abdul Rahman
institution_category Local University
last_indexed 2025-11-15T19:39:33Z
publishDate 2023
recordtype eprints
repository_type Digital Repository
spelling utar-57862023-09-08T14:15:56Z Video anomaly detection with U-Net temporal modelling and contrastive regularization Gan, Kian Yu Q Science (General) T Technology (General) 2023-01 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/5786/1/fyp_CS_2023_GKY.pdf Gan, Kian Yu (2023) Video anomaly detection with U-Net temporal modelling and contrastive regularization. Final Year Project, UTAR. http://eprints.utar.edu.my/5786/
spellingShingle Q Science (General)
T Technology (General)
Gan, Kian Yu
Video anomaly detection with U-Net temporal modelling and contrastive regularization
title Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_full Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_fullStr Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_full_unstemmed Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_short Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_sort video anomaly detection with u-net temporal modelling and contrastive regularization
topic Q Science (General)
T Technology (General)
url http://eprints.utar.edu.my/5786/
http://eprints.utar.edu.my/5786/1/fyp_CS_2023_GKY.pdf