Video anomaly detection with U-Net temporal modelling and contrastive regularization

Bibliographic Details
Main Author: Gan, Kian Yu
Format: Final Year Project / Dissertation / Thesis
Published: 2023
Subjects:
Online Access:http://eprints.utar.edu.my/5786/
http://eprints.utar.edu.my/5786/1/fyp_CS_2023_GKY.pdf
_version_ 1848886504672198656
author Gan, Kian Yu
author_facet Gan, Kian Yu
author_sort Gan, Kian Yu
building UTAR Institutional Repository
collection Online Access
description Video anomaly detection (VAD), which automatically identifies where an anomalous event occurs in a video, is one of the current hot topics in deep learning research. Because frame-level annotation of video samples is expensive, most VAD models are trained with a weakly supervised method, in which labels are provided only at the video level. VAD remains an open and challenging task because the model is trained on limited samples with weak video-level labels. In this project, we aim to improve the VAD network in two different aspects. First, we explore a technique to model local and global temporal dependencies, which are critical for detecting anomalous events. Previous methods such as stacked RNNs, temporal consistency and ConvLSTM can capture only short-range dependencies. GCN-based methods can model long-range dependencies, but they are slower and more difficult to train. RTFM captures both short- and long-range temporal dependencies using two parallel structures, one for each type; however, the two dependencies are considered separately, neglecting the close relationship between them. In this aspect, we propose a U-Net-like structure to model both local and global dependencies for specialized feature generation. Second, we explore a new regularization technique for the weakly supervised setting to reduce overfitting. Insufficient training samples easily lead to overfitting. Generally, overfitting can be mitigated by reducing the complexity of the network, augmenting the data, injecting noise into the network or applying dropout regularization. For VAD, previous works have applied special heuristics such as sparsity constraints and temporal smoothness to regulate the output of the model. However, no existing work has taken a feature-based approach to regularization, in which the strategy is to learn more discriminative features.
In this project, we extend contrastive regularization to the weakly supervised setting as a new regularization technique that reduces overfitting by learning more discriminative features and enhancing the separability of features from different classes. We evaluated our model and compared its AUC with that of other state-of-the-art methods. Experimental results show that our model achieves the second-highest AUC among all published work on the UCF-Crime benchmark dataset, using the same pre-trained features.
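The U-Net-style temporal modelling described above can be illustrated with a toy NumPy pass over per-snippet features: an encoder pools the temporal axis so deeper levels see increasingly global context, and a decoder upsamples while fusing skip connections that retain local detail. This is only a minimal stand-in under stated assumptions (moving-average filters in place of learned convolutions; `unet_temporal` and its helpers are hypothetical names), not the network from the thesis.

```python
import numpy as np

def conv1d_same(x, k=3):
    """Depthwise moving-average filter over the temporal axis
    (a stand-in for a learned Conv1d; keeps length via edge padding)."""
    T, _ = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[t:t + k].mean(axis=0) for t in range(T)])

def down(x):
    """Halve the temporal resolution by average pooling (stride 2)."""
    T = x.shape[0] - x.shape[0] % 2
    return x[:T].reshape(T // 2, 2, -1).mean(axis=1)

def up(x, T):
    """Nearest-neighbour upsampling back to T snippets."""
    return np.repeat(x, 2, axis=0)[:T]

def unet_temporal(feats, depth=2):
    """U-Net-like pass over snippet features of shape (T, D):
    pooled encoder levels see increasingly global context, while
    skip connections reinject local detail on the way back up."""
    skips, x = [], feats
    for _ in range(depth):            # encoder: local -> global
        x = conv1d_same(x)
        skips.append(x)
        x = down(x)
    x = conv1d_same(x)                # bottleneck: most global view
    for skip in reversed(skips):      # decoder: fuse global and local
        x = up(x, skip.shape[0])
        x = conv1d_same((x + skip) / 2.0)
    return x
```

The key property mirrored here is that every output snippet mixes information from coarse (global) and fine (local) temporal scales, rather than handling the two in separate parallel branches as in RTFM.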
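Likewise, the contrastive-regularization idea, pulling same-class features together while pushing different-class features apart, can be sketched as a supervised-contrastive-style loss. This is a generic formulation under assumed details (cosine similarity, temperature `tau`, the hypothetical name `contrastive_reg`), not the exact loss from the thesis.

```python
import numpy as np

def contrastive_reg(feats, labels, tau=0.1):
    """Supervised-contrastive-style regularizer (generic sketch):
    for each L2-normalised feature, treat same-label features as
    positives and all other features as the contrastive denominator.
    Lower loss means better class separability in feature space."""
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = z @ z.T / tau                  # temperature-scaled cosine sims
    n = len(labels)
    loss, terms = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue                     # no positive pair for this anchor
        logits = np.delete(sim[i], i)    # exclude self-similarity
        log_den = np.log(np.exp(logits).sum())
        for j in positives:              # -log softmax prob of each positive
            loss += -(sim[i, j] - log_den)
            terms += 1
    return loss / max(terms, 1)
```

Used as an auxiliary term alongside the main (weakly supervised) objective, such a loss penalises feature layouts in which normal and anomalous snippets overlap, which is the discriminative-feature rationale given in the abstract.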
first_indexed 2025-11-15T19:39:33Z
format Final Year Project / Dissertation / Thesis
id utar-5786
institution Universiti Tunku Abdul Rahman
institution_category Local University
last_indexed 2025-11-15T19:39:33Z
publishDate 2023
recordtype eprints
repository_type Digital Repository
spelling utar-57862023-09-08T14:15:56Z Video anomaly detection with U-Net temporal modelling and contrastive regularization Gan, Kian Yu Q Science (General) T Technology (General) 2023-01 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/5786/1/fyp_CS_2023_GKY.pdf Gan, Kian Yu (2023) Video anomaly detection with U-Net temporal modelling and contrastive regularization. Final Year Project, UTAR. http://eprints.utar.edu.my/5786/
spellingShingle Q Science (General)
T Technology (General)
Gan, Kian Yu
Video anomaly detection with U-Net temporal modelling and contrastive regularization
title Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_full Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_fullStr Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_full_unstemmed Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_short Video anomaly detection with U-Net temporal modelling and contrastive regularization
title_sort video anomaly detection with u-net temporal modelling and contrastive regularization
topic Q Science (General)
T Technology (General)
url http://eprints.utar.edu.my/5786/
http://eprints.utar.edu.my/5786/1/fyp_CS_2023_GKY.pdf