Enhanced Intelligent Video Monitoring using Hybrid Integration of Spatiotemporal Autoencoders and Convolutional LSTMs
Abstract
Perceiving meaningful activities in surveillance videos is challenging because anomalies are ambiguous by nature and scenes are complex. This paper proposes a hybrid deep learning framework that combines spatiotemporal autoencoders with convolutional LSTMs for automated anomaly detection in surveillance videos. The architecture integrates stacked convolutional autoencoders (CAEs) for semi-supervised feature representation with LSTM networks for preserving temporal information. Experiments on the UCSD Ped1 dataset show that our LSTM-based Stacked CAE achieves an AUC of 83.5%, a detection rate of 81.5%, and an Equal Error Rate (EER) of 19.2%. The model performs best on temporal pattern recognition, with an accuracy of 84.5% and a sequence processing efficiency of 82.7%. Comparative analysis with state-of-the-art methods shows that the proposed architecture achieves competitive performance, particularly in handling complex motion patterns and maintaining temporal consistency. The model also lowers the false alarm rate to 15.8%, compared with 17.2% for the basic CAE. The results indicate that integrating LSTM with stacked convolutional autoencoders provides a robust framework for real-world surveillance applications, especially in scenarios requiring both spatial and temporal anomaly detection.
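The AUC and EER figures quoted above are frame-level ROC metrics. The evaluation code is not part of this page, so the following is only a minimal sketch of how such metrics are commonly computed from per-frame anomaly scores and ground-truth labels. The function names (`roc_points`, `auc`, `equal_error_rate`) and the toy scores are hypothetical and chosen for illustration; they are not the authors' implementation or data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Return ROC points, sweeping the decision threshold from high to low.

    scores: higher means "more anomalous" (assumed convention).
    labels: 1 for an anomalous frame, 0 for a normal frame.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one anomalous frame")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last frame of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def equal_error_rate(points: Sequence[Point]) -> float:
    """Rate at which FPR equals the miss rate (1 - TPR), linearly interpolated."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = x0 - (1.0 - y0)  # FPR minus miss rate at the segment start
        d1 = x1 - (1.0 - y1)  # FPR minus miss rate at the segment end
        if d0 <= 0.0 <= d1:
            if d1 == d0:  # both are zero: the whole segment is at the EER
                return x0
            t = -d0 / (d1 - d0)
            return x0 + t * (x1 - x0)
    return 1.0  # not reached for a ROC curve running from (0, 0) to (1, 1)


# Toy example: eight frames, four of them anomalous (hypothetical values).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_labels)
print("AUC:", auc(toy_curve), "EER:", equal_error_rate(toy_curve))
```

In this convention a lower EER and a higher AUC both indicate better separation between normal and anomalous frames, which is how the 83.5% AUC and 19.2% EER above should be read.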
DOI: https://doi.org/10.31449/inf.v49i18.7502
Copyright © Slovenian Society Informatika







