Enhanced Intelligent Video Monitoring using Hybrid Integration of Spatiotemporal Autoencoders and Convolutional LSTMs

Ankita Umale-Nagmote, Charu Goel, Nidhi Lal

Abstract


Recognizing meaningful activities in surveillance video is difficult because anomalies are ambiguously defined and scenes are complex. This paper proposes a hybrid deep learning framework that combines spatiotemporal autoencoders with convolutional LSTMs for automated anomaly detection in surveillance videos. The architecture integrates stacked convolutional autoencoders (CAEs) for semi-supervised feature representation with LSTM networks that preserve temporal information. Experiments on the UCSD Ped1 dataset show that the LSTM-based Stacked CAE achieves an AUC of 83.5%, a detection rate of 81.5%, and an Equal Error Rate (EER) of 19.2%. The model is strongest in temporal pattern recognition, with an accuracy of 84.5% and a sequence processing efficiency of 82.7%. A comparison with state-of-the-art methods shows that the proposed architecture is competitive, particularly in handling complex motion patterns and maintaining temporal consistency. It also lowers the false alarm rate to 15.8%, compared with 17.2% for the basic CAE. These results indicate that integrating LSTMs with stacked convolutional autoencoders provides a robust framework for real-world surveillance applications, especially in scenarios that require both spatial and temporal anomaly detection.
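The AUC and EER figures quoted above are frame-level ROC metrics. As an illustration only, the following minimal sketch shows how such metrics are commonly derived from per-frame anomaly scores (for example, reconstruction errors) and ground-truth labels. The function names `roc_points`, `auc`, and `equal_error_rate` are hypothetical and the toy data is invented for the example; this is not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Sweep a threshold over descending scores and return (fpr, tpr) lists.

    scores: per-frame anomaly scores, higher means more anomalous (assumed).
    labels: 1 for an anomalous frame, 0 for a normal frame.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one normal and one anomalous frame")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Frames that share a score are handled as one threshold step.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
        i = j
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


def equal_error_rate(fpr, tpr):
    """Error rate at the ROC point where FPR equals the miss rate (1 - TPR).

    The crossing is located by linear interpolation between ROC points.
    """
    for k in range(1, len(fpr)):
        d_prev = fpr[k - 1] - (1 - tpr[k - 1])
        d_curr = fpr[k] - (1 - tpr[k])
        if d_curr >= 0:
            t = -d_prev / (d_curr - d_prev)
            return fpr[k - 1] + t * (fpr[k] - fpr[k - 1])
    return 1.0


# Toy example: four frames, two of them anomalous.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
toy_fpr, toy_tpr = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_fpr, toy_tpr)
toy_eer = equal_error_rate(toy_fpr, toy_tpr)
```

A lower EER and a higher AUC both indicate better separation between normal and anomalous frames, which is why the two are usually reported together.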






DOI: https://doi.org/10.31449/inf.v49i18.7502

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.