A Deployment-Oriented Hybrid CNN–LSTM–MIL System for Real-World Video Anomaly Detection
Abstract
Intelligent surveillance systems require video anomaly detection methods that operate reliably under real-world conditions rather than controlled benchmark settings. This paper presents a deployment-oriented hybrid CNN–LSTM–MIL framework that integrates spatio-temporal feature learning, weakly supervised anomaly scoring, and reconstruction-based regularity modeling to address the practical challenges of large-scale video surveillance. The proposed framework is evaluated on widely used benchmark datasets, including UCF-Crime, CUHK Avenue, ShanghaiTech, and UMN, as well as on diverse real-world CCTV footage captured from urban streets, shopping malls, traffic intersections, and railway stations. Experimental results demonstrate competitive detection performance, achieving AUC scores of 85.9% on UCF-Crime and 91.3% on CUHK Avenue, while maintaining near real-time inference speeds of 28–50 frames per second on GPU and edge platforms through deployment-oriented optimizations such as pruning and quantization. Additional evaluation on real-world surveillance data shows reduced false alarm rates and stable detection performance under challenging conditions, including illumination variations, background clutter, occlusions, and varying crowd densities. By jointly analyzing detection accuracy, computational efficiency, and deployment feasibility, this work bridges the gap between benchmark-oriented research and practical intelligent surveillance deployment for public safety and traffic monitoring applications.
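The abstract does not state the exact objective behind the weakly supervised anomaly scoring. As an illustration only, the sketch below shows a multiple-instance (MIL) ranking loss of the kind commonly used for video-level labels: each video is a bag of per-segment scores, and the top-scoring segment of an anomalous video is pushed above the top-scoring segment of a normal video. The function name, the margin, and the two regularization weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mil_ranking_loss(
    anomalous_bag: Sequence[float],
    normal_bag: Sequence[float],
    margin: float = 1.0,
    smooth_weight: float = 0.0,   # hypothetical weight, not from the paper
    sparse_weight: float = 0.0,   # hypothetical weight, not from the paper
) -> float:
    """Hinge ranking loss between the top segment scores of two bags.

    Each bag holds per-segment anomaly scores in [0, 1] for one video.
    Only video-level labels are needed: one bag comes from a video known
    to contain an anomaly, the other from a video known to be normal.
    """
    if not anomalous_bag or not normal_bag:
        raise ValueError("both bags must contain at least one segment score")

    # Ranking term: the highest score in the anomalous video should exceed
    # the highest score in the normal video by at least `margin`.
    ranking = max(0.0, margin - max(anomalous_bag) + max(normal_bag))

    # Temporal smoothness: penalize abrupt score changes between
    # neighbouring segments of the anomalous video.
    smooth = sum((a - b) ** 2 for a, b in zip(anomalous_bag, anomalous_bag[1:]))

    # Sparsity: anomalies are assumed rare, so the total score is kept small.
    sparse = sum(anomalous_bag)

    return ranking + smooth_weight * smooth + sparse_weight * sparse


if __name__ == "__main__":
    # Toy scores for one anomalous and one normal video (made-up values).
    loss = mil_ranking_loss([0.1, 0.9, 0.2], [0.1, 0.3])
    print(f"ranking loss: {loss:.3f}")
```

In a trained system the segment scores would come from the network's scoring head and the loss would be written in a differentiable framework; plain Python is used here only to keep the arithmetic visible.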
DOI: https://doi.org/10.31449/inf.v50i1.12915
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







