Lightweight CNN–MIL Models for Cross-Domain Video Anomaly Detection: A Reproducible Evaluation Framework
Abstract
Video anomaly detection (VAD) is increasingly deployed in large-scale CCTV networks, yet most existing approaches are evaluated only in single-domain settings, limiting their reliability in real-world deployment. This paper presents a reproducible evaluation framework for lightweight, weakly supervised VAD models that combine compact CNN backbones (MobileNetV2 and ResNet-18) with a ranking-based multiple-instance learning (MIL) objective regularized by temporal-smoothness and sparsity constraints. Complete architectural details of MobileNetV2, ResNet-18, and the MIL ranking head are presented in Supplementary Section S2. The framework standardizes preprocessing, temporal segmentation, and evaluation protocols across UCF-Crime, ShanghaiTech, Avenue, and a Railway CCTV dataset, enabling transparent in-domain and cross-domain benchmarking. Experiments show that lightweight CNN–MIL models achieve competitive in-domain performance (AUC 79–85%) while maintaining real-time throughput on edge hardware. Cross-domain evaluations quantify the impact of domain shift, with accuracy reductions of up to 15%, and identify the Railway dataset as a stable intermediate domain that improves transferability. On a Jetson Nano, MobileNetV2–MIL sustains 28–30 FPS with only 14 MB of memory, demonstrating deployability on low-power hardware; efficiency analyses further confirm the practical advantages of compact models in resource-constrained surveillance environments. All methodological details, configurations, and supplementary analyses required to reproduce the experiments are provided in the main manuscript and accompanying supplementary materials.
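The ranking-based MIL objective with smoothness and sparsity constraints described above follows the standard weakly supervised formulation (Sultani et al., CVPR 2018): segment-level scores from an anomalous and a normal video bag are compared with a hinge ranking term, while the anomalous bag's scores are regularized to vary smoothly over time and remain sparse. A minimal NumPy sketch, with an illustrative function name and regularization weights not taken from the paper:

```python
import numpy as np

def mil_ranking_loss(anom_scores, norm_scores, lam1=8e-5, lam2=8e-5):
    """MIL ranking loss over per-segment anomaly scores in [0, 1].

    anom_scores, norm_scores: 1-D arrays of scores for one anomalous
    and one normal video bag. lam1/lam2 weight the smoothness and
    sparsity terms (illustrative values, not the paper's settings).
    """
    # Hinge ranking: the highest-scoring segment of the anomalous bag
    # should outscore the highest-scoring normal segment by margin 1.
    rank = max(0.0, 1.0 - np.max(anom_scores) + np.max(norm_scores))
    # Temporal smoothness: adjacent segments of the anomalous bag
    # should not change score abruptly.
    smooth = np.sum(np.diff(anom_scores) ** 2)
    # Sparsity: only a few segments of an anomalous video should fire.
    sparse = np.sum(anom_scores)
    return rank + lam1 * smooth + lam2 * sparse
```

In training, the loss is averaged over mini-batches of (anomalous, normal) bag pairs; only video-level labels are needed, which is what makes the supervision "weak".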
Exact training hyperparameters used across all experiments are listed in Supplementary Section S3.
DOI: https://doi.org/10.31449/inf.v49i36.12037
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I may ask to retain the publishing rights to the Paper. The Journal may grant or deny this request, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







