Lightweight CNN–MIL Models for Cross-Domain Video Anomaly Detection: A Reproducible Evaluation Framework

Abstract

Video anomaly detection (VAD) is increasingly deployed in large-scale CCTV networks, yet most existing approaches are evaluated only in single-domain settings, limiting their reliability in real-world deployment. This paper presents a reproducible evaluation framework for lightweight, weakly supervised VAD models that combine compact CNN backbones (MobileNetV2 and ResNet-18) with a ranking-based multiple-instance learning (MIL) objective regularized by temporal-smoothness and sparsity constraints; complete architectural details of the backbones and the MIL ranking head are presented in Supplementary Section S2. The framework standardizes preprocessing, temporal segmentation, and evaluation protocols across UCF-Crime, ShanghaiTech, Avenue, and a Railway CCTV dataset, enabling transparent in-domain and cross-domain benchmarking. Experiments show that lightweight CNN–MIL models achieve competitive in-domain performance (AUC 79–85%) while maintaining real-time throughput on edge hardware: on a Jetson Nano, MobileNetV2–MIL sustains 28–30 FPS with only 14 MB of memory, demonstrating deployability on low-power devices. Cross-domain evaluations quantify the impact of domain shift, with AUC reductions of up to 15%, and identify the Railway dataset as a stable intermediate domain that improves transferability. Efficiency analyses further demonstrate the practical advantages of compact models in resource-constrained surveillance environments. All methodological details, configurations, and supplementary analyses required to reproduce the experiments are provided in the main manuscript and accompanying supplementary materials; exact training hyperparameters used across all experiments are listed in Supplementary Section S3.
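The ranking-based MIL objective with smoothness and sparsity constraints mentioned above can be sketched as follows. This is a minimal NumPy illustration in the style of the standard weakly supervised MIL ranking loss (Sultani et al., CVPR 2018), not the authors' exact implementation; the function name and the regularization weights `lam_smooth` and `lam_sparse` are illustrative assumptions.

```python
import numpy as np

def mil_ranking_loss(anomalous_scores, normal_scores,
                     lam_smooth=8e-5, lam_sparse=8e-5):
    """Hedged sketch of a MIL ranking loss with smoothness and
    sparsity constraints. Each argument holds per-segment anomaly
    scores (one bag per video, scores in [0, 1])."""
    s_a = np.asarray(anomalous_scores, dtype=float)
    s_n = np.asarray(normal_scores, dtype=float)
    # Hinge ranking: the top-scoring segment of an anomalous video
    # should outscore the top-scoring segment of a normal video.
    ranking = max(0.0, 1.0 - s_a.max() + s_n.max())
    # Temporal smoothness: adjacent segment scores should vary gradually.
    smoothness = np.sum(np.diff(s_a) ** 2)
    # Sparsity: only a few segments in an anomalous video are anomalous.
    sparsity = np.sum(s_a)
    return ranking + lam_smooth * smoothness + lam_sparse * sparsity
```

In training, this loss would be averaged over mini-batches of (anomalous, normal) video pairs, with segment scores produced by the MIL ranking head on top of the CNN backbone features.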


Authors

  • Rajat Gupta, Shobhit Institute of Engineering and Technology (Shobhit University); Bharati Vidyapeeth's College of Engineering, New Delhi
  • Nidhi Tyagi, Shobhit Institute of Engineering and Technology (Shobhit University)

DOI:

https://doi.org/10.31449/inf.v49i36.12037

Published

12/20/2025

How to Cite

Gupta, R., & Tyagi, N. (2025). Lightweight CNN–MIL Models for Cross-Domain Video Anomaly Detection: A Reproducible Evaluation Framework. Informatica, 49(36). https://doi.org/10.31449/inf.v49i36.12037