Multi-Level CNN Feature Fusion from ResNet50 for Near-Duplicate Image Detection in Real Estate Imagery

Abstract

The volume of images uploaded to the internet is increasing at an unprecedented rate, making image deduplication, through accurate near-duplicate detection, a critical task in computer vision. However, comparing images for similarity remains challenging due to complex visual structures and subtle appearance variations.

We propose a novel embedding method for image similarity detection. It constructs an enriched representation by concatenating outputs from multiple intermediate layers of a pre-trained ResNet50 convolutional neural network and trains a lightweight decision network on top to classify image pairs. Unlike aggregation approaches that average or sum intermediate features, our method preserves both low-level and high-level information in a single descriptor and maintains feature diversity. The multi-level embedding is further normalized to balance feature contributions and is evaluated against classical keypoint descriptors, a DCT-based perceptual hash, and a standard single-layer ResNet50 embedding.

We evaluate this method on three real-world image deduplication tasks derived from real estate listings, covering (a) near-identical property photos with graphical overlays, (b) interior room photographs captured from different angles, and (c) schematic floor plan images. The proposed embedding achieves F1-scores of 0.96, 0.87, and 0.77, representing a 10-15% absolute improvement over baseline methods, including classical feature descriptors and standard ResNet50 final-layer embeddings.

This approach has been successfully deployed in production on a large-scale real estate platform, reducing duplicate images and improving search quality. The results demonstrate that multi-layer CNN embeddings with explicit feature preservation offer a robust and scalable solution for near-duplicate image detection in structured domains such as real estate photography and schematic floor plans.
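The multi-level descriptor described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which layers are tapped, how feature maps are pooled, or how the embedding is normalized, so the sketch assumes the four residual stages of ResNet50 (256, 512, 1024, and 2048 channels), global average pooling, and per-level L2 normalization. Random arrays stand in for real activations, and the helper names and the absolute-difference pair feature are hypothetical.

```python
import numpy as np

# Output channel counts of the four residual stages of ResNet50.
# ASSUMPTION: the abstract does not state which layers are actually used.
STAGE_CHANNELS = (256, 512, 1024, 2048)
# Spatial sizes of those stages for a 224x224 input image.
STAGE_SIZES = (56, 28, 14, 7)


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Collapse a (C, H, W) feature map into a C-dimensional vector."""
    return feature_map.mean(axis=(1, 2))


def l2_normalize(vector: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit length so that no single level dominates."""
    return vector / max(float(np.linalg.norm(vector)), eps)


def multi_level_embedding(feature_maps) -> np.ndarray:
    """Pool and normalize each level, then concatenate (no averaging or summing)."""
    parts = [l2_normalize(global_average_pool(fm)) for fm in feature_maps]
    return np.concatenate(parts)


def pair_features(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Hypothetical input for a pair classifier: element-wise absolute difference."""
    return np.abs(emb_a - emb_b)


def fake_feature_maps(rng: np.random.Generator):
    """Random stand-ins for the activations a real ResNet50 would produce."""
    return [rng.random((c, s, s)) for c, s in zip(STAGE_CHANNELS, STAGE_SIZES)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb_a = multi_level_embedding(fake_feature_maps(rng))
    emb_b = multi_level_embedding(fake_feature_maps(rng))
    print(emb_a.shape)                        # one descriptor per image
    print(pair_features(emb_a, emb_b).shape)  # one feature vector per image pair
```

Concatenation keeps every level's channels as separate coordinates, so under these assumptions the descriptor has 256 + 512 + 1024 + 2048 = 3840 dimensions. A small classifier trained on `pair_features` would then play the role of the lightweight decision network.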

Author Biography

Taras Panchenko, Taras Shevchenko National University of Kyiv

Taras Panchenko, PhD (Candidate of Physical and Mathematical Sciences), is Head of the Department of Theory and Technology of Programming at the Faculty of Computer Sciences and Cybernetics at Taras Shevchenko National University of Kyiv. He is an ACM Europe committees leader and member, Ukrainian ACM Chapter Chair and leader, Hackathon Expert leader, innovator, mentor, facilitator, and hackathon organiser. His scientific interests include distributed and high-load systems, low-level programming, system programming, data analysis, artificial intelligence, and cybersecurity.

Authors

  • Taras Panchenko, Taras Shevchenko National University of Kyiv
  • Artem Bozhok, LUN.ua
  • Volodymyr Kubytskyi, MacPaw

DOI:

https://doi.org/10.31449/inf.v50i9.12111

Published

03/12/2026

How to Cite

Panchenko, T., Bozhok, A., & Kubytskyi, V. (2026). Multi-Level CNN Feature Fusion from ResNet50 for Near-Duplicate Image Detection in Real Estate Imagery. Informatica, 50(9). https://doi.org/10.31449/inf.v50i9.12111