Multi-Level CNN Feature Fusion from ResNet50 for Near-Duplicate Image Detection in Real Estate Imagery
Abstract
The volume of images uploaded to the internet is increasing at an unprecedented rate, making image deduplication, through accurate near-duplicate detection, a critical task in computer vision. However, comparing images for similarity remains challenging due to complex visual structures and subtle appearance variations.

We propose a novel embedding method for image similarity detection. It constructs an enriched representation by concatenating outputs from multiple intermediate layers of a pre-trained ResNet50 convolutional neural network and trains a lightweight decision network on top to classify image pairs. Unlike aggregation approaches that average or sum intermediate features, our method preserves both low-level and high-level information in a single descriptor and maintains feature diversity. The multi-level embedding is further normalized to balance feature contributions and is evaluated against classical keypoint descriptors, a DCT-based perceptual hash, and a standard single-layer ResNet50 embedding.

We evaluate this method on three real-world image deduplication tasks derived from real estate listings, covering (a) near-identical property photos with graphical overlays, (b) interior room photographs captured from different angles, and (c) schematic floor plan images. The proposed embedding achieves F1-scores of 0.96, 0.87, and 0.77, representing a 10-15% absolute improvement over baseline methods, including classical feature descriptors and standard ResNet50 final-layer embeddings.

This approach has been successfully deployed in production on a large-scale real estate platform, reducing duplicate images and improving search quality. The results demonstrate that multi-layer CNN embeddings with explicit feature preservation offer a robust and scalable solution for near-duplicate image detection in structured domains such as real estate photography and schematic floor plans.
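The construction described in the abstract (pool several intermediate feature maps, normalize them, and concatenate them into one descriptor) can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the abstract does not say which ResNet50 layers are used, how they are pooled, or how normalization is applied. Here each feature map is reduced by global average pooling and L2-normalized per layer, and the hypothetical helper `pair_features` builds an element-wise absolute difference that a small pair classifier could consume. The toy nested lists stand in for real ResNet50 activations.

```python
import math

def global_average_pool(feature_map):
    """Reduce a C x H x W feature map (nested lists) to a length-C vector."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (assumed normalization scheme)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def multi_level_embedding(feature_maps):
    """Pool and normalize each layer's map, then concatenate the results.

    Concatenation keeps every layer's features side by side instead of
    averaging or summing them into a single shared vector.
    """
    embedding = []
    for fmap in feature_maps:
        embedding.extend(l2_normalize(global_average_pool(fmap)))
    return embedding

def pair_features(emb_a, emb_b):
    """Hypothetical input for a pair classifier: element-wise |a - b|."""
    return [abs(a - b) for a, b in zip(emb_a, emb_b)]

# Toy stand-ins for two intermediate layers of one image.
shallow = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 x 2 x 2
deep = [[[3.0]], [[4.0]], [[0.0]]]                              # 3 x 1 x 1

embedding = multi_level_embedding([shallow, deep])
difference = pair_features(embedding, embedding)
```

Because each layer is normalized before concatenation, a layer with large activations cannot dominate the descriptor, which is one plausible reading of "normalized to balance feature contributions". In a real pipeline the feature maps would come from a pre-trained ResNet50, and the pair vector would be passed to the trained decision network.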
DOI: https://doi.org/10.31449/inf.v50i9.12111
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







