MGC-SIFT: A Multimodal Graph-Based Color SIFT Descriptor for Content-Based Image Retrieval

Abstract

Content-Based Image Retrieval (CBIR) systems critically depend on discriminative yet efficient feature representations to retrieve relevant images from large-scale databases. However, many existing handcrafted and graph-based methods face limitations in scalability and in jointly modeling multimodal information such as color, texture, and spatial relationships. To address these challenges, this paper proposes a novel feature extraction framework termed Multi-modal Graph Color SIFT (MGC-SIFT). In the proposed approach, color-augmented SIFT descriptors extracted in the YCbCr color space are organized as a graph of local keypoints, over which Graph Neural Networks (GNNs) are applied to model inter-keypoint spatial relationships. An attention mechanism is incorporated to emphasize discriminative keypoint regions, while proxy-based learning is employed to improve representation compactness and retrieval efficiency. The effectiveness of MGC-SIFT is evaluated on four benchmark datasets—Corel-1K, COIL-20, Oxford-102 Flowers, and UC-Merced Land Use—covering natural scenes, controlled object images, fine-grained categories, and aerial imagery. Experimental evaluation using standard CBIR metrics, including mean Average Precision (mAP), Precision@k, Recall@k, F1-score@k, and Accuracy@k, demonstrates that the proposed method achieves consistent and competitive retrieval performance across heterogeneous datasets, including robustness under image degradation conditions. Ablation studies further confirm the complementary contributions of color augmentation, graph-based modeling, attention mechanisms, and proxy-based learning.
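The ranking metrics named above (Precision@k and mean Average Precision) can be sketched in a few lines of NumPy; the toy relevance labels below are illustrative only and are not drawn from the paper's experiments:

```python
import numpy as np

def precision_at_k(relevant, k):
    """Fraction of the top-k retrieved items that are relevant.
    `relevant` is a boolean array ordered by retrieval rank."""
    return float(np.mean(relevant[:k]))

def average_precision(relevant):
    """Mean of Precision@k taken at every rank k where a relevant item appears."""
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    ranks = np.flatnonzero(relevant) + 1   # 1-based positions of the hits
    hits = np.arange(1, len(ranks) + 1)    # cumulative number of hits at each position
    return float(np.mean(hits / ranks))

# Toy ranked list for one query: True = relevant to the query
ranked = np.array([True, False, True, True, False])
print(precision_at_k(ranked, 3))   # 2/3
print(average_precision(ranked))   # mean(1/1, 2/3, 3/4) ≈ 0.806
```

mAP is then the mean of `average_precision` over all queries; Recall@k and F1-score@k follow the same pattern once the total number of relevant items per query is known.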
In addition, runtime and memory analyses indicate that proxy-based learning significantly reduces retrieval latency. Overall, the proposed MGC-SIFT framework provides a robust and interpretable multimodal representation for CBIR by explicitly modeling joint color–spatial dependencies at the local keypoint level, offering a practical solution for scalable image retrieval in real-world applications.
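The core idea of a keypoint graph with attention-weighted aggregation can be illustrated with a minimal NumPy sketch. The array shapes, the k-nearest-neighbour graph construction, and the dot-product attention below are assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the inputs: 2-D keypoint locations and
# color-augmented SIFT descriptors (shapes chosen for illustration only).
n, d, k = 30, 128, 4
xy = rng.random((n, 2))       # keypoint coordinates
desc = rng.random((n, d))     # one descriptor per keypoint

# Build a k-nearest-neighbour graph over keypoint locations.
d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)              # no self-edges
nbrs = np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest keypoints

adj = np.zeros((n, n))
adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0

# One attention-weighted aggregation step: neighbours whose descriptors are
# more similar (dot product) contribute more, via a masked softmax.
scores = desc @ desc.T
scores[adj == 0] = -np.inf                # restrict attention to graph edges
att = np.exp(scores - scores.max(axis=1, keepdims=True))
att /= att.sum(axis=1, keepdims=True)
pooled = att @ desc                       # attention-pooled node features

image_vec = pooled.mean(axis=0)           # compact image-level embedding
print(image_vec.shape)                    # (128,)
```

In the proxy-based setting, retrieval then compares `image_vec` against a small set of learned class proxies rather than against every database image, which is what drives the latency reduction reported in the abstract.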


Authors

  • Trupti Babasaheb Ghatage, Department of Technology, Shivaji University, Kolhapur, Maharashtra, India
  • Dattatraya Vishnu Kodavade, DKTE Society’s Textile and Engineering Institute, Ichalkaranji, Maharashtra, India

DOI:

https://doi.org/10.31449/inf.v50i1.10558

Published

04/13/2026

How to Cite

Ghatage, T. B., & Kodavade, D. V. (2026). MGC-SIFT: A Multimodal Graph-Based Color SIFT Descriptor for Content-Based Image Retrieval. Informatica, 50(1). https://doi.org/10.31449/inf.v50i1.10558

Section

Regular papers