Vision Transformer-Based Framework for AI-Generated Image Detection in Interior Design
Abstract
Images generated by artificial intelligence (AI) are increasingly used in interior design, raising concerns about authenticity and ethical use. Motivated by the limitations of Convolutional Neural Networks (CNNs) in capturing long-range dependencies and global patterns, this study examines how Vision Transformers (ViTs) can be used to detect AI-generated interior design images. We fine-tuned and evaluated four ViT models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) on a dataset of 1,000 samples per class. Performance was assessed using accuracy, precision, recall, F1-score, and computational efficiency. Results show that models with smaller patch sizes (16×16) outperform those with larger ones (32×32): ViT-B16 and ViT-L16 achieved the highest accuracy (96.25%) and F1-score (0.9625), successfully identifying subtle inconsistencies in AI-generated images. ViT-B32 and ViT-L32 offered better computational efficiency at the cost of lower classification performance (80.00% and 81.25% accuracy, respectively), making them more appropriate for real-time applications that prioritize speed over accuracy. ViT-B16 provided the best trade-off between accuracy and resource efficiency; ViT-L16, although equally accurate, incurred higher computational costs. Through this work, we contribute a domain-specific deep learning framework for detecting AI-generated images in interior design, strengthening authenticity verification. Future work will address improving computational efficiency and generalizing the model across a broader range of generative models and design styles.
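The patch-size trade-off and the evaluation metrics mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the 224×224 input resolution is the standard ViT default, and the confusion-matrix counts are hypothetical, not the study's actual test-set figures.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT processes for a square input image."""
    return (image_size // patch_size) ** 2

# Smaller patches yield more tokens, so self-attention sees finer detail
# but costs more compute -- matching the reported accuracy/efficiency
# trade-off between the 16x16 and 32x32 variants.
tokens_16 = num_patch_tokens(224, 16)  # ViT-B16 / ViT-L16
tokens_32 = num_patch_tokens(224, 32)  # ViT-B32 / ViT-L32

def binary_metrics(tp: int, fp: int, fn: int, tn: int):
    """Accuracy, precision, recall, and F1 from a two-class confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a real-vs-AI-generated test split (100 images):
# 48 AI images caught, 2 real images misflagged, 3 AI images missed.
acc, prec, rec, f1 = binary_metrics(tp=48, fp=2, fn=3, tn=47)
```

With a 16×16 patch grid the model attends over four times as many tokens as with 32×32 patches, which is one way to see why the finer-grained variants detect subtle generation artifacts better while the coarser ones run faster.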
DOI: https://doi.org/10.31449/inf.v49i16.7979