Stable Diffusion Image Generation System Optimized with Variational Autoencoders and Low-Rank Adaptation

Abstract

Text-to-image generation has evolved quickly with diffusion-based generative models that combine semantic conditioning with latent-space denoising, allowing machines to generate high-quality visuals from natural language prompts. Despite these developments, existing diffusion systems still face challenges in prompt interpretation accuracy, model adaptability, and computational efficiency, which limit their performance in real-time and resource-limited settings. This research aims to design and optimize an image generation framework based on Stable Diffusion (SD) that improves prompt processing, enhances image quality, and enables lightweight fine-tuning. The system uses the LAION-Aesthetics v2 4.5 dataset, which contains high-quality text–image pairs suitable for visual generation tasks. Preprocessing involves text cleaning, tokenization, and semantic structuring, with a transformer-based tokenizer used to ensure accurate language-to-visual mapping. The architecture integrates Stable Diffusion, a Variational Autoencoder (VAE) for latent-space decoding, and Low-Rank Adaptation (LoRA) for efficient fine-tuning at minimal computational cost. Results show that SD-VAE-LoRA achieved a PSNR of 33.7 dB, SSIM of 93 %, FID of 17.8, Inception Score of 36.02, and R-Precision of 90 %, outperforming baseline SD and other advanced generative models such as the Latent Diffusion Model (LDM) [24], the Menstrual Cycle-Inspired Latent Diffusion Model (MCI-LDM) [24], and a Conditional Generative Adversarial Network with attention mechanisms and contrastive learning (C-GAN+ATT+CL). The optimized system improves semantic alignment, decreases training time, and preserves image realism, supporting its suitability for scalable, adaptive, and high-fidelity image generation applications.
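To make the "efficient fine-tuning" claim concrete, the following is a minimal sketch of the general LoRA idea: a frozen weight matrix W is augmented with a trainable low-rank product B·A, so only r·(d_out + d_in) parameters are updated instead of d_out·d_in. The abstract does not state the rank, scaling factor, or which layers are adapted, so the names `LoRALinear`, `lora_param_counts`, `rank`, and `alpha`, and all numeric values, are illustrative assumptions rather than the author's configuration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA trainable params) for one weight matrix."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


class LoRALinear:
    """Hypothetical linear layer: y = W x + (alpha / rank) * B (A x), with W frozen."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        col = [[v] for v in x]                       # column vector, d_in x 1
        base = matmul(self.weight, col)              # frozen path
        delta = matmul(self.B, matmul(self.A, col))  # low-rank update path
        return [b[0] + self.scale * d[0] for b, d in zip(base, delta)]


if __name__ == "__main__":
    # Assumed example sizes: a 768 x 768 projection with rank 4.
    full, lora = lora_param_counts(768, 768, 4)
    print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4f}")
```

Under these assumed sizes, the low-rank path trains roughly 1 % of the parameters of the full matrix, which is the mechanism behind the reduced fine-tuning cost; the savings in the reported system depend on settings the abstract does not specify.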

Authors

  • ChunLing Zhang, Weifang University of Science and Technology, School of Computer Science, Shouguang, 262700, China

DOI:

https://doi.org/10.31449/inf.v50i7.10368

Published

02/21/2026

How to Cite

Zhang, C. (2026). Stable Diffusion Image Generation System Optimized with Variational Autoencoders and Low-Rank Adaptation. Informatica, 50(7). https://doi.org/10.31449/inf.v50i7.10368