Spatiotemporal GAN-Based Real-Time Animation and Multi-Modal Interaction Optimization for Virtual Reality

Abstract

Animation generation and multi-modal interaction in virtual reality (VR) environments still suffer from low generation quality, poor real-time performance, and insufficient fusion between modalities, which seriously limit the realism and interaction efficiency of immersive experiences. This paper proposes a method for real-time animation synthesis and multi-modal interaction optimization in VR environments based on a generative adversarial network (GAN). For animation synthesis, a generation architecture with a spatiotemporal-consistency adversarial training mechanism is constructed and combined with a multi-scale feature fusion strategy to achieve high-quality, low-latency animation generation. Experiments were conducted on the VR-Gesture-Voice dataset (60,000 training samples, 15,000 testing samples) and benchmarked against state-of-the-art (SOTA) models including VideoGAN and StyleGAN3. Two sets of key metrics are reported to distinguish the system scopes. For the isolated GAN animation synthesis module (excluding end-to-end transmission and rendering), the improved algorithm achieved an average rendering frame rate of 23.7 FPS (58% higher than VideoGAN) and kept the synthesis delay within 4.5 ms. For the end-to-end VR system, the average rendering frame rate was 85–92 FPS (meeting the ≥72 FPS standard for VR) and the end-to-end latency was ≤17 ms (51% lower than StyleGAN3). In terms of system resource usage, GPU (graphics processing unit) memory consumption is reduced by 0.6 GB, model inference time is reduced by 49.5%, and 85% real-time rendering efficiency is maintained at 8K resolution.
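
The frame-rate and latency figures in the abstract can be related to per-frame time budgets with simple arithmetic. The sketch below is illustrative only and is not part of the paper's method: the function names are hypothetical, and the back-calculated baselines assume that "58% higher" and "51% lower" are measured relative to the baseline value, which the abstract does not state explicitly.

```python
def frame_interval_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def baseline_from_increase(value: float, pct_higher: float) -> float:
    """Baseline implied by a value reported as pct_higher percent above it.

    Assumption: the percentage is relative to the baseline.
    """
    return value / (1.0 + pct_higher / 100.0)


def baseline_from_decrease(value: float, pct_lower: float) -> float:
    """Baseline implied by a value reported as pct_lower percent below it.

    Assumption: the percentage is relative to the baseline.
    """
    return value / (1.0 - pct_lower / 100.0)


if __name__ == "__main__":
    # Frame budgets for the VR threshold and the reported end-to-end range.
    for fps in (72, 85, 92):
        print(f"{fps} FPS -> {frame_interval_ms(fps):.2f} ms per frame")

    # Implied baselines (back-calculated, not values reported in the paper).
    print(f"implied baseline frame rate: {baseline_from_increase(23.7, 58):.1f} FPS")
    print(f"implied baseline latency: {baseline_from_decrease(17.0, 51):.1f} ms")
```

Under these assumptions, the 72 FPS threshold corresponds to a budget of roughly 13.9 ms per frame, and the reported 85–92 FPS range to roughly 11.8–10.9 ms per frame.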

Authors

  • Zheng Wang, Shandong Media Vocational College

DOI:

https://doi.org/10.31449/inf.v49i35.11457

Published

12/16/2025

How to Cite

Wang, Z. (2025). Spatiotemporal GAN-Based Real-Time Animation and Multi-Modal Interaction Optimization for Virtual Reality. Informatica, 49(35). https://doi.org/10.31449/inf.v49i35.11457