API Interception-Based GPU Virtualization for Containerized HPC Workloads in Cloud Environments

Abstract

With the rise of cloud-native technologies such as containers and Kubernetes, GPU-based high-performance computing (HPC) tasks have gradually migrated to container cloud environments, which introduces new challenges in fine-grained GPU resource management. The native Kubernetes framework cannot allocate fractional GPU resources to containers; it permits only exclusive access to entire physical GPUs, which results in low cluster-wide GPU utilization. Efficient GPU sharing for HPC tasks in containerized environments therefore requires GPU virtualization, that is, allocating precise amounts of GPU compute and memory resources to different containers with isolation guarantees. However, current GPU virtualization technologies remain in their infancy and have two limitations: (1) NVIDIA keeps the GPU driver and the layers below it closed source, forcing existing solutions to rely on reverse engineering for virtualization; (2) current implementations do not exploit GPU idle cycles during HPC task execution, which wastes compute resources. To address these issues, this paper proposes a GPU virtualization system for container cloud environments. The key contributions are: (a) profiling the workflow and invocation mechanisms of CUDA-based HPC tasks (e.g., deep learning training) to characterize their GPU memory and compute usage patterns; (b) developing formal models of HPC resource utilization; and (c) implementing a resource isolation and quota mechanism via API interception and forwarding. Experimental results show that the system achieves higher virtualization efficiency with lower overhead. Under heavy loads, the proposed adaptive elastic GPU allocation method yields 37% higher average GPU utilization and 26% greater cluster throughput than static allocation in KubeShare.
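The abstract names a quota mechanism built on API interception and forwarding but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' system: a wrapper sits in front of an allocator, checks each request against a per-container memory quota, and forwards only the calls that fit. All names (`InterceptedAllocator`, `QuotaExceeded`, `make_fake_backend`) are hypothetical, and the backend is an in-memory stand-in rather than a real CUDA driver.

```python
import itertools


class QuotaExceeded(MemoryError):
    """Raised when an intercepted allocation would exceed the quota."""


class InterceptedAllocator:
    """Enforces a memory quota in front of a backend allocator.

    `backend_alloc` and `backend_free` are hypothetical stand-ins for the
    real driver calls; requests are forwarded only if they fit the quota.
    """

    def __init__(self, quota_bytes, backend_alloc, backend_free):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self._alloc = backend_alloc
        self._free = backend_free
        self._sizes = {}  # handle -> allocated size in bytes

    def malloc(self, size):
        if size < 0:
            raise ValueError("size must be non-negative")
        # Reject before forwarding, so the backend never sees the request.
        if self.used_bytes + size > self.quota_bytes:
            raise QuotaExceeded(
                f"request of {size} B exceeds quota "
                f"({self.used_bytes}/{self.quota_bytes} B in use)"
            )
        handle = self._alloc(size)  # forward to the backend
        self._sizes[handle] = size
        self.used_bytes += size
        return handle

    def free(self, handle):
        size = self._sizes.pop(handle)  # KeyError on unknown handles
        self._free(handle)              # forward to the backend
        self.used_bytes -= size


def make_fake_backend():
    """In-memory stand-in for a GPU driver, for demonstration only."""
    live = {}
    counter = itertools.count(1)

    def alloc(size):
        handle = next(counter)
        live[handle] = size
        return handle

    def free(handle):
        del live[handle]

    return alloc, free, live


if __name__ == "__main__":
    alloc, free, live = make_fake_backend()
    gpu = InterceptedAllocator(1024, alloc, free)
    h = gpu.malloc(600)
    try:
        gpu.malloc(500)  # 600 + 500 > 1024, so this is rejected
    except QuotaExceeded as exc:
        print("rejected:", exc)
    gpu.free(h)
```

In a real deployment the interception layer would typically be a shared library loaded ahead of the CUDA runtime; that detail is an assumption here and is not stated in the abstract.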

Authors

  • Ming Lu, Henan Open University, School of Information Engineering and Artificial Intelligence
  • Heng Wu, Institute of Software Research, Chinese Academy of Sciences
  • Rongzhou Luo, Institute of Software Research, Chinese Academy of Sciences

DOI:

https://doi.org/10.31449/inf.v50i12.12099

Published

05/13/2026

How to Cite

Lu, M., Wu, H., & Luo, R. (2026). API Interception-Based GPU Virtualization for Containerized HPC Workloads in Cloud Environments. Informatica, 50(12). https://doi.org/10.31449/inf.v50i12.12099