State of GPU Virtualization for CUDA Applications 2014
Widespread corporate adoption of virtualization technologies has led some users to rely on Virtual Machines (VMs). When these users or their IT administrators want to start using CUDA, the first thought is often to spin up a new VM. Success is not guaranteed, because not all virtualization technologies support CUDA. This article surveys GPU virtualization technologies for running CUDA applications. To support CUDA, a virtualization technology must present a supported CUDA device to the VM’s operating system, and the NVIDIA graphics driver must be installable in that operating system.
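One quick way to check the first requirement from inside a Linux guest is to look for an NVIDIA PCI device in the output of `lspci -nn`. The sketch below only parses text that has already been captured. The helper name `find_nvidia_devices` and the sample output are illustrative assumptions, not taken from any vendor documentation; the only fixed fact it relies on is that `10de` is NVIDIA's PCI vendor ID.

```python
import re

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to NVIDIA

# Matches the "[vendor:device]" tag that `lspci -nn` appends to each line.
_NVIDIA_TAG = re.compile(r"\[" + NVIDIA_VENDOR_ID + r":[0-9a-f]{4}\]", re.IGNORECASE)


def find_nvidia_devices(lspci_output):
    """Return the lines of `lspci -nn` output that describe NVIDIA devices.

    Hypothetical helper: it only tells you whether the hypervisor exposed an
    NVIDIA PCI device to the guest, not whether the driver installed correctly.
    """
    return [line.strip() for line in lspci_output.splitlines()
            if _NVIDIA_TAG.search(line)]


# Made-up sample of what a guest with a passed-through GPU might report.
SAMPLE_LSPCI = """\
00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8]
00:05.0 3D controller [0302]: NVIDIA Corporation GK104GL [GRID K2] [10de:11bf] (rev a1)
"""

if __name__ == "__main__":
    for device in find_nvidia_devices(SAMPLE_LSPCI):
        print(device)
```

If the list comes back empty, the VM has no NVIDIA device to install a driver against, and CUDA cannot work regardless of what is installed in the guest.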
GPU Virtualization Terms
- Device Pass-Through: This is the simplest virtualization model where the entire GPU is presented to the VM as if directly connected. The virtual GPU is usable by only one VM. The CPU equivalent is assigning a single core for exclusive use by a VM. VMware calls this mode virtual Direct Graphics Accelerator (vDGA).
- Partitioning: A GPU is split into several virtual GPUs, each of which is used independently by a single VM.
- Timesharing: Timesharing involves sharing a GPU, or a portion of one, among multiple VMs. It is also known as oversubscription or multiplexing. The technology for timesharing CPUs is mature, while GPU timesharing is only now being introduced.
- Live Migration: The ability to move a running VM from one VM host to another without downtime.
Virtualization Support for CUDA
We examined CUDA support from five virtualization vendors that together account for most of the virtualization market: VMware, Microsoft, Oracle, Citrix and Red Hat. A summary is shown in the table below.
| Model | VMware vSphere | Microsoft Hyper-V | Oracle VirtualBox | Citrix XenServer | Red Hat Enterprise Linux |
|---|---|---|---|---|---|
| Pass-Through | ✔ | ✕ | ✔ (Linux host server only) | ✔ | ✔ |
Table 1: GPU virtualization support for CUDA, summer 2014
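For readers who want to script against this summary, the table can be written down as a small lookup. This is only a sketch of the data in Table 1 as of summer 2014; the names `PASS_THROUGH_SUPPORT` and `supports_pass_through` are made up for illustration, and the host operating system argument exists only to capture the Linux-host restriction noted for Oracle.

```python
# Pass-Through support as reported in Table 1 (summer 2014).
# True / False = supported / not supported on any host;
# a set = supported only on the listed host operating systems.
PASS_THROUGH_SUPPORT = {
    "VMware vSphere": True,
    "Microsoft Hyper-V": False,
    "Oracle VirtualBox": {"linux"},
    "Citrix XenServer": True,
    "Red Hat Enterprise Linux": True,
}


def supports_pass_through(product, host_os="linux"):
    """Return True if Table 1 lists Pass-Through support for this product.

    `host_os` only matters for entries restricted to specific host systems.
    Raises KeyError for products that are not in the table.
    """
    entry = PASS_THROUGH_SUPPORT[product]
    if isinstance(entry, bool):
        return entry
    return host_os.lower() in entry


if __name__ == "__main__":
    for name in PASS_THROUGH_SUPPORT:
        print(name, supports_pass_through(name, host_os="windows"))
```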
- VMware supports the Pass-Through model for GPU virtualization. GPU partitioning for the NVIDIA Grid will be available as a technology preview in late 2014 and supported in 2015.
- Microsoft Hyper-V based virtualization software does not support Pass-Through or any other GPU virtualization option that works with CUDA. As a result, Microsoft is the only major vendor without CUDA support in VMs.
- Oracle VirtualBox supports the Pass-Through model for GPU virtualization. Note: Oracle VirtualBox can run on either a Linux-based or a Windows-based host server, but only VMs running on a Linux host server can use the Pass-Through model.
- Citrix XenServer supports partitioning a GPU in addition to Pass-Through. Currently, the technology is limited to the NVIDIA Grid K1 and K2 cards. Another limitation is that the NVIDIA Grid card has to be divided into identical virtual GPUs: one cannot choose ½ of the Grid GPU for one VM and ¼ of the Grid GPU for each of two more VMs. The Grid K1 and K2 can be divided into at most 32 virtual GPUs.
- Red Hat Enterprise Linux (RHEL) 7 officially supports the Pass-Through model for virtualization. Red Hat Enterprise Virtualization (RHEV) does not support Pass-Through yet. As both product lines use the same underlying technology, it is a matter of time before RHEV has the feature added.
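The partitioning limits described for Citrix XenServer can be expressed as a short check on a proposed plan. The sketch below is only an illustration of those two rules (identical sizes, at most 32 virtual GPUs per card). It models each virtual GPU as a fraction of one card, which is a simplifying assumption, and `check_partition_plan` is a hypothetical helper, not part of any Citrix or NVIDIA tool.

```python
from fractions import Fraction

MAX_VGPUS_PER_CARD = 32  # upper limit quoted above for the Grid K1 and K2


def check_partition_plan(shares):
    """Return a list of problems with a proposed split of one Grid card.

    `shares` holds one Fraction per virtual GPU, giving its share of the card.
    An empty result means the plan satisfies the rules described in the text.
    """
    problems = []
    if not shares:
        problems.append("plan contains no virtual GPUs")
    if len(shares) > MAX_VGPUS_PER_CARD:
        problems.append("more than %d virtual GPUs requested" % MAX_VGPUS_PER_CARD)
    if len(set(shares)) > 1:
        problems.append("virtual GPUs must all be the same size")
    if sum(shares) > 1:
        problems.append("shares add up to more than one card")
    return problems


if __name__ == "__main__":
    mixed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
    uniform = [Fraction(1, 4)] * 4
    print(check_partition_plan(mixed))    # reports the mixed sizes
    print(check_partition_plan(uniform))  # no problems reported
```

The mixed plan is the ½ plus two ¼ split mentioned above; it fits on the card but is rejected because the sizes differ.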
GPU Pass-Through is the common way to support CUDA on VMs. It is a hardware-intensive option, requiring at least one physical GPU per VM. Today, no virtualization technology supports GPU timesharing or oversubscription for CUDA applications. Live migration for GPU VMs is also unavailable.
VDIs vs CUDA applications
Virtual Desktop Infrastructure (VDI) technologies dominate news releases for GPU virtualization, and this is reflected in the features that vendors support. VDI is used to replace desktops and laptops with thin clients. Typical thin client users do not run CUDA applications. There are several VDI solutions built around NVIDIA’s Grid K1 and K2 GPUs. Microsoft’s RemoteFX supports DirectX 9 through 11 and OpenGL 1.1. VMware’s Horizon View supports DirectX 9 through 11 and OpenGL 2.1. Only Citrix XenDesktop allows full CUDA support in a VDI environment. When reviewing literature about GPU virtualization, VDI features should not be confused with those that enable CUDA.
GPU Usage Model
Most CUDA applications use the hardware at full capacity for extended periods of time; in other words, they don’t like to share. Many virtualization benefits come from running multiple VMs on otherwise underutilized hardware by sharing it. This is one reason GPU virtualization is often not ideal from a purely performance standpoint.
Clusters/Grid vs GPU Virtualization
An alternative to GPU virtualization is the use of cluster resource managers and grid computing software. Clusters and grid computing provide many of the benefits of GPU virtualization with a different approach. They allow a server with several GPUs to run separate, isolated jobs on each GPU. Jobs from multiple users share a GPU one after another. Jobs can move from one node to another in the event of hardware failure (with checkpointing software). In this way the benefits of partitioning, timesharing and live migration are realized within a large cluster or grid environment. However, virtualization would be the simpler route for those with only a few GPU servers.
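A minimal sketch of the idea, assuming a single multi-GPU server: jobs are queued per GPU so they run one after another, and each job is launched with the `CUDA_VISIBLE_DEVICES` environment variable set so that the CUDA runtime exposes only its assigned GPU. The function names and the round-robin policy are illustrative assumptions; real resource managers handle priorities, accounting and failures, none of which is shown here.

```python
import os


def plan_jobs(jobs, gpu_count):
    """Assign jobs to GPUs round-robin; each GPU gets an ordered queue.

    Jobs in the same queue are meant to run one after another on that GPU.
    """
    if gpu_count < 1:
        raise ValueError("at least one GPU is required")
    queues = {gpu: [] for gpu in range(gpu_count)}
    for position, job in enumerate(jobs):
        queues[position % gpu_count].append(job)
    return queues


def job_environment(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


if __name__ == "__main__":
    # Hypothetical job names; in practice these would be commands to launch.
    for gpu, queue in plan_jobs(["alice-md", "bob-cfd", "carol-train"], 2).items():
        print(gpu, queue)
```

Each queued command could then be started with something like `subprocess.run(command, env=job_environment(gpu))`, so that a job sees its assigned GPU as device 0 and cannot touch the others.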
Pass-Through is the most common mode for GPU virtualization today. It is supported by VMware, Oracle, Red Hat and Citrix. Citrix is by far the leader in GPU virtualization technology, with the ability to take advantage of the NVIDIA Grid platform. For now, Microsoft is lagging, with no support for CUDA in Hyper-V. GPU virtualization is advancing rapidly, with major improvements expected. In 2015, VMware will support GPU partitioning. Red Hat’s roadmap includes GPU oversubscription, but with no timeline. It looks like next year’s survey will have a lot more to cover!