Google Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machine instances. You can use these GPUs to accelerate specific workloads on your instances such as machine learning and data processing.
For more information about what you can do with GPUs and what types of GPU hardware are available, read GPUs on Compute Engine.
Before you begin
- If you want to use the command-line examples in this guide:
- Install or update to the latest version of the gcloud command-line tool.
- Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
- Read about GPU pricing on Compute Engine to understand the cost to use GPUs on your instances.
- Read about restrictions for instances with GPUs to learn how these instances function differently than most instances.
- Learn how instances function when you schedule them to terminate for host maintenance events. If you add GPUs to your instances, you must also set the instance to terminate during host maintenance.
- Check the quotas page to ensure that you have enough GPUs available for your project. If you need additional GPU quota, request a quota increase.
Creating an instance with a GPU
Before you create an instance with a GPU, select which boot disk image you want to use for the instance, and ensure that the appropriate GPU driver is installed.
If you are using GPUs for machine learning, you can use a Deep Learning VM image for your instance. The Deep Learning VM images have GPU drivers pre-installed, and include packages such as TensorFlow and PyTorch. You can also use the Deep Learning VM images for general GPU workloads. For information on the images available, and the packages installed on the images, see the Deep Learning VM documentation.
You can also use any public image or custom image, but some images might require a unique driver or install process that is not covered in this guide. You must identify what drivers are appropriate for your images.
For steps to install drivers, see installing GPU drivers.
When you create an instance with one or more GPUs, you must set the instance to terminate on host maintenance. Instances with GPUs cannot live migrate because they are assigned to specific hardware devices. See GPU restrictions for details.
Create an instance with one or more GPUs using the Google Cloud Platform Console,
the gcloud command-line tool, or the
API.
Console
- Go to the VM instances page.
- Click Create instance.
- Select a zone where GPUs are available. See the list of available zones with GPUs.
- In the Machine type section, select the machine type that you want to use for this instance. Alternatively, you can specify custom machine type settings later.
- In the Machine type section, click Customize to see advanced machine type options and available GPUs.
- Click GPUs to see the list of available GPUs.
- Specify the GPU type and the number of GPUs that you need.
- If necessary, adjust the machine type to accommodate your desired GPU settings. If you leave these settings as they are, the instance uses the predefined machine type that you specified before opening the machine type customization screen.
- To configure your boot disk, in the Boot disk section, click Change.
- In the OS images tab, choose an image.
- Click Select to confirm your boot disk options.
- Optionally, you can include a startup script to install the GPU driver while the instance starts up. In the Automation section, include the contents of your startup script under Startup script. See installing GPU drivers for example scripts.
- Configure any other instance settings that you require. For example, you can change the Preemptibility settings to configure your instance as a preemptible instance. This reduces the cost of your instance and the attached GPUs. Read GPUs on preemptible instances to learn more.
- At the bottom of the page, click Create to create the instance.
gcloud
Use the regions describe command to ensure that you have sufficient
GPU quota in the region where you want to create instances with GPUs.
gcloud compute regions describe [REGION]
where [REGION] is the
region where you want to
check for GPU quota.
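For example, you can filter the describe output down to the GPU quotas. This is a minimal sketch: us-east1 is an illustrative region, and quota metric names such as NVIDIA_K80_GPUS are assumptions to check against your own output.

```shell
# List region quotas and keep only the GPU-related entries.
# The default output is YAML, with one limit/metric/usage block per quota.
REGION="us-east1"
gcloud compute regions describe "$REGION" | grep -B 1 -A 1 "NVIDIA"
```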
Start an instance with the latest image from an image family:
gcloud compute instances create [INSTANCE_NAME] \
--machine-type [MACHINE_TYPE] --zone [ZONE] \
--accelerator type=[ACCELERATOR_TYPE],count=[ACCELERATOR_COUNT] \
--image-family [IMAGE_FAMILY] --image-project [IMAGE_PROJECT] \
--maintenance-policy TERMINATE --restart-on-failure \
--metadata startup-script='[STARTUP_SCRIPT]' \
[--preemptible]
where:
- [INSTANCE_NAME] is the name for the new instance.
- [MACHINE_TYPE] is the machine type that you selected for the instance. See GPUs on Compute Engine to see what machine types are available based on your desired GPU count.
- [ZONE] is the zone for this instance.
- [IMAGE_FAMILY] is one of the available image families.
- [ACCELERATOR_COUNT] is the number of GPUs that you want to add to your instance. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your instance.
- [ACCELERATOR_TYPE] is the GPU model that you want to use. Use one of the following values:
  - NVIDIA® Tesla® T4: nvidia-tesla-t4
  - NVIDIA® Tesla® T4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-t4-vws
  - NVIDIA® Tesla® P4: nvidia-tesla-p4
  - NVIDIA® Tesla® P4 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p4-vws
  - NVIDIA® Tesla® P100: nvidia-tesla-p100
  - NVIDIA® Tesla® P100 Virtual Workstation with NVIDIA® GRID®: nvidia-tesla-p100-vws
  - NVIDIA® Tesla® V100: nvidia-tesla-v100
  - NVIDIA® Tesla® K80: nvidia-tesla-k80
  See GPUs on Compute Engine for a list of available GPU models.
- [IMAGE_PROJECT] is the image project that the image family belongs to.
- [STARTUP_SCRIPT] is an optional startup script that you can use to install the GPU driver while the instance is starting up. See installing GPU drivers for examples.
- --preemptible is an optional flag that configures your instance as a preemptible instance. This reduces the cost of your instance and the attached GPUs. Read GPUs on preemptible instances to learn more.
For example, you can use the following gcloud command to start an
Ubuntu 16.04 instance with one NVIDIA® Tesla® K80 GPU and 2 vCPUs in the
us-east1-d zone. The startup-script metadata instructs the
instance to install the
CUDA Toolkit
with its recommended driver version.
gcloud compute instances create gpu-instance-1 \
--machine-type n1-standard-2 --zone us-east1-d \
--accelerator type=nvidia-tesla-k80,count=1 \
--image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
--maintenance-policy TERMINATE --restart-on-failure \
--metadata startup-script='#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
apt-get update
apt-get install cuda-10-0 -y
fi'
This example command starts the instance, but CUDA and the driver will take several minutes to finish installing.
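After the startup script completes, you can check the driver from your workstation. This sketch reuses the example instance's name and zone; the nvidia-smi call runs on the instance over SSH.

```shell
INSTANCE="gpu-instance-1"   # the example instance created above
ZONE="us-east1-d"
# Runs nvidia-smi remotely; it lists the attached GPUs when the driver is loaded.
gcloud compute ssh "$INSTANCE" --zone "$ZONE" --command "nvidia-smi"
```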
API
Identify the GPU type that you want to add to your instance. Submit a GET request to list the GPU types that are available to your project in a specific zone.
GET https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes
where:
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone where you want to list the available GPU types.
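A sketch of this request with curl; my-project and us-east1-d are placeholder values, and the gcloud tool supplies the OAuth access token.

```shell
PROJECT_ID="my-project"   # placeholder project ID
ZONE="us-east1-d"         # placeholder zone
URL="https://www.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/acceleratorTypes"
# List the GPU types available in the zone.
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" "$URL"
```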
In the API, create a POST request to create a new instance. Include the
acceleratorType parameter to specify which GPU type you want to use, and
include the acceleratorCount parameter to specify how many GPUs you want
to add. Also set the onHostMaintenance parameter to TERMINATE.
POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances?key={YOUR_API_KEY}
{
"machineType": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/machineTypes/n1-highmem-2",
"disks":
[
{
"type": "PERSISTENT",
"initializeParams":
{
"diskSizeGb": "[DISK_SIZE]",
"sourceImage": "https://www.googleapis.com/compute/v1/projects/[IMAGE_PROJECT]/global/images/family/[IMAGE_FAMILY]"
},
"boot": true
}
],
"name": "[INSTANCE_NAME]",
"networkInterfaces":
[
{
"network": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/global/networks/[NETWORK]"
}
],
"guestAccelerators":
[
{
"acceleratorCount": [ACCELERATOR_COUNT],
"acceleratorType": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes/[ACCELERATOR_TYPE]"
}
],
"scheduling":
{
"onHostMaintenance": "terminate",
"automaticRestart": true,
["preemptible": true]
},
"metadata":
{
"items":
[
{
"key": "startup-script",
"value": "[STARTUP_SCRIPT]"
}
]
}
}
where:
- [INSTANCE_NAME] is the name of the instance.
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone for this instance.
- [MACHINE_TYPE] is the machine type that you selected for the instance. See GPUs on Compute Engine to see what machine types are available based on your desired GPU count.
- [IMAGE_PROJECT] is the image project that the image belongs to.
- [IMAGE_FAMILY] is a boot disk image for your instance. Specify an image family from the list of available public images.
- [DISK_SIZE] is the size of your boot disk in GB.
- [NETWORK] is the VPC network that you want to use for this instance. Specify default to use your default network.
- [ACCELERATOR_COUNT] is the number of GPUs that you want to add to your instance. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your instance.
- [ACCELERATOR_TYPE] is the GPU model that you want to use. See GPUs on Compute Engine for a list of available GPU models.
- [STARTUP_SCRIPT] is an optional startup script that you can use to install the GPU driver while the instance is starting up. See installing GPU drivers for examples.
- "preemptible": true is an optional parameter that configures your instance as a preemptible instance. This reduces the cost of your instance and the attached GPUs. Read GPUs on preemptible instances to learn more.
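You can also send this request with curl, using an OAuth token from the gcloud tool instead of an API key. This is a sketch: all values are placeholders, and the partial resource URLs in the body are resolved by the API against the request's project and zone.

```shell
PROJECT_ID="my-project"   # placeholder project ID
ZONE="us-east1-d"         # placeholder zone
# Write the request body to a file. Example values: an n1-highmem-2
# machine, an Ubuntu 16.04 boot disk, and one K80 GPU.
cat > request.json <<'EOF'
{
  "name": "gpu-instance-1",
  "machineType": "zones/us-east1-d/machineTypes/n1-highmem-2",
  "disks": [
    {
      "type": "PERSISTENT",
      "boot": true,
      "initializeParams": {
        "diskSizeGb": "100",
        "sourceImage": "projects/ubuntu-os-cloud/global/images/family/ubuntu-1604-lts"
      }
    }
  ],
  "networkInterfaces": [{"network": "global/networks/default"}],
  "guestAccelerators": [
    {
      "acceleratorCount": 1,
      "acceleratorType": "zones/us-east1-d/acceleratorTypes/nvidia-tesla-k80"
    }
  ],
  "scheduling": {"onHostMaintenance": "terminate", "automaticRestart": true}
}
EOF
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://www.googleapis.com/compute/v1/projects/${PROJECT_ID}/zones/${ZONE}/instances"
```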
If you used a startup script to automatically install the GPU device driver, verify that the GPU driver installed correctly.
If you did not use a startup script to install the GPU driver during instance creation, manually install the GPU driver on your instance so that your system can use the device.
Adding or removing GPUs on existing instances
You can add or detach GPUs on your existing instances, but you must first stop the instance and change its host maintenance setting so that it terminates rather than live-migrating. Instances with GPUs cannot live migrate because they are assigned to specific hardware devices. See GPU restrictions for details.
Also be aware that you must install GPU drivers on this instance after you add a GPU. The boot disk image that you used to create this instance determines what drivers you need. You must identify what drivers are appropriate for the operating system on your instance's persistent boot disk images. Read installing GPU drivers for details.
You can add or remove GPUs from an instance using the Google Cloud Platform Console or the API.
Console
You can add or remove GPUs from your instance by stopping the instance and editing your instance's configuration.
Verify that all of your critical applications are stopped on the instance. You must stop the instance before you can add a GPU.
Go to the VM instances page to see your list of instances.
On the list of instances, click the name of the instance where you want to add GPUs. The instance details page opens.
At the top of the instance details page, click Stop to stop the instance.
After the instance stops running, click Edit to change the instance properties.
If the instance has a shared-core machine type, you must change the machine type to have one or more vCPUs. You cannot add accelerators to instances with shared-core machine types.
In the Machine type section, click Customize to see advanced machine type options and available GPUs.
Click GPUs to see the list of available GPUs.
Select the number of GPUs and the GPU model that you want to add to your instance. Alternatively, you can set the number of GPUs to None to remove existing GPUs from the instance.
If you added GPUs to an instance, set the host maintenance setting to Terminate. If you removed GPUs from the instance, you can optionally set the host maintenance setting back to Migrate VM instance.
At the bottom of the instance details page, click Save to apply your changes.
After the instance settings are saved, click Start at the top of the instance details page to start the instance again.
API
You can add or remove GPUs from your instance by stopping the instance and changing your instance's configuration through the API.
Verify that all of your critical applications are stopped on the instance, and then create a POST request to stop the instance so that it can move to a host system where GPUs are available.
POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/stop

where:
- [INSTANCE_NAME] is the name of the instance where you want to add GPUs.
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone where the instance is located.
Identify the GPU type that you want to add to your instance. Submit a GET request to list the GPU types that are available to your project in a specific zone.
GET https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes

where:
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone where you want to list the available GPU types.
If the instance has a shared-core machine type, you must change the machine type to have one or more vCPUs. You cannot add accelerators to instances with shared-core machine types.
After the instance stops, create a POST request to add or remove one or more GPUs to your instance.
POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/setMachineResources

{
 "guestAccelerators":
 [
  {
   "acceleratorCount": [ACCELERATOR_COUNT],
   "acceleratorType": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/acceleratorTypes/[ACCELERATOR_TYPE]"
  }
 ]
}

where:
- [INSTANCE_NAME] is the name of the instance.
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone for this instance.
- [ACCELERATOR_COUNT] is the number of GPUs that you want on your instance. See GPUs on Compute Engine for a list of GPU limits based on the machine type of your instance.
- [ACCELERATOR_TYPE] is the GPU model that you want to use. See GPUs on Compute Engine for a list of available GPU models.
Create a POST request to set the scheduling options for the instance. If you are adding GPUs to an instance, you must specify "onHostMaintenance": "TERMINATE". Optionally, if you are removing GPUs from an instance, you can specify "onHostMaintenance": "MIGRATE".

POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/setScheduling

{
 "onHostMaintenance": "[MAINTENANCE_TYPE]",
 "automaticRestart": true
}

where:
- [INSTANCE_NAME] is the name of the instance where you want to add GPUs.
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone where the instance is located.
- [MAINTENANCE_TYPE] is the action you want your instance to take when host maintenance is necessary. Specify TERMINATE if you are adding GPUs to your instance. Alternatively, specify MIGRATE if you have removed all of the GPUs from your instance and want the instance to resume live migration during host maintenance events.
Start the instance.
POST https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[INSTANCE_NAME]/start

where:
- [INSTANCE_NAME] is the name of the instance that you want to start.
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone where the instance is located.
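The stop, setMachineResources, setScheduling, and start calls can be sketched as curl commands. All names here are placeholders, the K80 accelerator type is only an example, and the partial acceleratorType URL is resolved by the API against the request's project.

```shell
PROJECT="my-project"      # placeholder project ID
ZONE="us-east1-d"         # placeholder zone
VM="gpu-instance-1"       # placeholder instance name
BASE="https://www.googleapis.com/compute/v1/projects/${PROJECT}/zones/${ZONE}/instances/${VM}"
AUTH="Authorization: Bearer $(gcloud auth print-access-token)"

curl -s -X POST -H "$AUTH" "${BASE}/stop"
# Wait until the instance reports TERMINATED, then attach one GPU:
curl -s -X POST -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"guestAccelerators": [{"acceleratorCount": 1, "acceleratorType": "zones/us-east1-d/acceleratorTypes/nvidia-tesla-k80"}]}' \
  "${BASE}/setMachineResources"
# Instances with GPUs must terminate on host maintenance:
curl -s -X POST -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"onHostMaintenance": "TERMINATE", "automaticRestart": true}' \
  "${BASE}/setScheduling"
curl -s -X POST -H "$AUTH" "${BASE}/start"
```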
Next, install the GPU driver on your instance so that your system can use the device.
Creating groups of GPU instances using instance templates
You can use instance templates to create managed instance groups with GPUs added to each instance. Managed instance groups use the template to create multiple identical instances. You can scale the number of instances in the group to match your workload.
For steps to create an instance template, see Creating instance templates.
If you create the instance template using the Console, customize the machine type, and select the type and number of GPUs that you want to add to the instance template.
If you are using the gcloud command-line tool,
include the --accelerators and --maintenance-policy TERMINATE flags.
Optionally, include the --metadata startup-script flag and
specify a startup script to install the GPU driver while the instance
starts up. For sample scripts that work on GPU instances, see
installing GPU drivers.
The following example creates an instance template with 2 vCPUs, a 250GB boot disk with Ubuntu 16.04, an NVIDIA® Tesla® K80 GPU, and a startup script. The startup script installs the CUDA Toolkit with its recommended driver version.
gcloud beta compute instance-templates create gpu-template \
--machine-type n1-standard-2 \
--boot-disk-size 250GB \
--accelerator type=nvidia-tesla-k80,count=1 \
--image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
--maintenance-policy TERMINATE --restart-on-failure \
--metadata startup-script='#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
apt-get update
apt-get install cuda-10-0 -y
fi'
After you create the template, use the template to create an instance group. Every time you add an instance to the group, it starts that instance using the settings in the instance template.
If you are creating a regional managed instance group, be sure to
select zones
that specifically support the GPU model that you want. For a list of GPU models
and available zones, see GPUs on Compute Engine.
The following example creates a regional managed instance group across two
zones that support the nvidia-tesla-k80 model.
gcloud beta compute instance-groups managed create example-rmig \
--template gpu-template --base-instance-name example-instances \
--size 30 --zones us-east1-c,us-east1-d
Note: If you are choosing specific
zones, use the gcloud beta component because the zone
selection feature is currently in Beta.
To learn more about managing and scaling groups of instances, read Creating Groups of Managed Instances.
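Once the group exists, scaling it is a single command. A sketch, assuming the example regional group created above:

```shell
GROUP="example-rmig"      # the regional managed instance group created above
# Grow the group from 30 to 40 instances; new instances use the template.
gcloud compute instance-groups managed resize "$GROUP" \
  --size 40 --region us-east1
```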
Installing GPU drivers
After you create an instance with one or more GPUs, your system requires device drivers so that your applications can access the device. This guide shows the ways to install NVIDIA proprietary drivers on instances with public images.
You can install GPU drivers through one of the following options:
- Use sample scripts to install the drivers. You can either specify these scripts as startup scripts on your instances or run these scripts directly on your instances through the terminal.
- For Windows Server instances or instances where you cannot use these scripts, install the driver manually.
Installing GPU drivers using scripts
Each version of CUDA requires a minimum GPU driver version or a later version. To check the minimum driver required for your version of CUDA, see CUDA Toolkit and Compatible Driver Versions.
NVIDIA GPUs running on Google Compute Engine must use the following driver versions:
Linux instances:
- NVIDIA 410.79 driver or greater
Windows Server instances:
- NVIDIA 411.98 driver or greater
For most driver installs, you can obtain these drivers by installing the NVIDIA CUDA Toolkit.
On some images, you can use scripts to simplify the driver install process.
You can either specify these scripts as
startup scripts
on your instances or copy these scripts to your instances and run them through
the terminal as a user with sudo privileges.
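One way to copy a sample script to an instance and run it by hand is sketched below; the file, instance, and zone names are all placeholders.

```shell
SCRIPT="install-gpu.sh"    # hypothetical local copy of a sample script
INSTANCE="gpu-instance-1"  # placeholder instance name
ZONE="us-east1-d"          # placeholder zone
# Copy the script to the instance, then run it there with sudo.
gcloud compute scp "$SCRIPT" "${INSTANCE}:~/" --zone "$ZONE"
gcloud compute ssh "$INSTANCE" --zone "$ZONE" --command "sudo bash ~/${SCRIPT}"
```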
You must prepare the script so that it works with the boot disk image that you selected. If you imported a custom boot disk image for your instances, you might need to customize the startup script to work correctly with that custom image.
For Windows Server instances and SLES 12 instances where you cannot automate the driver installation process, install the driver manually.
The following samples are startup scripts that install CUDA and the associated drivers for NVIDIA® GPUs on public images. If the software you are using requires a specific version of CUDA, modify the script to download the version of CUDA that you need.
For information on support for CUDA, and for steps to modify your CUDA installation, see the CUDA Toolkit Documentation.
To verify if a startup script completes, you can review the logs or check the serial console.
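For example, you can pull the serial console output from your workstation. This is a sketch with placeholder names; the startup-script tag is how the guest environment typically labels these log lines, but check your own output.

```shell
INSTANCE="gpu-instance-1"   # placeholder instance name
ZONE="us-east1-d"           # placeholder zone
# Fetch the serial console log and keep the startup-script lines.
gcloud compute instances get-serial-port-output "$INSTANCE" --zone "$ZONE" \
  | grep -i "startup-script"
```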
CentOS
These sample scripts check for an existing CUDA install and then install the full CUDA 10 package and its associated proprietary driver.
CentOS 7 - CUDA 10:
#!/bin/bash
# Install kernel headers and development packages
echo "Installing kernel headers and development packages."
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm
rpm -i --force ./cuda-repo-rhel7-10.0.130-1.x86_64.rpm
yum clean all
rm -rf /var/cache/yum
# Install Extra Packages for Enterprise Linux (EPEL) for dependencies
yum install epel-release -y
yum update -y
yum install cuda-10-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-10-0; then
yum install cuda-10-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
On instances with CentOS 7 images, you might need to reboot the instance
after the script finishes installing the drivers and the CUDA packages.
Reboot the instance if the script is finished and the nvidia-smi command
returns the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
CentOS 6 - CUDA 10:
#!/bin/bash
# Install kernel headers and development packages
echo "Installing kernel headers and development packages."
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-10.0.130-1.x86_64.rpm
rpm -i --force ./cuda-repo-rhel6-10.0.130-1.x86_64.rpm
yum clean all
# Install Extra Packages for Enterprise Linux (EPEL) for dependencies
yum install epel-release -y
yum update -y
yum install cuda-10-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-10-0; then
yum install cuda-10-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
On instances with CentOS 6 images, you might need to reboot the instance
after the script finishes installing the drivers and the CUDA packages.
Reboot the instance if the script is finished and the nvidia-smi command
returns the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
RHEL
These sample scripts check for an existing CUDA install and then install the full CUDA 10 package and its associated proprietary driver.
RHEL 7 - CUDA 10:
#!/bin/bash
# Install kernel headers and development packages
echo "Installing kernel headers and development packages."
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm
rpm -i --force ./cuda-repo-rhel7-10.0.130-1.x86_64.rpm
yum clean all
rm -rf /var/cache/yum
# Install Extra Packages for Enterprise Linux (EPEL) for dependencies
yum install epel-release -y
yum update -y
yum install cuda-10-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-10-0; then
yum install cuda-10-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
On instances with RHEL 7 images, you might need to reboot the instance
after the script finishes installing the drivers and the CUDA packages.
Reboot the instance if the script is finished and the nvidia-smi command
returns the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
RHEL 6 - CUDA 10:
#!/bin/bash
# Install kernel headers and development packages
echo "Installing kernel headers and development packages."
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-10.0.130-1.x86_64.rpm
rpm -i --force ./cuda-repo-rhel6-10.0.130-1.x86_64.rpm
yum clean all
# Install Extra Packages for Enterprise Linux (EPEL) for dependencies
yum install epel-release -y
yum update -y
yum install cuda-10-0 -y
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-10-0; then
yum install cuda-10-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
On instances with RHEL 6 images, you might need to reboot the instance
after the script finishes installing the drivers and the CUDA packages.
Reboot the instance if the script is finished and the nvidia-smi command
returns the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
To verify that the script has finished, you can review the serial console.
SLES
SLES 15 - CUDA 10:
This sample script checks for an existing CUDA install and then installs the full CUDA 10 package and its associated proprietary driver.
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/cuda-repo-sles15-10.0.130-1.x86_64.rpm
rpm -i --force ./cuda-repo-sles15-10.0.130-1.x86_64.rpm
zypper --gpg-auto-import-keys refresh
zypper install -ny cuda-10-0
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-10-0; then
zypper install -ny cuda-10-0
fi
# Enable persistence mode
nvidia-smi -pm 1
SLES 12 Service Pack 3 - CUDA 9.1:
This sample script checks for an existing CUDA install and then installs the full CUDA 9.1 package and its associated proprietary driver.
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! rpm -q cuda-9-1; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/sles123/x86_64/cuda-repo-sles123-9.1.85-1.x86_64.rpm
rpm -i --force ./cuda-repo-sles123-9.1.85-1.x86_64.rpm
zypper --gpg-auto-import-keys refresh
zypper install -ny cuda-9-1
fi
# Verify that CUDA installed; retry if not.
if ! rpm -q cuda-9-1; then
zypper install -ny cuda-9-1
fi
# Enable persistence mode
nvidia-smi -pm 1
SLES 12:
On other SLES 12 instances, install the driver manually.
Ubuntu
Ubuntu 18.04 - CUDA 10:
This sample script checks for an existing CUDA install and then installs the full CUDA 10 package and its associated proprietary driver.
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-10-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
apt-get update
apt-get install cuda-10-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
Ubuntu 17.04 and 17.10 - CUDA 9:
This sample script checks for an existing CUDA install and then installs the full CUDA 9 package and its associated proprietary driver.
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-9-0; then
# The 17.04 installer works with 17.10.
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
apt-get update
apt-get install cuda-9-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
Ubuntu 16.04 LTS - CUDA 10:
This sample script checks for an existing CUDA install and then installs the full CUDA 10 package and its associated proprietary driver.
#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-10-0; then
# The 16.04 installer works with 16.10.
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
apt-get update
apt-get install cuda-10-0 -y
fi
# Enable persistence mode
nvidia-smi -pm 1
Windows Server
On Windows Server instances, you must install the driver manually.
After your script finishes running, you can verify that GPU driver installed.
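A minimal sanity check, run on the instance itself: nvidia-smi ships with the driver, and its exit status is nonzero when it cannot reach the GPU.

```shell
# Succeeds and lists the attached GPUs when the driver is loaded;
# fails when the driver is missing or not yet active.
if nvidia-smi > /dev/null 2>&1; then
  echo "driver ok"
else
  echo "driver not responding"
fi
```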
Manually installing GPU drivers
If you cannot use a script to install the driver for your GPUs, you can manually install the driver yourself. You are responsible for selecting the installer and driver version that works best for your applications. Use this install method if you require a specific driver or you need to install the driver on a custom image or a public image that does not work with one of the install scripts.
You can use this process to manually install drivers on instances with most public images. For custom images, you might need to modify the process to function in your unique environment.
CentOS
- Connect to the instance where you want to install the driver.
- Install kernel headers and development packages:
  yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
- Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit repository package and use the rpm command to add the repository to your system:
  CentOS 7
  $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm
  $ sudo rpm -i cuda-repo-rhel7-10.0.130-1.x86_64.rpm
  CentOS 6
  $ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-10.0.130-1.x86_64.rpm
  $ sudo rpm -i cuda-repo-rhel6-10.0.130-1.x86_64.rpm
- Install the epel-release repository. This repository includes the DKMS packages, which are required to install NVIDIA drivers on CentOS.
  $ sudo yum install epel-release
- Clean the Yum cache:
  $ sudo yum clean all
- Install CUDA 10, which includes the NVIDIA driver.
  $ sudo yum install cuda-10-0
  On instances with CentOS images, you might need to reboot the instance after you install the drivers and the CUDA packages. Reboot the instance if the install has finished and the nvidia-smi command returns the following error:
  NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
- Enable persistence mode.
  $ sudo nvidia-smi -pm 1
  Enabled persistence mode for GPU 00000000:00:04.0.
  Enabled persistence mode for GPU 00000000:00:05.0.
  All done.
RHEL
Connect to the instance where you want to install the driver.
Install kernel headers and development packages.
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y
Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit and use the rpm command to add the repository to your system:
RHEL 7
$ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.0.130-1.x86_64.rpm
$ sudo rpm -i cuda-repo-rhel7-10.0.130-1.x86_64.rpm
RHEL 6
$ curl -O http://developer.download.nvidia.com/compute/cuda/repos/rhel6/x86_64/cuda-repo-rhel6-10.0.130-1.x86_64.rpm
$ sudo rpm -i cuda-repo-rhel6-10.0.130-1.x86_64.rpm
Install the epel-release repository. This repository includes the DKMS packages, which are required to install NVIDIA drivers. On RHEL, you must download the .rpm for this repository from fedoraproject.org and add it to your system.
RHEL 7
$ curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sudo rpm -i epel-release-latest-7.noarch.rpm
RHEL 6
$ curl -O https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
$ sudo rpm -i epel-release-latest-6.noarch.rpm
Clean the Yum cache:
$ sudo yum clean all
Install CUDA 10, which includes the NVIDIA driver.
$ sudo yum install cuda-10-0
On instances with RHEL images, you might need to reboot the instance after you install the drivers and the CUDA packages. Reboot the instance if the installation has finished but the nvidia-smi command returns the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Enable persistence mode.
$ sudo nvidia-smi -pm 1
Enabled persistence mode for GPU 00000000:00:04.0.
Enabled persistence mode for GPU 00000000:00:05.0.
All done.
SLES
Connect to the instance where you want to install the driver.
Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit and use the rpm command to add the repository to your system:
SLES 15
$ curl -O https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/cuda-repo-sles15-10.0.130-1.x86_64.rpm
$ sudo rpm -i cuda-repo-sles15-10.0.130-1.x86_64.rpm
SLES 12 with Service Pack 3
$ curl -O https://developer.download.nvidia.com/compute/cuda/repos/sles123/x86_64/cuda-repo-sles123-9.1.85-1.x86_64.rpm
$ sudo rpm -i cuda-repo-sles123-9.1.85-1.x86_64.rpm
SLES 12 with Service Pack 2
$ curl -O https://developer.download.nvidia.com/compute/cuda/repos/sles122/x86_64/cuda-repo-sles122-9.0.176-1.x86_64.rpm
$ sudo rpm -i cuda-repo-sles122-9.0.176-1.x86_64.rpm
Refresh Zypper:
$ sudo zypper refresh
Install CUDA, which includes the NVIDIA driver.
$ sudo zypper install cuda
Enable persistence mode.
$ sudo nvidia-smi -pm 1
Enabled persistence mode for GPU 00000000:00:04.0.
Enabled persistence mode for GPU 00000000:00:05.0.
All done.
Ubuntu
Connect to the instance where you want to install the driver.
Select a driver repository and add it to your instance. For example, use curl to download the CUDA Toolkit and use the dpkg command to add the repository to your system. Then, use the apt-key command to authenticate the download:
Ubuntu 18.04
$ curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub
Ubuntu 17.04 and 17.10
$ curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub
Ubuntu 16.04 LTS and 16.10
$ curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
Update the package lists:
$ sudo apt-get update
Install CUDA, which includes the NVIDIA driver.
Ubuntu 16.04, 16.10, and 18.04
$ sudo apt-get install cuda-10-0
Ubuntu 17.04 and 17.10
$ sudo apt-get install cuda-9-0
Enable persistence mode.
$ sudo nvidia-smi -pm 1
Enabled persistence mode for GPU 00000000:00:04.0.
Enabled persistence mode for GPU 00000000:00:05.0.
All done.
Windows Server
Connect to the instance where you want to install the driver.
Download an .exe installer file to your instance that includes the R384 branch: NVIDIA 386.07 driver or greater. For most Windows Server instances, you can use one of the following options:
- Download the CUDA Toolkit with the NVIDIA driver included
- Download only the NVIDIA driver
For example, in Windows Server 2016, you can open a PowerShell terminal as an administrator and use the wget command to download the driver installer that you need:
PS C:\> wget https://developer.nvidia.com/compute/cuda/10.0/Prod/network_installers/cuda_10.0.130_win10_network -o cuda_10.0.130_win10_network.exe
Run the .exe installer. For example, you can open a PowerShell terminal as an administrator and run the following command:
PS C:\> .\cuda_10.0.130_win10_network.exe
After the installer finishes running, verify that the GPU driver installed correctly.
Verifying the GPU driver install
After the driver finishes installing, verify that the driver installed and initialized properly.
Linux
Connect to the Linux instance
and use the nvidia-smi command to verify that the driver is running properly.
$ nvidia-smi
Wed Jan  2 19:51:51 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P8     7W /  75W |     62MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Windows Server
Connect to the Windows Server instance
and use the nvidia-smi.exe tool to verify that the driver is running properly.
PS C:\> & 'C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe'
Fri Jan 04 16:47:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 411.31       Driver Version: 411.31                              |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4             TCC | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P8     6W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
If the driver is not functioning and you used a script to install the driver, check the startup script logs to ensure that the script has finished and that it did not fail during the install process.
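One way to check those logs is to scan the captured startup-script output for error lines. The sketch below demonstrates the idea on a stand-in log file; where your image actually captures startup-script output varies (on some images it appears in the system journal or on the serial console), so the file path here is a placeholder:

```shell
#!/bin/bash
# Sketch: scan captured startup-script output for install failures.
# scan_log prints the first error-like line, or confirms a clean run.
scan_log() {
  if grep -iqE "error|failed" "$1"; then
    grep -iE "error|failed" "$1" | head -n 1
  else
    echo "no errors found"
  fi
}

# Demonstrate on a stand-in log file; on a real instance, point this at
# wherever your image captures startup-script output.
LOG="$(mktemp)"
printf 'Resolving dependencies\nInstall complete\n' > "$LOG"
scan_log "$LOG"
```

A clean run prints "no errors found"; otherwise the first matching line gives you a starting point for debugging the install.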
Installing GRID® drivers for virtual workstations
For a full list of NVIDIA drivers that you can use on Compute Engine, see the contents of the NVIDIA drivers Cloud Storage bucket.
Linux
Download the GRID driver, using the following command:
curl -O https://storage.googleapis.com/nvidia-drivers-us-public/GRID/GRID7.1/NVIDIA-Linux-x86_64-410.92-grid.runUse the following command to start the installer:
sudo bash NVIDIA-Linux-x86_64-410.92-grid.runDuring the installation, choose the following options:
- If you are prompted to install 32-bit binaries, select Yes.
- If you are prompted to modify the x.org file, select No.
Windows Server
Depending on your version of Windows Server, download one of the following NVIDIA GRID drivers:
Run the installer, and choose the Express installation.
After the installation is complete, restart the VM. When you restart, you are disconnected from your session.
Reconnect to your instance using RDP or a PCoIP client.
Verifying that the GRID driver has been installed
Linux
Run the following commands:
sudo nvidia-smi --persistence-mode=1
nvidia-smi
The output of the command looks similar to the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.92       Driver Version: 410.92                              |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    26W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Windows Server
Connect to your Windows instance using RDP or a PCoIP client.
Right-click the desktop, and select NVIDIA Control Panel.
In the NVIDIA Control Panel, from the Help menu, select System Information. The information shows the GPU that the VM is using, and the driver version.
Optimizing GPU performance
In general, you can optimize the performance of your GPU devices on Linux instances using the following settings:
Enable persistence mode. This setting applies to all of the GPUs on your instance.
$ sudo nvidia-smi -pm 1
Enabled persistence mode for GPU 00000000:00:04.0.
Enabled persistence mode for GPU 00000000:00:05.0.
All done.
On instances with NVIDIA® Tesla® K80 GPUs, disable autoboost:
$ sudo nvidia-smi --auto-boost-default=DISABLED
All done.
Handling host maintenance events
GPU instances cannot be live migrated; they must terminate for host maintenance events, but they can automatically restart. These maintenance events typically occur once per week, but can occur more frequently when necessary.
You can deal with maintenance events using the following processes:
- Avoid these disruptions by regularly restarting your instances on a schedule that is more convenient for your applications.
- Identify when your instance is scheduled for host maintenance and prepare your workload to transition through the system restart.
To receive advance notice of host maintenance events, monitor the
/computeMetadata/v1/instance/maintenance-event metadata value.
If the request to the metadata server returns NONE, the instance is not
scheduled to terminate. For example, run the following command from within
an instance:
$ curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"
NONE
If the metadata server returns TERMINATE_ON_HOST_MAINTENANCE, then your
instance is scheduled for termination. Compute Engine gives GPU instances a
one-hour termination notice, while normal instances receive only a 60-second
notice. Configure your application to transition through the maintenance
event. For example, you might use one of the following techniques:
Configure your application to temporarily move work in progress to a Google Cloud Storage bucket, then retrieve that data after the instance restarts.
Write data to a secondary persistent disk. When the instance automatically restarts, the persistent disk can be reattached and your application can resume work.
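The polling approach above can be sketched as a small shell loop. In this sketch, decide() maps the metadata response to an action; the actual checkpoint step (for example, copying state to a Cloud Storage bucket) is left as a comment because it depends on your application:

```shell
#!/bin/bash
# Sketch of a polling loop for the maintenance-event metadata value.
# decide() is a hypothetical helper that maps the metadata response
# to an action for your workload.
decide() {
  case "$1" in
    NONE)                          echo "continue" ;;
    TERMINATE_ON_HOST_MAINTENANCE) echo "checkpoint" ;;
    *)                             echo "unknown" ;;
  esac
}

# In a real loop, fetch the value from the metadata server each iteration:
#   curl -s -H "Metadata-Flavor: Google" \
#     http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event
# and when decide() returns "checkpoint", save work in progress before
# the instance terminates.
decide "NONE"
decide "TERMINATE_ON_HOST_MAINTENANCE"
```

A production version would sleep between polls (the one-hour notice leaves ample time) or, better, use the metadata server's change-notification mechanism described below instead of polling.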
You can also receive notification of changes in this metadata value without polling. For examples of how to receive advance notice of host maintenance events without polling, read Getting live migration notices.
What's next?
- Learn more about GPUs on Compute Engine
- Add Local SSDs to your instances. Local SSD devices pair well with GPUs when your applications require high-performance storage.
- Try the Running TensorFlow Inference Workloads at Scale with TensorRT5 and NVIDIA T4 GPU tutorial.


