
Persistent disks are the most common storage option due to their price, performance, and predictability. However, you can create instances with local SSDs for even greater performance and lower latency, but without the data redundancy and durability that you get from persistent disks. When you configure a storage option for applications that run on your instances, use the following processes:

Determine how much space you need.
Determine what performance characteristics your applications require.
Configure your instances to optimize storage performance.

This document discusses block storage options that you can attach to Compute Engine instances. To see a complete list of storage options on Google Cloud Platform, read Choosing a storage option.

Block storage performance comparison

Consider your storage size and performance requirements to help you determine the correct disk type and size for your instances. Performance requirements for a given application are typically separated into two distinct I/O patterns.

Small reads and writes
Large reads and writes

For small reads and writes, the limiting factor is random input/output operations per second (IOPS).

For large reads and writes, the limiting factor is throughput.

The IOPS per GB and throughput numbers represent the total aggregate performance for data on a single disk, whether attached to a single instance or shared across multiple instances. In the case of multiple instances reading from the same disk, the aggregate throughput and IOPS capacity of the disk is shared among the instances. You might see higher aggregate performance, but use these IOPS per GB and throughput rates for planning purposes.

	Zonal Standard persistent disks	Regional Standard persistent disks	Zonal SSD persistent disks	Regional SSD persistent disks	Local SSD (SCSI)	Local SSD (NVMe)
Maximum sustained IOPS
Read IOPS per GB	0.75	0.75	30	30	266.7	453.3
Write IOPS per GB	1.5	1.5	30	30	186.7	240
Read IOPS per instance	3,000*	3,000*	15,000 - 60,000*	15,000 - 60,000*	400,000	680,000
Write IOPS per instance	15,000*	15,000*	15,000 - 30,000*	15,000 - 30,000*	280,000	360,000
Maximum sustained throughput (MB/s)
Read throughput per GB	0.12	0.12	0.48	0.48	1.04	1.77
Write throughput per GB	0.12	0.12	0.48	0.48	0.73	0.94
Read throughput per instance	240*	240*	240 - 1200*	240 - 1200*	1,560	2,650
Write throughput per instance	76-240**	38-200**	76 - 400*	38 - 200*	1,090	1,400

* Persistent disk IOPS and throughput performance depends on the number of instance vCPUs and IO block size. Read SSD persistent disk performance limits for details on SSD persistent disk, and Standard persistent disk performance limits for details on standard persistent disk.

** SSD and standard persistent disks can achieve greater throughput performance on instances with greater numbers of vCPUs. Read Network egress caps on write throughput for details.

Comparing persistent disk to a physical hard drive

When you specify the size of your persistent disks, consider how these disks compare to traditional physical hard drives. The following tables compare standard persistent disks and SSD persistent disks to the typical performance that you would expect from a 7200 RPM SATA drive, which typically achieves 75 IOPS or 120 MB/s.

I/O Type	I/O Pattern	Size required to match a 7200 RPM SATA drive
		Standard persistent disk	SSD persistent disk
Small random reads	75 small random reads	100 GB	3 GB
Small random writes	75 small random writes	50 GB	3 GB
Streaming large reads	120 MB/s streaming reads	1000 GB	250 GB
Streaming large writes	120 MB/s streaming writes	1000 GB	250 GB
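
The sizes in this table follow from dividing the target rate by the per-GB rate listed in the performance comparison table. The following is a minimal sketch of that arithmetic in POSIX shell. The function name size_for_rate is hypothetical, and per-GB rates are passed in hundredths (0.75 becomes 75) only so that the calculation stays in integer arithmetic:

```shell
# size_for_rate TARGET RATE_X100
# Prints the smallest whole disk size (GB) whose per-GB rate reaches TARGET.
# RATE_X100 is the per-GB rate multiplied by 100 (0.75 IOPS/GB -> 75).
size_for_rate() {
  target=$1
  rate_x100=$2
  # Ceiling division: (a + b - 1) / b
  echo $(( ($target * 100 + $rate_x100 - 1) / $rate_x100 ))
}

size_for_rate 75 75      # standard PD, reads at 0.75 IOPS/GB   -> 100
size_for_rate 75 150     # standard PD, writes at 1.5 IOPS/GB   -> 50
size_for_rate 75 3000    # SSD PD at 30 IOPS/GB                 -> 3
size_for_rate 120 12     # standard PD at 0.12 MB/s per GB      -> 1000
size_for_rate 120 48     # SSD PD at 0.48 MB/s per GB           -> 250
```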

Size, Price, and Performance Summary

While you have several inputs to consider when you select a volume type and size for your application, one factor you do not need to consider is the cost of I/O operations. Persistent Disk has no per-I/O costs, so there is no need to estimate monthly I/O to calculate a budget for what you will spend on disks.

The following pricing calculation examples use US persistent disk pricing. In these examples, consider only the relative costs of standard persistent disks compared to SSD persistent disks. Standard persistent disks are priced at $0.040 per GB and SSD persistent disks are priced at $0.170 per GB. However, performance caps increase with the size of the volume, so look at the price per IOPS for IOPS-oriented workloads.

Standard persistent disks are approximately $0.053 per random read IOPS and $0.0266 per random write IOPS. SSD persistent disks are $0.0057 per random read IOPS and $0.0057 per random write IOPS. The price per IOPS for SSD persistent disks is true up to the point where they reach the IOPS limits of the instance or the vCPU count for that instance.
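
The per-IOPS prices above are the per-GB price divided by the IOPS per GB from the performance comparison table. The following sketch reproduces that division. The helper name price_per_iops is hypothetical, and it assumes awk is available for the floating-point math. Values are rounded to four decimal places, so the standard write figure prints as 0.0267 rather than the truncated 0.0266 quoted above.

```shell
# price_per_iops PRICE_PER_GB IOPS_PER_GB
# Prints the price per provisioned IOPS, rounded to four decimal places.
price_per_iops() {
  awk -v price="$1" -v rate="$2" 'BEGIN { printf "%.4f\n", price / rate }'
}

price_per_iops 0.040 0.75   # standard PD, random read   -> 0.0533
price_per_iops 0.040 1.5    # standard PD, random write  -> 0.0267
price_per_iops 0.170 30     # SSD PD, random read/write  -> 0.0057
```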

SSD persistent disks reach their limit of 60,000 random read IOPS at 2000 GB and 30,000 random write IOPS at 1000 GB. In contrast, standard persistent disks reach their limit of 3,000 random read IOPS at 4 TB and 15,000 random write IOPS at 10 TB.

SSD persistent disks are designed for single-digit millisecond latencies. The observed latency is application-specific.

Standard persistent disk

Standard persistent disk performance scales linearly up to the VM performance limits. A vCPU count of 4 or more for your instance does not limit the performance of standard persistent disks.

Instances with fewer than 4 vCPUs have a reduced write limit, especially in terms of IOPS, because writes are constrained by the network egress limits, which are proportional to the vCPU count. The write limit also depends on the size of IOs (16 KB IOs consume more bandwidth than 8 KB IOs at the same IOPS level).

Standard persistent disk IOPS and throughput performance increases linearly with the size of the disk until it reaches the following per-instance limits:

Read throughput: Up to 240 MB/s at a 2 TB disk size.
Write throughput: Up to 240 MB/s at a 2 TB disk size.
Read IOPS: Up to 3,000 IOPS at a 4 TB disk size.
Write IOPS: Up to 15,000 IOPS at a 10 TB disk size.

To gain persistent disk performance benefits on your existing instances, resize your persistent disks to increase IOPS and throughput per persistent disk.

Volume Size (GB)	Sustained Random IOPS			Sustained Throughput (MB/s)
	Read (<=16 KB/IO)	Write (<=8 KB/IO)	Write (16 KB/IO)	Read	Write
10	*	*	*	*	*
32	24	48	48	3	3
64	48	96	96	7	7
128	96	192	192	15	15
256	192	384	384	30	30
512	384	768	768	61	61
1000	750	1500	1500	120	120
1500	1125	2250	2250	180	180
2048	1536	3072	3072	240	240
4000	3000	6000	6000	240	240
5000	3000	7500	7500	240	240
8192	3000	12288	7500	240	240
10000	3000	15000	7500	240	240
16384	3000	15000	7500	240	240
32768	3000	15000	7500	240	240
65536	3000	15000	7500	240	240

* Use this volume size only for boot volumes. I/O bursting will be relied upon for any meaningful tasks.
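
The read, small-write, and throughput columns of this table can be approximated by scaling the per-GB rates and capping at the per-instance limits. The following is a rough sketch under that assumption. The function names are hypothetical, it ignores the vCPU and 16 KB write constraints described above, and it rounds down to whole numbers.

```shell
# min A B: prints the smaller of two integers.
min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }

# standard_pd_limits SIZE_GB
# Scales 0.75 read IOPS/GB, 1.5 write IOPS/GB, and 0.12 MB/s per GB,
# then caps each value at the per-instance limit.
standard_pd_limits() {
  size_gb=$1
  read_iops=$(min $(( $size_gb * 75 / 100 )) 3000)
  write_iops=$(min $(( $size_gb * 150 / 100 )) 15000)
  throughput=$(min $(( $size_gb * 12 / 100 )) 240)
  echo "read_iops=$read_iops write_iops=$write_iops throughput_mbs=$throughput"
}

standard_pd_limits 1000   # read_iops=750 write_iops=1500 throughput_mbs=120
standard_pd_limits 8192   # read_iops=3000 write_iops=12288 throughput_mbs=240
```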

SSD persistent disk

Unlike standard persistent disks, the IOPS performance of SSD persistent disks depends on the number of vCPUs in the instance as well as on disk size.

VMs with fewer vCPUs have lower write IOPS and throughput limits due to network egress limitations on write throughput. See the Network egress caps on write throughput section for details.

SSD persistent disk performance scales linearly until it reaches either the limits of the volume or the limits of each Compute Engine instance. SSD read bandwidth and/or IOPS consistency near the maximum limits largely depends on network ingress utilization; some variability is to be expected, especially for 16 KB IOs near the maximum IOPS limits. See the table below for more details.

Instance vCPU count	Sustained Random IOPS			Sustained Throughput (MB/s)
	Read (<=16 KB/IO)	Write (<=8 KB/IO)	Write (16 KB/IO)	Read*	Write
1 vCPU	15,000	9,000	4,500	240	72
2 to 3 vCPUs	15,000	15,000	4,500/vCPU	240	72/vCPU
4 to 7 vCPUs	15,000	15,000	15,000	240	240
8 to 15 vCPUs	15,000	15,000	15,000	800	400
16 to 31 vCPUs	25,000	25,000	25,000	1,200	400
32 to 63 vCPUs	60,000	30,000	25,000	1,200	400
64+ vCPUs**	60,000	30,000	25,000	1,200	400

* Maximum throughput based on IO block sizes of 256 KB or larger.

** Maximum performance might not be achievable at full CPU utilization.

To improve SSD persistent disk performance on your existing instances, change the machine type of the instance to increase the per-VM limits and resize your persistent disks to increase IOPS and throughput per persistent disk.

Volume Size (GB)	Sustained Random IOPS			Sustained Throughput (MB/s)
	Read (<=16 KB/IO)	Write (<=8 KB/IO)	Write (16 KB/IO)	Read	Write
10	300	300	300	4.8	4.8
32	960	960	960	15	15
64	1920	1920	1920	30	30
128	3840	3840	3840	61	61
256	7680	7680	7680	122	122
500	15000	15000	15000	240	240
834	25000	25000	25000	400	400
1000	30000	30000	25000	480	400
1334	40000	30000	25000	640	400
1667	50000	30000	25000	800	400
2048	60000	30000	25000	983	400
4096	60000	30000	25000	1200	400
8192	60000	30000	25000	1200	400
16384	60000	30000	25000	1200	400
32768	60000	30000	25000	1200	400
65536	60000	30000	25000	1200	400
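
Because SSD persistent disk performance is bounded by both the volume and the instance, the effective limit is the smaller of the two. The following sketch illustrates this for read IOPS only, using 30 IOPS per GB and the per-instance read limits from the vCPU table above. The function name ssd_pd_read_iops is hypothetical.

```shell
# ssd_pd_read_iops SIZE_GB VCPUS
# Prints the expected sustained random read IOPS limit: the smaller of the
# volume limit (30 IOPS per GB) and the instance limit for the vCPU count.
ssd_pd_read_iops() {
  size_gb=$1
  vcpus=$2
  if [ "$vcpus" -ge 32 ]; then
    vm_limit=60000
  elif [ "$vcpus" -ge 16 ]; then
    vm_limit=25000
  else
    vm_limit=15000
  fi
  disk_limit=$(( $size_gb * 30 ))
  if [ "$disk_limit" -lt "$vm_limit" ]; then
    echo "$disk_limit"
  else
    echo "$vm_limit"
  fi
}

ssd_pd_read_iops 256 8     # limited by the volume   -> 7680
ssd_pd_read_iops 2048 8    # limited by the instance -> 15000
ssd_pd_read_iops 2048 32   # limited by the instance -> 60000
```

In the second case, resizing the disk further would not help. Only a machine type with more vCPUs raises the limit.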

Simultaneous reads and writes

For standard persistent disks, simultaneous reads and writes share the same performance limits. As your instance uses more read throughput or IOPS, it will be able to perform fewer writes. Instances that use more write throughput will be able to make fewer reads.

SSD persistent disks are capable of achieving their maximum throughput limits for both reads and writes simultaneously. For IOPS, however, SSD persistent disks cannot reach their maximum read and write limits simultaneously. To achieve maximum throughput limits during simultaneous reads and writes, optimize your I/O size so that the volume can meet its throughput limits without reaching an IOPS bottleneck.

Instance IOPS limits for simultaneous reads and writes:

The IOPS numbers in the following table are based on 8 KB IO size. Other IO sizes, like 16 KB, might have different IOPS numbers, but will have the same read/write distribution.

Standard persistent disk		SSD persistent disk (8 vCPUs)		SSD persistent disk (32+ vCPUs)
Read	Write	Read	Write	Read	Write
3000 IOPS	0 IOPS	15000 IOPS	0 IOPS	60000 IOPS	0 IOPS
2250 IOPS	3750 IOPS	11250 IOPS	3750 IOPS	45000 IOPS	7500 IOPS
1500 IOPS	7500 IOPS	7500 IOPS	7500 IOPS	30000 IOPS	15000 IOPS
750 IOPS	11250 IOPS	3750 IOPS	11250 IOPS	15000 IOPS	22500 IOPS
0 IOPS	15000 IOPS	0 IOPS	15000 IOPS	0 IOPS	30000 IOPS
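
The rows in this table are consistent with a linear trade-off: whatever fraction of the read limit is in use, the same fraction of the write limit is unavailable. The following sketch assumes that linear model, which is inferred from the table rows rather than stated as a formula in this document. The function name remaining_write_iops is hypothetical.

```shell
# remaining_write_iops READ_IOPS MAX_READ_IOPS MAX_WRITE_IOPS
# Prints the write IOPS still available while READ_IOPS reads are in flight,
# assuming reads and writes share the limit linearly.
remaining_write_iops() {
  echo $(( $3 * ($2 - $1) / $2 ))
}

remaining_write_iops 2250 3000 15000     # standard PD            -> 3750
remaining_write_iops 7500 15000 15000    # SSD PD, 8 vCPUs        -> 7500
remaining_write_iops 45000 60000 30000   # SSD PD, 32+ vCPUs      -> 7500
```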

Instance throughput limits for simultaneous reads and writes:

Standard persistent disk		SSD persistent disk (8 vCPUs)		SSD persistent disk (16+ vCPUs)
Read	Write	Read	Write	Read	Write
240 MB/s	0 MB/s	800 MB/s*	400 MB/s*	1200 MB/s*	400 MB/s*
180 MB/s	60 MB/s
120 MB/s	120 MB/s
60 MB/s	180 MB/s
0 MB/s	240 MB/s

* For SSD persistent disks, the max read throughput and max write throughput are independent of each other, so these limits are constant. You might notice increased SSD persistent disk write throughput per instance over the published limits due to ongoing improvements.

Network egress caps on write throughput

Each persistent disk write operation contributes to your virtual machine instance's cumulative network egress cap.

To calculate the maximum persistent disk write traffic that a virtual machine instance can issue, subtract an instance's other network egress traffic from its 2 Gbit/s/vCPU network cap. The remaining throughput represents the throughput available to you for persistent disk write traffic.

Compute Engine stores data on persistent disks so that they have built-in redundancy. Instances write data to persistent disk three times in parallel to achieve this redundancy. Additionally, each write request has a certain amount of overhead, which uses egress bandwidth.

Each virtual machine instance has a persistent disk write limit based on the network egress cap for the virtual machine. In a situation where persistent disk is competing for network egress with IP traffic, 60% of the network egress cap will go to persistent disk traffic, leaving 40% for IP traffic. The following table shows the expected persistent disk write bandwidth with and without additional IP traffic:

	Standard persistent disk			Solid-state persistent disks
Number of vCPUs	Standard persistent disk write limit (MB/s)	Standard persistent disk write allocation (MB/s)	Standard volume size needed to reach limit (GB)	SSD persistent disk write limit (MB/s)	SSD persistent disk write allocation (MB/s)	SSD persistent disk size needed to reach limit (GB)
1	72	43	600	72	43	150
2	144	86	1200	144	86	300
4	240	173	2000	240	173	500
8+	240	240	2000	400	346	834

To understand how the values in this table were created, take a simple example with 1 vCPU and standard persistent disk. In this example, we approximate that the bandwidth multiplier for every write request is 3.3x, which means that data is written out 3 times and has a total overhead of 10%. To calculate the maximum write bandwidth, divide the network egress cap (2 Gbit/s, which is equivalent to 238 MB/s) by 3.3:

Max write bandwidth for one vCPU = 238 / 3.3 = ~72 MB/s to your standard persistent disk

In addition, using the standard persistent disk write throughput/GB figure provided in the performance chart presented earlier, you can now derive the required disk capacity to achieve this performance:

Required disk capacity to achieve max write bandwidth for 1 vCPU = 72 / 0.12 = ~600 GB

Similar to zonal persistent disks, write traffic from regional persistent disks contributes to a virtual machine instance's cumulative network egress cap. To calculate the egress network available for regional persistent disks, use the factor of 6.6 rather than the 3.3 that is used for zonal persistent disks.

Max write bandwidth for one vCPU = 238 / 6.6 = ~36 MB/s to your standard replicated persistent disk.
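
The two calculations above differ only in the bandwidth multiplier, so they can be expressed as one helper. This is a sketch under the same approximations (238 MB/s of egress per vCPU, no competing IP traffic). The function name max_pd_write_mbs is hypothetical, and it assumes awk is available. It does not apply the per-instance disk write limits from the table, so results above 240 MB/s (standard) or 400 MB/s (SSD) are still capped by those limits.

```shell
# max_pd_write_mbs VCPUS MULTIPLIER
# Prints the egress-bound persistent disk write bandwidth in MB/s,
# rounded down. Use 3.3 for zonal disks and 6.6 for regional disks.
max_pd_write_mbs() {
  awk -v vcpus="$1" -v mult="$2" 'BEGIN { printf "%d\n", int(vcpus * 238 / mult) }'
}

max_pd_write_mbs 1 3.3   # zonal disk, 1 vCPU     -> 72
max_pd_write_mbs 2 3.3   # zonal disk, 2 vCPUs    -> 144
max_pd_write_mbs 1 6.6   # regional disk, 1 vCPU  -> 36
```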

Optimizing persistent disk and local SSD performance

You can optimize persistent disks and local SSDs to handle your data more efficiently.

Optimizing persistent disks

Persistent disks can give you the performance described in the disk type chart, but the virtual machine must drive sufficient usage to reach the performance caps. After you size your persistent disk volumes appropriately for your performance needs, your application and operating system might need some tuning.

In this section, we describe a few key elements that can be tuned for better performance and follow with discussion of how to apply some of them to specific types of workloads.

Disable lazy initialization and enable DISCARD commands

Persistent disks support DISCARD or TRIM commands, which allow operating systems to inform the disks when blocks are no longer in use. DISCARD support allows the operating system to mark disk blocks as no longer needed, without incurring the cost of zeroing out the blocks.

On most Linux operating systems, you enable DISCARD when you mount a persistent disk to your instance. Windows 2012 R2 instances enable DISCARD by default when you mount a persistent disk. Windows 2008 R2 does not support DISCARD.

Enabling DISCARD can boost general runtime performance, and it can also speed up the performance of your disk when it is first mounted. Formatting an entire disk volume can be time consuming. As such, so-called "lazy formatting" is a common practice. The downside of lazy formatting is that the cost is often then paid the first time the volume is mounted. By disabling lazy initialization and enabling DISCARD commands, you can get fast format and mount.

Disable lazy initialization and enable DISCARD during format by passing the following parameters to mkfs.ext4:
```
-E lazy_itable_init=0,lazy_journal_init=0,discard
```
The lazy_journal_init=0 parameter does not work on instances with CentOS 6 or RHEL 6 images. For those instances, format persistent disks without that parameter.
```
-E lazy_itable_init=0,discard
```
To enable DISCARD commands on mount, pass the following flag to the mount command:
```
-o discard
```

Persistent disks work well with the discard option enabled. However, you can optionally run fstrim periodically in addition to, or instead of using the discard option. If you do not use the discard option, run fstrim before you create a snapshot of your disk. Trimming the file system allows you to create smaller snapshot images, which reduces the cost of storing snapshots.
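
To show how these options fit together, the following sketch assembles the format, mount, and trim commands and prints them instead of running them, because formatting destroys any data on the device. The device path /dev/sdb, the mount directory /mnt/disks/data, and the helper name mkfs_discard_opts are placeholders chosen for illustration.

```shell
# mkfs_discard_opts LEGACY
# Prints the -E option string for mkfs.ext4. Pass 1 for CentOS 6 or RHEL 6
# images, where lazy_journal_init=0 does not work.
mkfs_discard_opts() {
  if [ "$1" = "1" ]; then
    echo "lazy_itable_init=0,discard"
  else
    echo "lazy_itable_init=0,lazy_journal_init=0,discard"
  fi
}

DEVICE=/dev/sdb              # placeholder device path
MOUNT_DIR=/mnt/disks/data    # placeholder mount directory

# Dry run: review the printed commands before running them yourself.
echo "sudo mkfs.ext4 -E $(mkfs_discard_opts 0) $DEVICE"
echo "sudo mount -o discard,defaults $DEVICE $MOUNT_DIR"
# Optional periodic trim, or before a snapshot if discard is not used:
echo "sudo fstrim -v $MOUNT_DIR"
```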

I/O queue depth

Many applications have settings that influence their I/O queue depth to tune performance. Higher queue depths increase IOPS, but can also increase latency. Lower queue depths decrease per-I/O latency, but sometimes at the expense of IOPS.

Readahead cache

To improve I/O performance, operating systems employ techniques such as readahead where more of a file than was requested is read into memory with the assumption that subsequent reads are likely to need that data. Higher readahead increases throughput, but at the expense of memory and IOPS. Lower readahead increases IOPS, but at the expense of throughput.

On Linux systems, you can get and set the readahead value with the blockdev command:

$ sudo blockdev --getra /dev/[DEVICE_ID]

$ sudo blockdev --setra [VALUE] /dev/[DEVICE_ID]

The readahead value is <desired_readahead_bytes> / 512 bytes.

For example, if you desire an 8 MB readahead, 8 MB is 8388608 bytes (8 * 1024 * 1024).

8388608 bytes / 512 bytes = 16384

And you would set:

$ sudo blockdev --setra 16384 /dev/[DEVICE_ID]
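
The same conversion works in both directions, which helps when interpreting the value that --getra reports. The following is a small sketch. The helper names readahead_sectors and sectors_to_kb are hypothetical.

```shell
# readahead_sectors MB
# Converts a readahead size in MB to the 512-byte sector count for --setra.
readahead_sectors() {
  echo $(( $1 * 1024 * 1024 / 512 ))
}

# sectors_to_kb SECTORS
# Converts a 512-byte sector count reported by --getra back to KB.
sectors_to_kb() {
  echo $(( $1 * 512 / 1024 ))
}

readahead_sectors 8    # -> 16384
sectors_to_kb 256      # -> 128
```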

Free CPUs

Reading and writing to Persistent Disk requires CPU cycles from your virtual machine. Achieving very high, consistent IOPS levels requires having CPUs free to process I/O.

IOPS-oriented workloads

Databases, whether SQL or NoSQL, have usage patterns of random access to data. The following are suggested for IOPS-oriented workloads:

Lower readahead values are typically suggested in best practices documents for MongoDB, Apache Cassandra, and other database applications
I/O queue depth values of 1 per each 400-800 IOPS, up to a limit of 64 on large volumes
One free CPU for every 2000 random read IOPS and 1 free CPU for every 2500 random write IOPS
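
The queue depth and CPU suggestions above can be turned into a quick estimate for a target IOPS level. The following is a sketch of that arithmetic only, not a tuning tool. The function names are hypothetical, and results are rounded up to whole numbers.

```shell
# ceil_div A B: integer division rounded up.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# cap64 N: limits a queue depth to the suggested maximum of 64.
cap64() { if [ "$1" -gt 64 ]; then echo 64; else echo "$1"; fi; }

# iops_tuning TARGET_IOPS
# Prints the suggested queue depth range (1 per 400-800 IOPS, up to 64)
# and the free CPUs needed (1 per 2000 read IOPS, 1 per 2500 write IOPS).
iops_tuning() {
  iops=$1
  qd_low=$(cap64 "$(ceil_div "$iops" 800)")
  qd_high=$(cap64 "$(ceil_div "$iops" 400)")
  read_cpus=$(ceil_div "$iops" 2000)
  write_cpus=$(ceil_div "$iops" 2500)
  echo "queue_depth=$qd_low-$qd_high read_cpus=$read_cpus write_cpus=$write_cpus"
}

iops_tuning 15000   # queue_depth=19-38 read_cpus=8 write_cpus=6
iops_tuning 60000   # queue_depth=64-64 read_cpus=30 write_cpus=24
```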

Throughput-oriented workloads

Streaming operations, such as a Hadoop job, benefit from fast sequential reads. As such, larger I/O sizes can increase streaming performance. For throughput-oriented workloads, I/O sizes of 256 KB or above are recommended.

Optimizing SSD persistent disk performance

The performance by disk type chart describes the expected, achievable performance limits for solid-state persistent disks. To optimize your applications and virtual machine instances to achieve these speeds, use the following best practices:

Make sure your application is issuing enough I/O

If your application is issuing fewer IOPS than the limit described in the chart above, you won't reach that level of IOPS. For example, on a 500 GB disk, the expected IOPS limit is 15,000 IOPS. However, if you issue fewer than that, or if you issue I/O operations that are larger than 8 KB, you won't achieve 15,000 IOPS.

Make sure to issue I/O with enough parallelism

Use a queue depth that is high enough to leverage the parallelism of the operating system. If you issue 1000 IOPS but do so in a synchronous manner with a queue depth of 1, you will achieve far fewer IOPS than the limit described in the chart. Your application should have a queue depth of at least 1 for every 400-800 IOPS.

Make sure there is enough available CPU on the virtual machine instance issuing the I/O

If your virtual machine instance is starved for CPU, your application won't be able to manage the IOPS described above. As a rule of thumb, you should have one available CPU for every 2000-2500 IOPS of expected traffic.

Make sure your application is optimized for a reasonable temporal data locality on large disks

If your application accesses data distributed across different parts of a disk over a short period of time (hundreds of GB per vCPU), you won't achieve optimal IOPS. For best performance, optimize for temporal data locality, weighing factors like the fragmentation of the disk and the randomness of accessed parts of the disk.

Make sure the I/O scheduler in the operating system is configured to meet your specific needs

On Linux-based systems, you can set the I/O scheduler to noop to achieve the highest number of IOPS on SSD-backed devices.
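
On Linux, the kernel exposes the available schedulers for each block device in /sys/block/[DEVICE_ID]/queue/scheduler, with the active one shown in square brackets. The following sketch reads that file if it exists. The device name sdb and the helper name active_scheduler are placeholders, and on newer multi-queue kernels the equivalent of noop is typically listed as none.

```shell
# active_scheduler: reads a scheduler list such as "noop deadline [cfq]"
# on stdin and prints the bracketed (active) entry.
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

echo "noop deadline [cfq]" | active_scheduler   # -> cfq

DEVICE=sdb                                      # placeholder device name
SCHED_FILE=/sys/block/$DEVICE/queue/scheduler
if [ -r "$SCHED_FILE" ]; then
  echo "active scheduler for $DEVICE: $(active_scheduler < "$SCHED_FILE")"
  # To switch schedulers (requires root), write the name to the same file:
  #   echo noop | sudo tee "$SCHED_FILE"
fi
```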

Benchmarking SSD persistent disk performance

The commands below assume a 2500 GB PD-SSD device. If your device size is different, modify the --filesize argument. This disk size is necessary to achieve the current 32 vCPU VM throughput limits.

    # Install dependencies
    sudo apt-get update
    sudo apt-get install -y fio

Fill the disk with non-zero data. Persistent disk reads from empty blocks have a latency profile that is different from blocks that contain data. We recommend filling the disk before running any read latency benchmarks.

# Running this command will cause you to lose data on the second device.
# We strongly recommend using a throwaway VM and disk.
sudo fio --name=fill_disk \
  --filename=/dev/sdb --filesize=2500G \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --bs=128K --iodepth=64 --rw=randwrite

Testing write bandwidth.

# Running this command will cause you to lose data on the second device.
# We strongly recommend using a throwaway VM and disk.
sudo fio --name=write_bandwidth_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --bs=1M --iodepth=32 --rw=randwrite

Testing write IOPS. To achieve the PD IOPS limit, it is important to maintain a deep IO queue. If, for example, the write latency is 1 millisecond, the VM can achieve at most 1000 IOPS for each IO in flight. To achieve 15,000 IOPS, the VM must maintain at least 15 IOs in flight. If your disk and VM can achieve 30,000 IOPS, at least 30 IOs must be in flight. If the IO size is larger than 4 KB, the VM might hit the bandwidth limit before it reaches the IOPS limit.
```
# Running this command will cause you to lose data on the second device.
# We strongly recommend using a throwaway VM and disk.
sudo fio --name=write_iops_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --bs=4K --iodepth=64 --rw=randwrite
```
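
The relationship between latency and IOs in flight described above can be sketched as a small calculation for choosing --iodepth. The function name min_iodepth is hypothetical, and latency is taken in whole milliseconds to keep the math in integers.

```shell
# min_iodepth TARGET_IOPS LATENCY_MS
# Each IO in flight completes at most 1000 / LATENCY_MS times per second,
# so the minimum number in flight is TARGET_IOPS * LATENCY_MS / 1000,
# rounded up.
min_iodepth() {
  echo $(( ($1 * $2 + 999) / 1000 ))
}

min_iodepth 15000 1   # -> 15
min_iodepth 30000 1   # -> 30
min_iodepth 15000 2   # -> 30
```
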
Testing write latency. While testing IO latency, it is important that the VM not reach its bandwidth or IOPS limit; otherwise, the latency will not reflect actual persistent disk IO latency. For example, if the IOPS limit is reached at an IO depth of 30 and the fio command has double that, then the total IOPS will stay the same and the reported IO latency will double.
```
# Running this command will cause you to lose data on the second device.
# We strongly recommend using a throwaway VM and disk.
sudo fio --name=write_latency_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --bs=4K --iodepth=4 --rw=randwrite
```

Testing read bandwidth.

sudo fio --name=read_bandwidth_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --bs=256K --iodepth=64 --rw=randread

Testing read IOPS. To achieve the PD IOPS limit it is important to maintain a deep-enough IO queue. If the IO size is larger than 4 KB, the VM might hit the bandwidth limit before it reaches the IOPS limit.

sudo fio --name=read_iops_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --bs=4K --iodepth=64 --rw=randread

Testing read latency. It's important to fill the disk with data to get a realistic latency measurement. It's important that the VM does not reach IOPS or throughput limits during this test. Once persistent disk reaches its saturation limit it will push back on incoming IOs and this will be reflected as an artificial increase in IO latency.
```
sudo fio --name=read_latency_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --bs=4K --iodepth=4 --rw=randread
```

Testing sequential read bandwidth.

sudo fio --name=read_bandwidth_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --numjobs=4 --thread --offset_increment=500G \
  --bs=1M --iodepth=64 --rw=read

Testing sequential write bandwidth.

sudo fio --name=write_bandwidth_test \
  --filename=/dev/sdb --filesize=2500G \
  --time_based --ramp_time=2s --runtime=1m \
  --ioengine=libaio --direct=1 --verify=0 --randrepeat=0 \
  --numjobs=4 --thread --offset_increment=500G \
  --bs=1M --iodepth=64 --rw=write

Optimizing Local SSDs

The performance by disk type chart describes the expected, achievable performance limits for local SSD devices. To optimize your applications and virtual machine instances to achieve these speeds, use the following best practices:

Use the guest environment optimizations for Local SSD

By default, most Compute Engine-provided Linux images will automatically run an optimization script that configures the instance for peak local SSD performance. The script enables certain queue sysfs file settings that enhance the overall performance of your machine and masks interrupt requests (IRQs) to specific virtual CPUs (vCPUs). This script only optimizes performance for Compute Engine local SSD devices.

Ubuntu, SLES, and older images might not be configured to include this performance optimization. If you are using any of these images, or an image older than v20141218, you can install the guest environment to enable these optimizations instead.

Select the best image for NVMe or SCSI interfaces

Local SSDs operate best with either the NVMe or SCSI interface type depending on the image that you use on the boot disk for your instance. Choose an interface for your local SSD devices that works best with your boot disk image. If your instances connect to local SSDs using SCSI interfaces, you can enable multi-queue SCSI on the guest operating system to achieve optimal performance over the SCSI interface.

Enable multi-queue SCSI on instances with custom images and local SSDs

Some public images support multi-queue SCSI. If you require multi-queue SCSI capability on custom images that you import to your project, you must enable it yourself. Your imported Linux images can use multi-queue SCSI only if they include kernel version 3.19 or later.

To enable multi-queue SCSI on a custom image, import the image with the VIRTIO_SCSI_MULTIQUEUE guest operating system feature enabled and add an entry to your GRUB config:

CentOS

For CentOS 7 only.

Import your custom image using the API and include a guestOsFeatures item with a type value of VIRTIO_SCSI_MULTIQUEUE.
Create an instance using your custom image and attach one or more local SSDs.
Connect to your instance through SSH.
Check the value of the /sys/module/scsi_mod/parameters/use_blk_mq file
```
$ cat /sys/module/scsi_mod/parameters/use_blk_mq
```
If the value of this file is Y, then multi-queue SCSI is already enabled on your imported image. If the value of the file is N, include scsi_mod.use_blk_mq=Y in the GRUB_CMDLINE_LINUX entry in your GRUB config file and restart the system.
1. Open the /etc/default/grub GRUB config file in a text editor.
```
$ sudo vi /etc/default/grub
```
2. Add scsi_mod.use_blk_mq=Y to the GRUB_CMDLINE_LINUX entry.
```
GRUB_CMDLINE_LINUX=" vconsole.keymap=us console=ttyS0,38400n8 vconsole.font=latarcyrheb-sun16 scsi_mod.use_blk_mq=Y"
```
3. Save the config file.
4. Run the grub2-mkconfig command to regenerate the GRUB file and complete the configuration.
```
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```
5. Reboot the instance.
```
$ sudo reboot
```

Ubuntu

Import your custom image using the API and include a guestOsFeatures item with a type value of VIRTIO_SCSI_MULTIQUEUE.
Create an instance using your custom image and attach one or more local SSDs using the SCSI interface.
Connect to your instance through SSH.
Check the value of the /sys/module/scsi_mod/parameters/use_blk_mq file.
```
$ cat /sys/module/scsi_mod/parameters/use_blk_mq
```
If the value of this file is Y, then multi-queue SCSI is already enabled on your imported image. If the value of the file is N, include scsi_mod.use_blk_mq=Y in the GRUB_CMDLINE_LINUX entry in your GRUB config file and restart the system.
1. Open the /etc/default/grub GRUB config file in a text editor.
```
$ sudo nano /etc/default/grub
```
2. Add scsi_mod.use_blk_mq=Y to the GRUB_CMDLINE_LINUX entry.
```
GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=Y"
```
3. Save the config file.
4. Run the update-grub command to regenerate the GRUB file and complete the configuration.
```
$ sudo update-grub
```
5. Reboot the instance.
```
$ sudo reboot
```

Disable write cache flushing

Filesystems, databases, and other applications use cache flushing to ensure that data is committed to durable storage at various checkpoints. For most storage devices, this default makes sense. However, write cache flushes are fairly slow on local SSDs. You can increase the write performance for some applications by disabling automatic flush commands in those applications or by disabling flush options at the file system level.

Local SSDs always flush cached writes within two seconds regardless of the flush commands that you set for your filesystems and applications, so temporary hardware failures can cause you to lose only two seconds of cached writes at most. Permanent hardware failures can still cause loss of all data on the device whether the data is flushed or not, so you should still back up critical data to persistent disks or Cloud Storage buckets.

To disable write cache flushing on ext4 file systems, include the nobarrier option in your mount options or in your /etc/fstab entries. For example:

$ sudo mount -o discard,defaults,nobarrier /dev/[LOCAL_SSD_ID] /mnt/disks/[MNT_DIR]

where [LOCAL_SSD_ID] is the device ID for the local SSD that you want to mount and [MNT_DIR] is the directory where you want to mount it.

Benchmarking local SSD performance

The local SSD performance figures provided in the Performance section were achieved using specific settings on the local SSD instance. If your instance is having trouble reaching these performance limits and you have already configured the instance using the recommended local SSD settings, you can compare your performance limits against the published limits by replicating the settings used by the Compute Engine team.

Create a local SSD instance that has four or eight vCPUs for each device, depending on your workload. For example, if you want to attach four local SSD devices to an instance, use a machine type with 16 or 32 vCPUs.

The following command creates a virtual machine with 8 vCPUs, and a single local SSD:
```
gcloud compute instances create ssd-test-instance \
--machine-type "n1-standard-8" \
--local-ssd interface="SCSI"
```

Run the following script on your machine, which replicates the settings used to achieve these speeds. Note that the --bs parameter defines the block size, which affects the results for different types of read and write operations.

# install dependencies
sudo apt-get update -y
sudo apt-get install -y build-essential git libtool gettext autoconf \
libgconf2-dev libncurses5-dev python-dev fio bison autopoint

# blkdiscard
git clone https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
cd util-linux/
./autogen.sh
./configure --disable-libblkid
make
sudo mv blkdiscard /usr/bin/
sudo blkdiscard /dev/disk/by-id/google-local-ssd-0

# full write pass - measures write bandwidth with 1M blocksize
sudo fio --name=writefile --size=100G --filesize=100G \
--filename=/dev/disk/by-id/google-local-ssd-0 --bs=1M --nrfiles=1 \
--direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
--iodepth=200 --ioengine=libaio

# rand read - measures max read IOPS with 4k blocks
sudo fio --time_based --name=benchmark --size=100G --runtime=30 \
--filename=/dev/disk/by-id/google-local-ssd-0 --ioengine=libaio --randrepeat=0 \
--iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 \
--numjobs=4 --rw=randread --blocksize=4k --group_reporting

# rand write - measures max write IOPS with 4k blocks
sudo fio --time_based --name=benchmark --size=100G --runtime=30 \
--filename=/dev/disk/by-id/google-local-ssd-0 --ioengine=libaio --randrepeat=0 \
--iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 \
--numjobs=4 --rw=randwrite --blocksize=4k --group_reporting

What's next

Learn about persistent disk pricing.
Learn about local SSD pricing.


Optimizing Persistent Disk and Local SSD Performance