
You can set up autoscaler to scale based on the following metric types:

Scale using per-instance metrics where the selected metric provides data for each instance in the managed instance group indicating resource utilization.
Scale using per-group metrics (Beta) where the group scales based on a metric that provides a value related to the whole managed instance group.

These metrics can be either standard metrics provided by the Stackdriver Monitoring service, or custom Stackdriver Monitoring metrics that you create.

Before you begin

If you want to use the command-line examples in this guide:
1. Install or update to the latest version of the gcloud command-line tool.
2. Set a default region and zone.
If you want to use the API examples in this guide, set up API access.
Read the Before you begin section of the Autoscaling Overview topic for important setup steps.

Per-instance metrics

Per-instance metrics provide data for each instance in the managed instance group separately, indicating resource utilization. For per-instance metrics, the instance group cannot scale below a size of 1 because the autoscaler requires metrics about at least one running instance in order to operate.

If you need to scale using other Stackdriver metrics that are not specific to individual instances or scale your instance groups down to zero instances from time to time, you can configure your instances to scale using per-group metrics instead.

Standard per-instance metrics

Stackdriver Monitoring has a set of standard metrics that you can use to monitor your virtual machine instances. However, not every standard metric is a valid utilization metric that the autoscaler can use.

A valid utilization metric for scaling meets the following criteria:

The standard metric must contain data for a gce_instance monitored resource. You can use the timeSeries.list API call to verify whether a specific metric exports data for this resource.
The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number of virtual machine instances in the group.
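
As a rough sketch of the first criterion, the following Python snippet only assembles the query parameters for a timeSeries.list request that restricts results to the gce_instance monitored resource. The helper name, the project ID, and the time window are illustrative assumptions, and authentication and sending the request are left out.

```python
from urllib.parse import urlencode

def build_time_series_params(metric_type, start_time, end_time):
    """Return query parameters for a timeSeries.list call.

    The filter keeps only series written against gce_instance resources.
    """
    monitoring_filter = (
        f'metric.type = "{metric_type}" AND resource.type = "gce_instance"'
    )
    return {
        "filter": monitoring_filter,
        "interval.startTime": start_time,
        "interval.endTime": end_time,
    }

params = build_time_series_params(
    "compute.googleapis.com/instance/cpu/utilization",
    "2024-01-01T00:00:00Z",  # hypothetical time window
    "2024-01-01T00:05:00Z",
)
# "myproject" is a placeholder project ID.
url = (
    "https://monitoring.googleapis.com/v3/projects/myproject/timeSeries?"
    + urlencode(params)
)
print(url)
```

If such a request returns no time series for the metric, that suggests the metric does not export gce_instance data and is not usable for per-instance autoscaling.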

The following is an invalid metric because the value does not change based on utilization and the autoscaler cannot use the value to scale proportionally:

compute.googleapis.com/instance/cpu/reserved_cores

After you select a standard metric you want to use for your autoscaler, you can configure autoscaling using that metric.

Custom metrics

You can create custom metrics using Stackdriver Monitoring and write your own monitoring data to the Stackdriver Monitoring service. This gives you side-by-side access to standard Cloud Platform data and your custom monitoring data, with a familiar data structure and consistent query syntax. If you have a custom metric, you can choose to scale based on the data from these metrics.

Prerequisites

In order to use custom metrics, you must have done the following:

Created a custom metric. For information on creating a custom metric, see the Custom Metrics documentation.
Set up your managed instance group to export the custom metric from all instances in the managed instance group.

Choose a valid custom metric

Not all custom metrics can be used by the autoscaler. To choose a valid custom metric, the metric must have all of the following properties:

The metric must be a per-instance metric. The metric must export data relevant to each specific Compute Engine instance separately.
The exported per-instance values must be associated with a gce_instance monitored resource, which contains the following labels:
- zone with the name of the zone the instance is in.
- instance_id with the value of unique numerical ID assigned to the instance.
The metric must export data at least every 60 seconds. If you export data more often than every 60 seconds, the autoscaler can respond faster to load changes. If you export data less often than every 60 seconds, the autoscaler might not be able to respond quickly enough to load changes.
The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale up or down the number of virtual machines.
The metric must export int64 or double data values.
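
To make these properties concrete, here is a minimal sketch of one exported data point shaped like a Stackdriver Monitoring time series entry. The helper name, the instance ID, and the value are hypothetical, and other fields that the Monitoring API may require are omitted.

```python
from datetime import datetime, timezone

def build_point(metric_type, instance_id, zone, value, when=None):
    """Build one time series entry tied to a gce_instance resource."""
    when = when or datetime.now(timezone.utc)
    # Only int64 or double values are accepted; bool is rejected explicitly
    # because Python treats it as a subclass of int.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("value must be an int64 or double")
    if isinstance(value, int):
        # The proto3 JSON mapping encodes int64 values as strings.
        typed_value = {"int64Value": str(value)}
    else:
        typed_value = {"doubleValue": value}
    return {
        "metric": {"type": metric_type},
        "resource": {
            "type": "gce_instance",
            "labels": {"instance_id": str(instance_id), "zone": zone},
        },
        "points": [{
            "interval": {"endTime": when.isoformat()},
            "value": typed_value,
        }],
    }

series = build_point(
    "custom.googleapis.com/path/to/metric",  # placeholder metric name
    instance_id=1234567890123456789,         # hypothetical numerical ID
    zone="us-central1-f",
    value=24.5,
)
```

Each instance in the group would write its own entry like this at least once every 60 seconds.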

For autoscaler to work with your custom metric, you must export data for this custom metric from all the instances in the managed instance group.

Note: You can get an instance's numerical ID by making a request for the metadata server's ID property from within the instance. For example, you can do this in curl:

curl http://metadata.google.internal/computeMetadata/v1/instance/id -H Metadata-Flavor:Google

For more information on using the metadata server, see Metadata Server.

Configuring autoscaling using per-instance monitoring metrics

The process of setting up an autoscaler for a standard or custom metric is the same. To create an autoscaler that uses Stackdriver Monitoring metrics, you must provide the metric identifier, the desired target utilization level, and the utilization target type. Each of these properties is described briefly below:

Metric identifier: The name of the metric to use. If you use a custom metric, you defined this name when you initially created the metric. The identifier has the following format:
```
custom.googleapis.com/path/to/metric
```
See Using Custom Metrics for more information about creating, browsing, and reading metrics.
Target utilization level: The target utilization level that the autoscaler must maintain for this metric. This must be a positive number. For example, both 24.5 and 1100 are acceptable values. Note that this is different from CPU and load balancing utilization, which must be a float value between 0.0 and 1.0.
Target type: This defines how the autoscaler computes the data collected from the instances. The possible target types are:
- GAUGE: The autoscaler computes the average value of the data collected in the last couple of minutes and compares that to the target utilization value of the autoscaler.
- DELTA_PER_MINUTE: The autoscaler calculates the average rate of growth per minute and compares that to the target utilization.
- DELTA_PER_SECOND: The autoscaler calculates the average rate of growth per second and compares that to the target utilization.
If you expressed your desired target utilization per second, use DELTA_PER_SECOND. Likewise, use DELTA_PER_MINUTE if you expressed your target utilization per minute, so that the autoscaler can perform accurate comparisons.
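
The difference between the target types can be illustrated with some simple arithmetic. The sketch below is not the autoscaler's actual algorithm; it only shows the kind of quantity each target type compares against the target, using made-up sample data and hypothetical helper names.

```python
def gauge_average(values):
    """GAUGE: the average of recently sampled values."""
    return sum(values) / len(values)

def delta_rate(samples, per_seconds):
    """Average growth of a cumulative value per `per_seconds` seconds.

    `samples` is a list of (timestamp_seconds, value) pairs in time order.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0) * per_seconds

queue_depth = [8, 10, 12]                         # hypothetical gauge samples
requests = [(0, 1000), (60, 1600), (120, 2200)]   # hypothetical counter samples

gauge = gauge_average(queue_depth)       # 10.0
per_second = delta_rate(requests, 1)     # 10.0
per_minute = delta_rate(requests, 60)    # 600.0
```

The same counter grows by 10 per second or 600 per minute, so a target of 10 is only meaningful for this data when paired with DELTA_PER_SECOND.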

Console

The instructions for configuring autoscaling are different for regional versus single-zone managed instance groups. Regional managed instance groups do not support filtering for per-instance metrics.

To configure autoscaling for a regional (multi-zone) managed instance group:

Go to the Instance Groups page.
If you do not have an instance group, create one. Otherwise, click the name of an instance group from the list to open the instance group details page. The group must be a regional group.
On the instance group details page, click the Edit Group button.
Under Autoscaling, select On to enable autoscaling.
In the Autoscale based on section, select Stackdriver monitoring metric.
In the Metric identifier section, enter the metric name in the following format: example.googleapis.com/path/to/metric.
In the Target section, specify the target value.
In the Target type section, specify the target type that corresponds to the metric's kind of measurement.
Save your changes when you are ready.

To configure autoscaling for a single-zone managed instance group:

Go to the Instance Groups page.
If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single-zone.
On the instance group details page, click the Edit Group button.
Under Autoscaling, select On to enable autoscaling.
In the Autoscale based on section, select Stackdriver monitoring metric.
In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
In the Metric identifier section, enter the metric name in the following format: example.googleapis.com/path/to/metric.
In the Additional filter expression section, optionally enter a filter to use individual values from metrics with multiple streams or labels. See Filtering per-instance metrics for more information.
In the Utilization target section, specify the target value.
In the Utilization target type section, verify that the target type corresponds to the metric's kind of measurement.
Save your changes when you are ready.

gcloud

For example, in gcloud, the following command creates an autoscaler that uses the GAUGE target type. Along with the --custom-metric-utilization parameter, the --max-num-replicas parameter is also required when creating an autoscaler:

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
    --max-num-replicas 20 \
    --cool-down-period 90

Optionally, you can use the --cool-down-period flag, which tells the autoscaler how many seconds to wait after a new virtual machine has started before the autoscaler starts collecting usage information from it. This accounts for the amount of time it might take for the virtual machine to initialize, during which the collected usage is not reliable for autoscaling. The default cool-down period is 60 seconds.

For regional (multi-zone) managed instance groups, use the --region flag to specify where to find the instance group. For example:

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
    --max-num-replicas 20 \
    --cool-down-period 90 \
    --region us-central1

To see a full list of available gcloud commands and flags, see the gcloud reference.

API

Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.

In the API, make a POST request to the following URL, replacing myproject with your own project ID and us-central1-f with the zone of your choice:

POST https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers/

Your request body must contain the name, target, and autoscalingPolicy fields. In autoscalingPolicy, provide the maxNumReplicas and the customMetricUtilizations properties.

Optionally, you can use the coolDownPeriodSec parameter, which tells the autoscaler how many seconds to wait after a new instance has started before it starts to collect usage. After the cool-down period passes, the autoscaler begins to collect usage information from the new instance and determines if the group requires additional instances. This accounts for the amount of time it might take for the instance to initialize, during which the collected usage is not reliable for autoscaling. The default cool-down period is 60 seconds.

POST https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
 "autoscalingPolicy": {
  "maxNumReplicas": 10,
  "coolDownPeriodSec": 90,
  "customMetricUtilizations": [
   {
    "metric": "example.googleapis.com/some/metric/name",
    "utilizationTarget": 10,
    "utilizationTargetType": "GAUGE"
   }
  ]
 }
}
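
If you generate these requests from a script, a small helper can assemble the body and catch obviously invalid values before the request is sent. The following is a minimal sketch; the function name and its checks are assumptions for illustration and do not replace validation by the API.

```python
import json

VALID_TARGET_TYPES = {"GAUGE", "DELTA_PER_MINUTE", "DELTA_PER_SECOND"}

def build_autoscaler_body(name, zone, group, metric, target, target_type,
                          max_replicas, cool_down_sec=60):
    """Assemble the JSON body for a per-instance metric autoscaler."""
    if target <= 0:
        raise ValueError("utilizationTarget must be a positive number")
    if target_type not in VALID_TARGET_TYPES:
        raise ValueError(f"unknown target type: {target_type}")
    return {
        "name": name,
        "target": f"zones/{zone}/instanceGroupManagers/{group}",
        "autoscalingPolicy": {
            "maxNumReplicas": max_replicas,
            "coolDownPeriodSec": cool_down_sec,
            "customMetricUtilizations": [{
                "metric": metric,
                "utilizationTarget": target,
                "utilizationTargetType": target_type,
            }],
        },
    }

body = build_autoscaler_body(
    name="example-autoscaler",
    zone="us-central1-f",
    group="example-managed-instance-group",
    metric="example.googleapis.com/some/metric/name",
    target=10,
    target_type="GAUGE",
    max_replicas=10,
    cool_down_sec=90,
)
print(json.dumps(body, indent=1))
```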

Filtering per-instance metrics

You can apply filters to per-instance Stackdriver metrics, which allows you to scale single-zone managed instance groups using individual values from metrics with multiple streams or labels.

Per-instance metric filtering requirements

Autoscaler filtering is compatible with the Stackdriver Monitoring filter syntax. The filters for per-instance metrics must meet the following requirements:

You can use only the AND operator for joining selectors.
You can use only the = direct equality comparison operator, but you cannot use the operator with any functions. For example, you cannot use the startswith() function with the = comparison operator.
You must not set the resource.type or resource.label.* selectors. Per-instance metrics always use all of the instance resources from the group.
For best results, the filter should be specific enough to return a single time series for each instance. If the filter returns multiple time series, they are added together.
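
The first three requirements can be approximated with a rough pre-flight check, as in the sketch below. It is a simplified, hypothetical helper based on a regular expression, not the validation that the autoscaler performs, and it does not handle quoted values that themselves contain the word AND.

```python
import re

# One selector: a dotted name, "=", then a quoted string or a bare literal.
SELECTOR = re.compile(r'^([\w.]+)\s*=\s*("[^"]*"|[\w.\-]+)$')

def check_per_instance_filter(filter_text):
    """Return a list of problems found; an empty list means none were spotted."""
    problems = []
    for part in re.split(r'\s+AND\s+', filter_text.strip()):
        match = SELECTOR.match(part.strip())
        if not match:
            # Covers other operators (OR, !=) and function calls.
            problems.append(f"unsupported selector: {part!r}")
            continue
        name = match.group(1)
        if name == "resource.type" or name.startswith("resource.label"):
            problems.append(f"resource selector not allowed: {name}")
    return problems

ok = check_per_instance_filter('metric.label.loadbalanced = true')
bad_function = check_per_instance_filter('metric.label.name = startswith("web")')
bad_resource = check_per_instance_filter(
    'resource.type = "gce_instance" AND metric.label.loadbalanced = true'
)
```

Here `ok` is empty, while `bad_function` and `bad_resource` each report one problem.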

Configuring autoscalers to filter metrics

Use the Google Cloud Platform Console, the gcloud beta command-line tool, or the Compute Engine Beta API to add metric filters for autoscaling of a single-zone managed instance group.

Console

The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you also specify a metric filter. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced boolean:

Go to the Instance Groups page.
If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single-zone.
On the instance group details page, click the Edit Group button.
Under Autoscaling, select On to enable autoscaling.
In the Autoscale based on section, select Stackdriver monitoring metric.
In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
In the Metric identifier section, enter the metric name. For example, compute.googleapis.com/instance/network/received_bytes_count.
In the Additional filter expression section, enter a filter. For example, 'metric.label.loadbalanced = true' .
Save your changes when you are ready.

gcloud

The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you must specify a metric filter and individual flags for the utilization target and target type. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced boolean, specify the --stackdriver-metric-filter flag with the 'metric.label.loadbalanced = true' value. Include the utilization target and target type flags individually.

gcloud beta compute instance-groups managed set-autoscaling example-managed-instance-group \
    --update-stackdriver-metric=compute.googleapis.com/instance/network/received_bytes_count \
    --stackdriver-metric-utilization-target=10 \
    --stackdriver-metric-utilization-target-type=DELTA_PER_SECOND \
    --stackdriver-metric-filter='metric.label.loadbalanced = true' \
    --max-num-replicas 20 \
    --cool-down-period 90

This example configures autoscaling to use only the loadbalanced traffic data as part of the utilization target.

To see a full list of available gcloud commands and flags, see the gcloud beta reference.

API

Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.

The process for creating an autoscaler that filters a per-instance metric is similar to creating a normal per-instance autoscaler, but you must specify a metric filter and individual parameters for the utilization target and target type. For example, the compute.googleapis.com/instance/network/received_bytes_count metric includes the instance_name and loadbalanced labels. To filter based on the loadbalanced boolean, specify the filter parameter with the "metric.label.loadbalanced = true" value.

In the API, make a POST request to the following URL, replacing myproject with your own project ID and us-central1-f with the zone of your choice. The request body must contain the name, target, and autoscalingPolicy fields. In autoscalingPolicy, provide the maxNumReplicas and the customMetricUtilizations properties.

POST https://www.googleapis.com/compute/beta/projects/myproject/zones/us-central1-f/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
 "autoscalingPolicy": {
  "maxNumReplicas": 10,
  "coolDownPeriodSec": 90,
  "customMetricUtilizations": [
   {
    "metric": "compute.googleapis.com/instance/network/received_bytes_count",
    "filter": "metric.label.loadbalanced = true",
    "utilizationTarget": 10,
    "utilizationTargetType": "DELTA_PER_SECOND"
   }
  ]
 }
}

This example configures autoscaling to use only the loadbalanced traffic data as part of the utilization target.

Per-group metrics

Per-group metrics allow autoscaling with a standard or custom metric that does not export per-instance utilization data. Instead, the group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that you define.

When you configure autoscaling on per-group metrics, you must indicate how you want the autoscaler to provision instances relative to the metric:

Instance assignment: Specify an instance assignment to indicate that you want the autoscaler to add or remove instances depending on how much work is available to assign to each instance. Specify a value for this parameter that represents how much work you expect each instance can handle. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work as indicated by the metric. If the metric value is 10 and you assigned 0.5 units of work to each instance, the autoscaler creates 20 instances in the managed instance group. Scaling with instance assignment allows the instance group to shrink to 0 instances when the metric value drops to 0, and to grow again when the value rises above 0. The following diagram shows the proportional relationship between metric value and number of instances when scaling with an instance assignment policy.
Utilization target: Specify a utilization target to indicate that you want the autoscaler to add or remove instances to try and maintain the metric at a specified value. When the metric is above the specified target, autoscaler gradually adds instances until the metric decreases to the target value. When the metric is below the specified target value, autoscaler gradually removes instances until the metric increases to the target value. Scaling with a utilization target cannot shrink the group to 0 instances. The following diagram shows how autoscaler adds and removes instances in response to a metric value in order to maintain a utilization target.

Each option has the following use cases:

Instance assignment: Scale the size of your managed instance groups based on the number of unacknowledged messages in a Google Pub/Sub subscription or a total QPS rate of a network endpoint.
Utilization target: Scale the size of your managed instance groups based on a utilization target for a custom metric that does not come from the standard per-instance CPU or memory use metrics. For example, you might scale the group based on a custom latency metric.

When you configure autoscaling with per-group metrics and you specify an instance assignment, your instance groups can scale down to 0 instances. If your metric indicates that there is no work for your instance group to complete, the group will scale down to 0 instances until the metric detects that new work is available. In contrast to per-group instance assignment, per-instance autoscaling requires resource utilization metrics from at least one instance, so the group cannot scale below a size of 1.
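
The instance assignment arithmetic described above can be sketched in a few lines. Rounding up to a whole instance and clamping to the group's minimum and maximum size are assumptions made for this illustration; the function name is hypothetical.

```python
import math

def instances_for_assignment(metric_value, assignment,
                             min_instances=0, max_instances=None):
    """Estimate group size as available work divided by work per instance."""
    if assignment <= 0:
        raise ValueError("instance assignment must be positive")
    wanted = math.ceil(metric_value / assignment)
    wanted = max(wanted, min_instances)
    if max_instances is not None:
        wanted = min(wanted, max_instances)
    return wanted

half_unit_each = instances_for_assignment(10, 0.5)   # 20
two_units_each = instances_for_assignment(10, 2)     # 5
no_work = instances_for_assignment(0, 2)             # 0
capped = instances_for_assignment(250, 1, max_instances=100)  # 100
```

With no work available and a minimum of 0, the estimate drops to 0 instances, which matches the scale-to-zero behavior of instance assignment.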

Filtering per-group metrics

You can apply filters to per-group Stackdriver metrics, which allows you to scale managed instance groups using individual values from metrics that have multiple streams or labels.

Per-group metric filtering requirements

Autoscaler filtering is compatible with the Stackdriver Monitoring filter syntax. The filters for per-group metrics must meet the following requirements:

You can use only the AND operator for joining selectors.
For each selector, you cannot use the = direct equality comparison operator with any functions.
You can specify a metric type selector of metric.type = "..." in the filter and also include the original metric field. Optionally, you can use only the metric field. The metric must meet the following requirements:
- The metric must be specified at least in one place.
- The metric can be specified in both places, but must be equal.
You must specify the resource.type selector, but you cannot set it to gce_instance if you want to scale using per-group metrics.
For best results, the filter should be specific enough to return a single time series for the group. If the filter returns multiple time series, they are added together.
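
The metric and resource.type requirements can be approximated with a rough check such as the following sketch. The helper is hypothetical and deliberately simple: it only inspects quoted equality selectors and is not the validation that the autoscaler performs. The Pub/Sub metric and resource names in the usage lines are illustrative assumptions.

```python
import re

def check_per_group_filter(metric_field, filter_text):
    """Return a list of problems found in a per-group metric configuration."""
    problems = []
    # Collect selectors of the form name = "value".
    selectors = dict(re.findall(r'([\w.]+)\s*=\s*"([^"]*)"', filter_text))

    resource_type = selectors.get("resource.type")
    if resource_type is None:
        problems.append("filter must set resource.type")
    elif resource_type == "gce_instance":
        problems.append("resource.type must not be gce_instance")

    filter_metric = selectors.get("metric.type")
    if not metric_field and not filter_metric:
        problems.append("metric must be specified in at least one place")
    elif metric_field and filter_metric and metric_field != filter_metric:
        problems.append("metric field and metric.type selector must be equal")
    return problems

valid = check_per_group_filter(
    "pubsub.googleapis.com/subscription/num_undelivered_messages",
    'resource.type = "pubsub_subscription" AND '
    'resource.label.subscription_id = "our-subscription"',
)
per_instance = check_per_group_filter(
    "example.googleapis.com/path/to/metric",
    'resource.type = "gce_instance"',
)
```

Here `valid` is empty, and `per_instance` reports that gce_instance is not allowed for per-group scaling.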

Configuring autoscaling using per-group monitoring metrics

Use the Google Cloud Platform Console, the gcloud beta command-line tool, or the Compute Engine Beta API to configure autoscaling with per-group metrics for a single-zone managed instance group.

Console

Go to the Instance Groups page.
If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single-zone.
On the instance group details page, click the Edit Group button.
Under Autoscaling, select On to enable autoscaling.
In the Autoscale based on section, select Stackdriver monitoring metric.
In the Metric export scope section, select Single time series per group.
In the Metric identifier section, specify the metric name in the following format: example.googleapis.com/path/to/metric.
Specify the Metric resource type.
Provide an additional filter expression to use individual values from metrics that have multiple streams or labels. The filter must meet the autoscaler filtering requirements.
In the Scaling policy section, select either Instance assignment or Utilization target.
- If you select an instance assignment policy, then provide a Single instance assignment value that represents the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance. The autoscaler maintains enough instances to complete the available work (as indicated by the metric). If the metric value is 10 and you assigned 2 units of work to each instance, the autoscaler creates 5 instances in the managed instance group.
- If you select a utilization target policy:
  - Provide a Utilization target value that represents the metric value that the autoscaler should try to maintain.
  - Select the Utilization target type that represents the value type for the metric.
Save your changes when you are ready.

gcloud

Create an autoscaler for a managed instance group similarly to the per-instance autoscaler, but specify the --update-stackdriver-metric flag. You can specify how you want the autoscaler to provision instances by including one of the following flags:

Instance assignment: Specify the --stackdriver-metric-single-instance-assignment flag.
Utilization target: Specify the --stackdriver-metric-utilization-target flag.

Instance assignment:

Specify a metric that you want to measure and specify the --stackdriver-metric-single-instance-assignment flag to indicate the amount of work that you expect each instance to handle. You must also specify a filter for the metric using the --stackdriver-metric-filter flag.

gcloud beta compute instance-groups managed set-autoscaling [GROUP_NAME] \
    --zone=[ZONE] \
    --max-num-replicas=[MAX_INSTANCES] \
    --min-num-replicas=[MIN_INSTANCES] \
    --update-stackdriver-metric='[METRIC_URL]' \
    --stackdriver-metric-filter='[METRIC_FILTER]' \
    --stackdriver-metric-single-instance-assignment=[INSTANCE_ASSIGNMENT]

where:

[GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
[ZONE] is the zone where the managed instance group is located. You cannot specify a region for autoscalers on per-group metrics.
[MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
[MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
[METRIC_URL] is a protocol-free URL of a Google Cloud Monitoring metric.
[METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. The filter must meet the autoscaler filtering requirements.
[INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each instance, the autoscaler provisions 20 instances in the managed instance group.
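
When the command is generated by a script, the placeholders can be filled in programmatically. The sketch below only assembles the argument list from the flags shown above and prints it; it does not run gcloud. The function name, the Pub/Sub metric, and the filter value are illustrative assumptions.

```python
import shlex

def set_autoscaling_args(group, zone, metric, metric_filter, assignment,
                         max_instances, min_instances=0):
    """Build the argument list for a per-group instance assignment autoscaler."""
    return [
        "gcloud", "beta", "compute", "instance-groups", "managed",
        "set-autoscaling", group,
        f"--zone={zone}",
        f"--max-num-replicas={max_instances}",
        f"--min-num-replicas={min_instances}",
        f"--update-stackdriver-metric={metric}",
        f"--stackdriver-metric-filter={metric_filter}",
        f"--stackdriver-metric-single-instance-assignment={assignment}",
    ]

args = set_autoscaling_args(
    group="our-instance-group",
    zone="us-central1-a",
    metric="pubsub.googleapis.com/subscription/num_undelivered_messages",
    metric_filter='resource.type = "pubsub_subscription" AND '
                  'resource.label.subscription_id = "our-subscription"',
    assignment=2,
    max_instances=100,
)
# shlex.join quotes the filter so the printed command is safe to paste.
print(shlex.join(args))
```

Passing the list to a process runner without a shell avoids quoting problems with the filter expression.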

Utilization target:

In some situations, you might want to use utilization targets with per-group metrics rather than specify a number of instances relative to the value of the metric that your autoscaler measures. You can still point the autoscaler to a per-group metric, but the autoscaler attempts to maintain the specified utilization target. Specify the target and target type with the --stackdriver-metric-utilization-target flag. You must also specify a filter for the metric using the --stackdriver-metric-filter flag.

gcloud beta compute instance-groups managed set-autoscaling [GROUP_NAME] \
    --zone=[ZONE] \
    --max-num-replicas=[MAX_INSTANCES] \
    --min-num-replicas=[MIN_INSTANCES] \
    --update-stackdriver-metric='[METRIC_URL]' \
    --stackdriver-metric-filter='[METRIC_FILTER]' \
    --stackdriver-metric-utilization-target=[TARGET_VALUE] \
    --stackdriver-metric-utilization-target-type=[TARGET_TYPE]

where:

[GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
[ZONE] is the zone where the managed instance group is located. You cannot specify a region for autoscalers on per-group metrics.
[MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
[MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
[METRIC_URL] is a protocol-free URL of a Google Cloud Monitoring metric.
[METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
[TARGET_VALUE] is the metric value that the autoscaler attempts to maintain.
[TARGET_TYPE] is the value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the delta-per-minute of the value, or by the delta-per-second of the value.

To see a full list of available autoscaler gcloud commands and flags that work with per-group autoscaling, see the gcloud beta reference.

API

Note: Although autoscaling is a feature of managed instance groups, autoscalers are a separate API resource. Keep that in mind when you construct API requests for autoscaling.

Create an autoscaler for a managed instance group. You can specify how you want the autoscaler to provision instances by including one of the following parameters:

Instance assignment: Specify the singleInstanceAssignment parameter.
Utilization target: Specify the utilizationTarget parameter.

Instance assignment:

In the API, make a POST request to create an autoscaler. In the request body, include the normal parameters that you would use to create a per-instance autoscaler, but specify the singleInstanceAssignment parameter. The parameter specifies the amount of work that you expect each instance to handle.

POST https://www.googleapis.com/compute/beta/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/[ZONE]/instanceGroupManagers/[GROUP_NAME]",
 "autoscalingPolicy": {
  "maxNumReplicas": [MAX_INSTANCES],
  "minNumReplicas": [MIN_INSTANCES],
  "customMetricUtilizations": [
    {
      "metric": "[METRIC_URL]",
      "filter": "[METRIC_FILTER]",
      "singleInstanceAssignment": [INSTANCE_ASSIGNMENT]
    }
  ]
 }
}

where:

[PROJECT_ID] is your project ID.
[ZONE] is the zone where the managed instance group is located.
[GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
[MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
[MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
[METRIC_URL] is a protocol-free URL of a Google Cloud Monitoring metric.
[METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
[INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each instance, the autoscaler provisions 20 instances in the managed instance group.

Utilization target:

In some situations, you might want to use utilization targets with per-group metrics rather than specify a number of instances relative to the value of the metric that your autoscaler measures. You can still point the autoscaler to a per-group metric, but the autoscaler attempts to maintain the specified utilization target. Specify those targets with the utilizationTarget parameter. You must also specify a filter for the metric using the filter parameter.

POST https://www.googleapis.com/compute/beta/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers

{
 "name": "example-autoscaler",
 "target": "zones/[ZONE]/instanceGroupManagers/[GROUP_NAME]",
 "autoscalingPolicy": {
  "maxNumReplicas": [MAX_INSTANCES],
  "minNumReplicas": [MIN_INSTANCES],
  "customMetricUtilizations": [
    {
      "metric": "[METRIC_URL]",
      "filter": "[METRIC_FILTER]",
      "utilizationTarget": [TARGET_VALUE],
      "utilizationTargetType": "[TARGET_TYPE]"
    }
  ]
 }
}

where:

[PROJECT_ID] is your project ID.
[GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
[ZONE] is the zone where the managed instance group is located.
[MAX_INSTANCES] is the maximum number of instances that the autoscaler can scale the managed instance group up to.
[MIN_INSTANCES] is the minimum number of instances that the autoscaler keeps in the managed instance group.
[METRIC_URL] is a protocol-free URL of a Stackdriver Monitoring metric.
[METRIC_FILTER] is a Stackdriver Monitoring filter that selects a relevant TimeSeries and MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
[TARGET_VALUE] is the metric value that the autoscaler attempts to maintain.
[TARGET_TYPE] is the value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the DELTA_PER_MINUTE of the value, or by the DELTA_PER_SECOND of the value.
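
As a rough illustration of how the placeholders fit together, the following Python sketch assembles the request body shown above and serializes it to JSON. The helper name build_autoscaler_body is hypothetical, and the sample values are borrowed from the average-latency example later in this guide. The sketch only builds the body; it does not send the request or handle authentication.

```python
import json

# Target types named in the parameter list above.
ALLOWED_TARGET_TYPES = {"GAUGE", "DELTA_PER_MINUTE", "DELTA_PER_SECOND"}


def build_autoscaler_body(zone, group_name, min_instances, max_instances,
                          metric_url, metric_filter, target_value, target_type,
                          name="example-autoscaler"):
    """Build the JSON body for a per-group utilization-target autoscaler."""
    if target_type not in ALLOWED_TARGET_TYPES:
        raise ValueError("unsupported target type: " + target_type)
    return {
        "name": name,
        "target": "zones/{}/instanceGroupManagers/{}".format(zone, group_name),
        "autoscalingPolicy": {
            "maxNumReplicas": max_instances,
            "minNumReplicas": min_instances,
            "customMetricUtilizations": [
                {
                    "metric": metric_url,
                    "filter": metric_filter,
                    "utilizationTarget": target_value,
                    "utilizationTargetType": target_type,
                }
            ],
        },
    }


body = build_autoscaler_body(
    zone="us-central1-a",
    group_name="our-instance-group",
    min_instances=0,
    max_instances=100,
    metric_url="custom.googleapis.com/example_average_latency",
    metric_filter="resource.type = global AND metric.label.group_name = our-instance-group",
    target_value=100,
    target_type="DELTA_PER_SECOND",
)
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with json.dumps avoids hand-written JSON mistakes such as stray trailing commas.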

Example: Using instance assignment to scale based on a Pub/Sub queue

Assume the following setup:

An active Google Cloud Pub/Sub topic receives messages from some source.
An active Google Cloud Pub/Sub subscription is connected to the topic in a pull configuration. The subscription is named our-subscription.
A pool of workers is pulling messages from that subscription and processing them. The pool is a single-zone managed instance group named our-instance-group and is located in zone us-central1-a. The pool must not exceed 100 workers, and should scale down to 0 workers when there are no messages in the queue.
On average, a worker processes a single message in one minute.

To determine the optimal instance assignment value, consider several approaches:

To process all messages in the queue as fast as possible, you can choose 1 as the instance assignment value. This creates one instance for each message in the queue (limited to the maximum number of instances in our group). However, this can cause overprovisioning. In the worst case, an instance is created to process just one message before the autoscaler shuts it down, which consumes resources for much longer than doing actual work.
- Note that if the workers were able to process multiple messages concurrently, it would make sense to increase the value to the number of concurrent processes.
- Note that, in this example, it does not make sense to set the value below 1 because one message cannot be processed by more than one worker.
Alternatively, if processing latency is less important than resource utilization and overhead costs, you can calculate how many messages each instance must process within its lifetime to be considered efficiently utilized. Take into account startup and shutdown time and the fact that autoscaling does not immediately delete instances. For example, assume that startup and shutdown take about 5 minutes in total and that autoscaling deletes instances only after a period of approximately 10 minutes. Under those assumptions, it is efficient to create an additional instance in the group as long as it can process at least 15 messages before the autoscaler shuts it down. This results in at most 25% overhead due to the total time it takes to create, start, and shut down the instance. In this case, you can choose 15 as the instance assignment value.
Both approaches can be balanced out, resulting in a number between 1 and 15, depending on which factor takes priority, processing latency versus resource utilization.
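
One way to read the 25% figure is as the share of an instance's lifetime that goes to startup and shutdown rather than to processing messages. The following Python sketch works through that reading. The function name and the formula are assumptions made for illustration; the inputs (one minute per message, 5 minutes of startup and shutdown) come from the setup above.

```python
def overhead_fraction(messages_per_instance, minutes_per_message, startup_shutdown_minutes):
    """Fraction of an instance's lifetime spent starting up and shutting down.

    Assumes the instance lives for its startup/shutdown time plus the time
    needed to process its assigned messages one after another.
    """
    work_minutes = messages_per_instance * minutes_per_message
    return startup_shutdown_minutes / (startup_shutdown_minutes + work_minutes)


# Compare a few candidate instance assignment values.
for assignment in (1, 5, 15):
    share = overhead_fraction(assignment, 1.0, 5.0)
    print("assignment={:>2}: overhead {:.0%}".format(assignment, share))
```

Under this reading, an assignment of 15 gives 5 / (5 + 15) = 25% overhead, while an assignment of 1 spends most of each instance's lifetime on startup and shutdown.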

Looking at the available Pub/Sub metrics, we find a metric that represents the subscription queue length: subscription/num_undelivered_messages.

Note that this metric exports the total number of messages in the queue, including messages that are currently being processed but that are not yet acknowledged. Using a metric that does not include the messages being processed is not recommended because such a metric can drop down to 0 when there is still work being done, which prompts autoscaling to scale down and possibly interrupt the actual work.

You can now configure autoscaling for the queue:

gcloud beta compute instance-groups managed set-autoscaling \
    our-instance-group \
    --zone=us-central1-a \
    --max-num-replicas=100 \
    --min-num-replicas=0 \
    --update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
    --stackdriver-metric-filter="resource.type = pubsub_subscription AND resource.label.subscription_id = our-subscription" \
    --stackdriver-metric-single-instance-assignment=15

Example: Using a utilization target to scale based on average latency

There might be a situation when the metric providing a relevant signal does not represent a total amount of available work or another resource applicable to the group, as in the previous example, but instead an average, a percentile, or some other statistical property. For this example, assume you will scale based on the group's average processing latency.

Assume the following setup:

A managed instance group named our-instance-group is assigned to perform a particular task. The group is located in zone us-central1-a.
You have a Stackdriver Monitoring custom metric that exports a value that you would like to maintain at a particular level. For this example, assume the metric represents the average latency of processing queries assigned to the group.
- The custom metric is named: custom.googleapis.com/example_average_latency.
- The custom metric has a label with a key named group_name and value equal to the instance group's name, our-instance-group.
- The custom metric exports data for the global Monitored Resource, that is, it is not associated with any specific instance.

You have determined that when the metric value goes above some specific value, you need to add more instances to the group to handle the load, and that when it goes below that value, you can free up some resources. Autoscaling gradually adds or removes instances at a rate that is proportional to how much the metric is above or below the target. For this example, assume that the calculated target value is 100.
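
To build intuition for proportional scaling, the sketch below applies a simple proportional rule: scale the current group size by the ratio of the observed metric to the target. This rule is an assumption chosen for illustration and is not documented here as the autoscaler's actual algorithm, which also changes the group size gradually. The function name and sample values are hypothetical.

```python
import math


def proportional_recommendation(current_instances, metric_value, target_value,
                                min_instances, max_instances):
    """Suggest a group size using a simple proportional rule (illustrative only)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Multiply before dividing to limit floating-point rounding surprises.
    desired = math.ceil(current_instances * metric_value / target_value)
    return max(min_instances, min(max_instances, desired))


# Metric 50% above the target of 100: the rule suggests growing from 10 to 15.
print(proportional_recommendation(10, 150, 100, 1, 100))
# Metric at half the target: the rule suggests shrinking from 10 to 5.
print(proportional_recommendation(10, 50, 100, 1, 100))
```

The further the metric is from the target, the larger the suggested change, which is the behavior the paragraph above describes.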

You can now configure autoscaling for the group using a per-group utilization target of 100, which represents the metric value that the autoscaler must attempt to maintain:

gcloud beta compute instance-groups managed set-autoscaling \
    our-instance-group \
    --zone=us-central1-a \
    --max-num-replicas=100 \
    --min-num-replicas=0 \
    --update-stackdriver-metric=custom.googleapis.com/example_average_latency \
    --stackdriver-metric-filter="resource.type = global AND metric.label.group_name = our-instance-group" \
    --stackdriver-metric-utilization-target=100 \
    --stackdriver-metric-utilization-target-type=delta-per-second

