You can set up autoscaler to scale based on the following metric types:
- Scale using per-instance metrics where the selected metric provides data for each instance in the managed instance group indicating resource utilization.
- Scale using per-group metrics (Beta) where the group scales based on a metric that provides a value related to the whole managed instance group.
These metrics can be either standard metrics provided by the Stackdriver Monitoring service, or custom Stackdriver Monitoring metrics that you create.
Before you begin
- If you want to use the command-line examples in this guide:
  - Install or update to the latest version of the gcloud command-line tool.
  - Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
- Read the Before you begin section of the Autoscaling Overview topic for important setup steps.
Per-instance metrics
Per-instance metrics provide data for each instance in the managed instance group separately, indicating that instance's resource utilization. For per-instance metrics, the instance group cannot scale below a size of 1 because the autoscaler requires metrics about at least one running instance in order to operate.
If you need to scale using other Stackdriver metrics that are not specific to individual instances, or you need to scale your instance groups down to zero instances from time to time, you can configure your instances to scale using per-group metrics instead.
Standard per-instance metrics
Stackdriver Monitoring has a set of standard metrics that you can use to monitor your virtual machine instances. However, not all standard metrics are valid utilization metrics that the autoscaler can use.
A valid utilization metric for scaling meets the following criteria:
- The standard metric must contain data for a gce_instance monitored resource. You can use the timeSeries.list API call to verify whether a specific metric exports data for this resource.
- The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number of virtual machine instances in the group.
The following is an invalid metric because the value does not change based on utilization and the autoscaler cannot use the value to scale proportionally:
compute.googleapis.com/instance/cpu/reserved_cores
After you select a standard metric you want to use for your autoscaler, you can configure autoscaling using that metric.
Custom metrics
You can create custom metrics using Stackdriver Monitoring and write your own monitoring data to the Stackdriver Monitoring service. This gives you side-by-side access to standard Cloud Platform data and your custom monitoring data, with a familiar data structure and consistent query syntax. If you have a custom metric, you can choose to scale based on the data from these metrics.
Prerequisites
In order to use custom metrics, you must have done the following:
- Created a custom metric. For information on creating a custom metric, see the Custom Metrics documentation.
- Set up your managed instance group to export the custom metric from all instances in the managed instance group.
Choose a valid custom metric
Not all custom metrics can be used by the autoscaler. To choose a valid custom metric, the metric must have all of the following properties:
- The metric must be a per-instance metric. The metric must export data relevant to each specific Compute Engine instance separately.
- The exported per-instance values must be associated with a gce_instance monitored resource, which contains the following labels:
  - zone with the name of the zone the instance is in.
  - instance_id with the value of the unique numerical ID assigned to the instance.
- The metric must export data at least every 60 seconds. If you export data more often than every 60 seconds, the autoscaler can respond faster to load changes. If you export data less often than every 60 seconds, the autoscaler might not be able to respond quickly enough to load changes.
- The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale up or down the number of virtual machines.
- The metric must export int64 or double data values.
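The mechanical parts of this checklist can be sketched as a small validation routine. This is only an illustration: MetricSeries and check_custom_metric are hypothetical names, not part of any Google API, and whether a metric is a valid utilization metric still has to be judged by hand.

```python
from dataclasses import dataclass, field


# Hypothetical description of one exported time series; not a real API type.
@dataclass
class MetricSeries:
    resource_type: str                  # for example "gce_instance"
    resource_labels: dict = field(default_factory=dict)
    value_type: str = "DOUBLE"          # for example "INT64" or "DOUBLE"
    export_interval_sec: float = 60.0   # seconds between exported points


def check_custom_metric(series: MetricSeries) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if series.resource_type != "gce_instance":
        problems.append("resource type must be gce_instance")
    for label in ("zone", "instance_id"):
        if not series.resource_labels.get(label):
            problems.append(f"missing resource label: {label}")
    if series.value_type not in ("INT64", "DOUBLE"):
        problems.append("value type must be int64 or double")
    if series.export_interval_sec > 60:
        problems.append("data must be exported at least every 60 seconds")
    return problems
```

An empty result only means that the structural requirements are met; it does not confirm that the metric scales proportionally with load.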
For autoscaler to work with your custom metric, you must export data for this custom metric from all the instances in the managed instance group. To find the numerical ID of an instance, query the metadata server from within that instance:
curl http://metadata.google.internal/computeMetadata/v1/instance/id -H "Metadata-Flavor: Google"
For more information on using the metadata server, see Metadata Server.
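If the exporter that writes your custom metric is written in Python, the same metadata request could look roughly like the following sketch. The function names are illustrative, and fetch_instance_id only works when it runs on a Compute Engine instance.

```python
import urllib.request

# Same endpoint as the curl command; the Metadata-Flavor header is required.
METADATA_ID_URL = "http://metadata.google.internal/computeMetadata/v1/instance/id"


def build_instance_id_request() -> urllib.request.Request:
    """Build the metadata server request without sending it."""
    return urllib.request.Request(
        METADATA_ID_URL, headers={"Metadata-Flavor": "Google"}
    )


def fetch_instance_id(timeout_sec: float = 2.0) -> str:
    """Return the numerical instance ID as a string (needs to run on an instance)."""
    with urllib.request.urlopen(build_instance_id_request(), timeout=timeout_sec) as response:
        return response.read().decode("utf-8").strip()
```

The returned value is what you would write into the instance_id label of the gce_instance monitored resource.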
Configuring autoscaling using per-instance monitoring metrics
The process of setting up an autoscaler for a standard or custom metric is the same. To create an autoscaler that uses Stackdriver Monitoring metrics, you must provide the metric identifier, the desired target utilization level, and the utilization target type. Each of these properties is described briefly below:
- Metric identifier: The name of the metric to use. If you use a custom metric, you defined this name when you initially created the metric. The identifier has the following format:
  custom.googleapis.com/path/to/metric
  See Using Custom Metrics for more information about creating, browsing, and reading metrics.
- Target utilization level: The target utilization level that the autoscaler must maintain for this metric. This must be a positive number. For example, both 24.5 and 1100 are acceptable values. Note that this is different from CPU and load balancing utilization, which must be a float value between 0.0 and 1.0.
- Target type: This defines how the autoscaler computes the data collected from the instances. The possible target types are:
  - GAUGE: The autoscaler computes the average value of the data collected in the last couple of minutes and compares that to the target utilization value of the autoscaler.
  - DELTA_PER_MINUTE: The autoscaler calculates the average rate of growth per minute and compares that to the target utilization.
  - DELTA_PER_SECOND: The autoscaler calculates the average rate of growth per second and compares that to the target utilization.
  If you expressed your desired target utilization in seconds, use DELTA_PER_SECOND; likewise, use DELTA_PER_MINUTE if you expressed your target utilization in minutes, so that the autoscaler can perform accurate comparisons.
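To make the difference between the three target types concrete, the following sketch computes the value that would be compared to the target from a short list of samples. It is a simplified approximation: the window length and any smoothing that the autoscaler applies are not specified here, and observed_value is a hypothetical helper.

```python
def observed_value(samples, target_type):
    """Approximate the value compared to the utilization target.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    if target_type == "GAUGE":
        # Average of the collected values.
        return sum(value for _, value in samples) / len(samples)
    if target_type not in ("DELTA_PER_SECOND", "DELTA_PER_MINUTE"):
        raise ValueError(f"unknown target type: {target_type}")
    (first_time, first_value), (last_time, last_value) = samples[0], samples[-1]
    elapsed_sec = last_time - first_time
    if elapsed_sec <= 0:
        raise ValueError("samples must span a positive time interval")
    # Average rate of growth over the sampled interval.
    per_second = (last_value - first_value) / elapsed_sec
    return per_second if target_type == "DELTA_PER_SECOND" else per_second * 60


# A counter that grows by 600 per minute, sampled once a minute.
counter = [(0, 100), (60, 700), (120, 1300)]
print(observed_value(counter, "DELTA_PER_SECOND"))  # 10.0
print(observed_value(counter, "DELTA_PER_MINUTE"))  # 600.0
```

For the same counter, a target of 10 only makes sense with DELTA_PER_SECOND and a target of 600 only with DELTA_PER_MINUTE, which is why the target value and target type must use the same time unit.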
Console
The instructions for configuring autoscaling are different for regional versus single-zone managed instance groups. Regional managed instance groups do not support filtering for per-instance metrics.
To configure autoscaling for a regional (multi-zone) managed instance group:
- Go to the Instance Groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group from the list to open the instance group details page. The group must be a regional group.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric identifier section, enter the metric name in the following format: example.googleapis.com/path/to/metric.
- In the Target section, specify the target value.
- In the Target type section, specify the target type that corresponds to the metric's kind of measurement.
- Save your changes when you are ready.
To configure autoscaling for a single-zone managed instance group:
- Go to the Instance Groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single-zone.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
- In the Metric identifier section, enter the metric name in the following format: example.googleapis.com/path/to/metric.
- In the Additional filter expression section, optionally enter a filter to use individual values from metrics with multiple streams or labels. See Filtering per-instance metrics for more information.
- In the Utilization target section, specify the target value.
- In the Utilization target type section, verify that the target type corresponds to the metric's kind of measurement.
- Save your changes when you are ready.
gcloud
For example, in gcloud, the following command creates an autoscaler that
uses the GAUGE target type. Along with the --custom-metric-utilization
parameter, the --max-num-replicas parameter is also required when creating
an autoscaler:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
--custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
--max-num-replicas 20 \
--cool-down-period 90
Optionally, you can use the --cool-down-period flag, which tells the
autoscaler how many seconds to wait after a new virtual machine has started
before the autoscaler starts collecting usage information from it. This
accounts for the amount of time it might take for the virtual machine to
initialize, during which the collected usage is not reliable for
autoscaling. The default cool down period is 60 seconds.
For multi-zonal managed instance groups, use the --region flag to specify
where to find the instance group. For example:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
--custom-metric-utilization metric=example.googleapis.com/path/to/metric,utilization-target-type=GAUGE,utilization-target=10 \
--max-num-replicas 20 \
--cool-down-period 90 \
--region us-central1
To see a full list of available gcloud commands and flags, see the
gcloud reference.
API
Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.
In the API, make a POST request to the following URL, replacing
myproject with your own project ID and us-central1-f with the
zone of your choice:
POST https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers/
Your request body must contain the name, target, and autoscalingPolicy
fields. In autoscalingPolicy, provide the maxNumReplicas and the
customMetricUtilizations properties.
Optionally, you can use the coolDownPeriodSec parameter, which tells the
autoscaler how many seconds to wait after a new instance has started before
it starts to collect usage. After the cool-down period passes, the
autoscaler begins to collect usage information from the new instance and
determines if the group requires additional instances. This accounts for
the amount of time it might take for the instance to initialize, during
which the collected usage is not reliable for autoscaling. The
default cool-down period is 60 seconds.
POST https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "coolDownPeriodSec": 90,
    "customMetricUtilizations": [
      {
        "metric": "example.googleapis.com/some/metric/name",
        "utilizationTarget": 10,
        "utilizationTargetType": "GAUGE"
      }
    ]
  }
}
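If you generate these requests from a script, the body can be assembled programmatically. The following is a minimal sketch that only builds and validates the JSON body; build_autoscaler_body is a hypothetical helper, and sending the request with proper authentication is left out.

```python
import json

TARGET_TYPES = {"GAUGE", "DELTA_PER_MINUTE", "DELTA_PER_SECOND"}


def build_autoscaler_body(name, zone, group, metric, utilization_target,
                          target_type, max_replicas, cool_down_sec=60):
    """Build the request body for a per-instance metric autoscaler."""
    if utilization_target <= 0:
        raise ValueError("utilization target must be a positive number")
    if target_type not in TARGET_TYPES:
        raise ValueError(f"unknown target type: {target_type}")
    return {
        "name": name,
        "target": f"zones/{zone}/instanceGroupManagers/{group}",
        "autoscalingPolicy": {
            "maxNumReplicas": max_replicas,
            "coolDownPeriodSec": cool_down_sec,
            "customMetricUtilizations": [
                {
                    "metric": metric,
                    "utilizationTarget": utilization_target,
                    "utilizationTargetType": target_type,
                }
            ],
        },
    }


body = build_autoscaler_body(
    "example-autoscaler", "us-central1-f", "example-managed-instance-group",
    "example.googleapis.com/some/metric/name", 10, "GAUGE",
    max_replicas=10, cool_down_sec=90,
)
print(json.dumps(body, indent=2))
```

Validating the target value and target type before sending the request catches the most common configuration mistakes early.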
Filtering per-instance metrics
You can apply filters to per-instance Stackdriver metrics, which allows you to scale single-zone managed instance groups using individual values from metrics with multiple streams or labels.
Per-instance metric filtering requirements
Autoscaler filtering is compatible with the Stackdriver Monitoring filter syntax. The filters for per-instance metrics must meet the following requirements:
- You can use only the AND operator for joining selectors.
- You can use only the = direct equality comparison operator, but you cannot use the operator with any functions. For example, you cannot use the startswith() function with the = comparison operator.
- You must not set the resource.type or resource.label.* selectors. Per-instance metrics always use all of the instance resources from the group.
- For best results, the filter should be specific enough to return a single time series for each instance. If the filter returns multiple time series, they are added together.
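As a rough pre-flight check, the first three requirements can be approximated with a small parser. This sketch is a deliberate simplification of the Stackdriver Monitoring filter syntax, and per_instance_filter_problems is a hypothetical helper rather than anything the autoscaler provides.

```python
import re

# A selector is "key = value", where the value is a quoted string or a bare token.
_SELECTOR = re.compile(r'^([\w.]+)\s*=\s*("[^"]*"|[\w.\-]+)$')


def per_instance_filter_problems(filter_expr):
    """Return a list of rule violations for a per-instance metric filter."""
    problems = []
    for selector in re.split(r"\s+AND\s+", filter_expr.strip()):
        match = _SELECTOR.match(selector)
        if match is None:
            # Covers OR, !=, comparison functions, and other unsupported syntax.
            problems.append(f"unsupported selector: {selector!r}")
            continue
        key = match.group(1)
        if key == "resource.type" or key.startswith("resource.label."):
            problems.append(f"resource selector not allowed: {key}")
    return problems


print(per_instance_filter_problems("metric.label.loadbalanced = true"))  # []
```

The check cannot tell whether the filter returns a single time series per instance; that still has to be verified against real metric data.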
Configuring autoscalers to filter metrics
Use the Google Cloud Platform Console, the
gcloud beta command-line tool,
or the
Compute Engine Beta API to
add metric filters for autoscaling of a single-zone managed instance group.
Console
The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you also specify a metric filter. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name and loadbalanced labels. To filter
based on the loadbalanced boolean:
- Go to the Instance Groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single-zone.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric export scope section, select Time series per instance to configure autoscaling using per-instance metrics.
- In the Metric identifier section, enter the metric name. For example, compute.googleapis.com/instance/network/received_bytes_count.
- In the Additional filter expression section, enter a filter. For example, 'metric.label.loadbalanced = true'.
- Save your changes when you are ready.
gcloud
The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you must specify a metric filter and individual flags for
the utilization target and target type. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name and loadbalanced labels. To filter
based on the loadbalanced boolean, specify the
--stackdriver-metric-filter flag with the
'metric.label.loadbalanced = true' value. Include the
utilization target and target type flags individually.
gcloud beta compute instance-groups managed set-autoscaling example-managed-instance-group \
--update-stackdriver-metric=compute.googleapis.com/instance/network/received_bytes_count \
--stackdriver-metric-utilization-target=10 \
--stackdriver-metric-utilization-target-type=delta-per-second \
--stackdriver-metric-filter='metric.label.loadbalanced = true' \
--max-num-replicas 20 \
--cool-down-period 90
This example configures autoscaling to use only the loadbalanced
traffic data as part of the utilization target.
To see a full list of available gcloud commands and flags, see the
gcloud beta reference.
API
Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.
The process for creating an autoscaler that filters a per-instance metric is
similar to creating a normal per-instance
autoscaler, but you must also specify a metric filter. For example, the
compute.googleapis.com/instance/network/received_bytes_count
metric includes the instance_name and loadbalanced labels. To filter
based on the loadbalanced boolean, specify the filter parameter
with the "metric.label.loadbalanced = true" value.
In the API, make a POST request to the following URL, replacing
myproject with your own project ID and us-central1-f with the
zone of your choice. The request body must contain the name, target,
and autoscalingPolicy fields. In autoscalingPolicy, provide the
maxNumReplicas and the customMetricUtilizations properties.
POST https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "coolDownPeriodSec": 90,
    "customMetricUtilizations": [
      {
        "metric": "compute.googleapis.com/instance/network/received_bytes_count",
        "filter": "metric.label.loadbalanced = true",
        "utilizationTarget": 10,
        "utilizationTargetType": "DELTA_PER_SECOND"
      }
    ]
  }
}
This example configures autoscaling to use only the loadbalanced
traffic data as part of the utilization target.
Per-group metrics
Per-group metrics allow autoscaling with a standard or custom metric that does not export per-instance utilization data. Instead, the group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that you define.
When you configure autoscaling on per-group metrics, you must indicate how you want the autoscaler to provision instances relative to the metric:
- Instance assignment: Specify an instance assignment to indicate that you want the autoscaler to add or remove instances depending on how much work is available to assign to each instance. Specify a value for this parameter that represents how much work you expect each instance can handle. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work as indicated by the metric. If the metric value is 10 and you assigned 0.5 units of work to each instance, the autoscaler creates 20 instances in the managed instance group. Scaling with instance assignment allows the instance group to shrink to 0 instances when the metric value drops down to 0, and to grow again when it rises above 0. The following diagram shows the proportional relationship between metric value and number of instances when scaling with an instance assignment policy.
- Utilization target: Specify a utilization target to indicate that you want the autoscaler to add or remove instances to try and maintain the metric at a specified value. When the metric is above the specified target, autoscaler gradually adds instances until the metric decreases to the target value. When the metric is below the specified target value, autoscaler gradually removes instances until the metric increases to the target value. Scaling with a utilization target cannot shrink the group to 0 instances. The following diagram shows how autoscaler adds and removes instances in response to a metric value in order to maintain a utilization target.
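The instance assignment arithmetic can be illustrated with a short calculation. This sketch assumes that the group size is the metric value divided by the per-instance assignment, rounded up and clamped to the configured limits; the rounding and the instances_for_assignment helper are assumptions for illustration, not a description of the autoscaler's internals.

```python
import math


def instances_for_assignment(metric_value, single_instance_assignment,
                             min_instances=0, max_instances=None):
    """Estimate the group size needed to handle the available work."""
    if single_instance_assignment <= 0:
        raise ValueError("instance assignment must be positive")
    # Enough instances to cover all of the work, rounded up.
    needed = math.ceil(max(metric_value, 0) / single_instance_assignment)
    if max_instances is not None:
        needed = min(needed, max_instances)
    return max(needed, min_instances)


print(instances_for_assignment(10, 0.5))  # 20
print(instances_for_assignment(10, 2))    # 5
print(instances_for_assignment(0, 0.5))   # 0
```

With a utilization target there is no such closed-form size: the autoscaler adds or removes instances gradually until the metric settles at the target.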
Each option has the following use cases:
- Instance assignment: Scale the size of your managed instance groups based on the number of unacknowledged messages in a Google Pub/Sub subscription or a total QPS rate of a network endpoint.
- Utilization target: Scale the size of your managed instance groups based on a utilization target for a custom metric that does not come from the standard per-instance CPU or memory use metrics. For example, you might scale the group based on a custom latency metric.
When you configure autoscaling with per-group metrics and you specify an instance assignment, your instance groups can scale down to 0 instances. If your metric indicates that there is no work for your instance group to complete, the group will scale down to 0 instances until the metric detects that new work is available. In contrast to per-group instance assignment, per-instance autoscaling requires resource utilization metrics from at least one instance, so the group cannot scale below a size of 1.
Filtering per-group metrics
You can apply filters to per-group Stackdriver metrics, which allows you to scale managed instance groups using individual values from metrics that have multiple streams or labels.
Per-group metric filtering requirements
Autoscaler filtering is compatible with the Stackdriver Monitoring filter syntax. The filters for per-group metrics must meet the following requirements:
- You can use only the AND operator for joining selectors.
- You cannot use the = direct equality comparison operator with any functions for each selector.
- You can specify a metric type selector of metric.type = "..." in the filter and also include the original metric field. Optionally, you can use only the metric field. The metric must meet the following requirements:
  - The metric must be specified at least in one place.
  - The metric can be specified in both places, but must be equal.
- You must specify the resource.type selector, but you cannot set it to gce_instance if you want to scale using per-group metrics.
- For best results, the filter should be specific enough to return a single time series for the group. If the filter returns multiple time series, they are added together.
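The metric and resource.type rules can be checked before you submit a configuration. The sketch below uses a naive split on AND and is not a full parser for the filter syntax; per_group_filter_problems is a hypothetical helper, and the Pub/Sub metric and filter in the usage line are illustrative values.

```python
def per_group_filter_problems(metric_field, filter_expr):
    """Return a list of rule violations for a per-group metric filter."""
    problems = []
    selectors = {}
    for part in filter_expr.split(" AND "):
        key, sep, value = part.partition("=")
        if not sep:
            problems.append(f"selector without '=': {part.strip()!r}")
            continue
        selectors[key.strip()] = value.strip().strip('"')

    # The metric must appear in at least one place and must agree if in both.
    metric_type = selectors.get("metric.type")
    if not metric_field and not metric_type:
        problems.append("metric must be set in the metric field or the filter")
    elif metric_field and metric_type and metric_field != metric_type:
        problems.append("metric field and metric.type selector must be equal")

    # resource.type is mandatory and must not point at individual instances.
    resource_type = selectors.get("resource.type")
    if not resource_type:
        problems.append("resource.type selector is required")
    elif resource_type == "gce_instance":
        problems.append("resource.type must not be gce_instance")
    return problems


print(per_group_filter_problems(
    "pubsub.googleapis.com/subscription/num_undelivered_messages",
    'resource.type = pubsub_subscription AND '
    'resource.label.subscription_id = "our-subscription"',
))  # []
```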
Configuring autoscaling using per-group monitoring metrics
Use the Google Cloud Platform Console, the
gcloud beta command-line tool,
or the
Compute Engine Beta API to
configure autoscaling with per-group metrics for a single-zone managed instance
group.
Console
- Go to the Instance Groups page.
- If you do not have an instance group, create one. Otherwise, click the name of an instance group to open the instance group details page. The instance group must be single-zone.
- On the instance group details page, click the Edit Group button.
- Under Autoscaling, select On to enable autoscaling.
- In the Autoscale based on section, select Stackdriver monitoring metric.
- In the Metric export scope section, select Single time series per group.
- In the Metric identifier section, specify the metric name in the following format: example.googleapis.com/path/to/metric.
- Specify the Metric resource type.
- Provide an additional filter expression to use individual values from metrics that have multiple streams or labels. The filter must meet the autoscaler filtering requirements.
- In the Scaling policy section, select either Instance assignment or Utilization target.
  - If you select an instance assignment policy, then provide a Single instance assignment value that represents the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance. The autoscaler maintains enough instances to complete the available work (as indicated by the metric). If the metric value is 10 and you assigned 2 units of work to each instance, the autoscaler creates 5 instances in the managed instance group.
  - If you select a utilization target policy:
    - Provide a Utilization target value that represents the metric value that the autoscaler should try to maintain.
    - Select the Utilization target type that represents the value type for the metric.
- Save your changes when you are ready.
gcloud
Create an autoscaler for a managed instance group similarly to the
per-instance autoscaler, but specify the
--update-stackdriver-metric flag. You can specify how you want the
autoscaler to provision instances by including one of the following
flags:
- Instance assignment: Specify the --stackdriver-metric-single-instance-assignment flag.
- Utilization target: Specify the --stackdriver-metric-utilization-target flag.
Instance assignment:
Specify a metric that you want to measure and specify the
--stackdriver-metric-single-instance-assignment flag to indicate
the amount of work that you expect each instance to handle. You must also
specify a filter for the metric using the
--stackdriver-metric-filter flag.
gcloud beta compute instance-groups managed set-autoscaling [GROUP_NAME] \
--zone=[ZONE] \
--max-num-replicas=[MAX_INSTANCES] \
--min-num-replicas=[MIN_INSTANCES] \
--update-stackdriver-metric='[METRIC_URL]' \
--stackdriver-metric-filter='[METRIC_FILTER]' \
--stackdriver-metric-single-instance-assignment=[INSTANCE_ASSIGNMENT]
where:
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [ZONE] is the zone where the managed instance group is located. You cannot specify a region for autoscalers on per-group metrics.
- [MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
- [MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Google Cloud Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. The filter must meet the autoscaler filtering requirements.
- [INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each instance, the autoscaler provisions 20 instances in the managed instance group.
Utilization target:
In some situations, you might want to use utilization targets with
per-group metrics rather than specify a number of instances relative
to the value of the metric that your autoscaler measures. You can
still point the autoscaler to a per-group metric, but the autoscaler
attempts to maintain the specified utilization target. Specify the target
and target type with the --stackdriver-metric-utilization-target flag.
You must also specify a filter for the metric using the
--stackdriver-metric-filter flag.
gcloud beta compute instance-groups managed set-autoscaling [GROUP_NAME] \
--zone=[ZONE] \
--max-num-replicas=[MAX_INSTANCES] \
--min-num-replicas=[MIN_INSTANCES] \
--update-stackdriver-metric='[METRIC_URL]' \
--stackdriver-metric-filter='[METRIC_FILTER]' \
--stackdriver-metric-utilization-target=[TARGET_VALUE] \
--stackdriver-metric-utilization-target-type=[TARGET_TYPE]
where:
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [ZONE] is the zone where the managed instance group is located. You cannot specify a region for autoscalers on per-group metrics.
- [MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
- [MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Google Cloud Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- [TARGET_VALUE] is the metric value that the autoscaler attempts to maintain.
- [TARGET_TYPE] is the value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the delta-per-minute of the value, or by the delta-per-second of the value.
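Because the two scaling policies use different flags, a wrapper script can help prevent mixing them. The following sketch assembles the argument list for either variant using only the flags shown in this section; build_set_autoscaling_args is a hypothetical helper, and the Pub/Sub metric and filter in the usage lines are illustrative values.

```python
import shlex


def build_set_autoscaling_args(group, zone, max_instances, min_instances,
                               metric, metric_filter,
                               single_instance_assignment=None,
                               utilization_target=None, target_type=None):
    """Build gcloud arguments for per-group autoscaling with one policy."""
    by_assignment = single_instance_assignment is not None
    by_target = utilization_target is not None
    if by_assignment == by_target:
        raise ValueError("specify exactly one of instance assignment "
                         "or utilization target")
    args = [
        "gcloud", "beta", "compute", "instance-groups", "managed",
        "set-autoscaling", group,
        f"--zone={zone}",
        f"--max-num-replicas={max_instances}",
        f"--min-num-replicas={min_instances}",
        f"--update-stackdriver-metric={metric}",
        f"--stackdriver-metric-filter={metric_filter}",
    ]
    if by_assignment:
        args.append("--stackdriver-metric-single-instance-assignment="
                    f"{single_instance_assignment}")
    else:
        if target_type is None:
            raise ValueError("a utilization target requires a target type")
        args.append(f"--stackdriver-metric-utilization-target={utilization_target}")
        args.append(f"--stackdriver-metric-utilization-target-type={target_type}")
    return args


args = build_set_autoscaling_args(
    "our-instance-group", "us-central1-a", 100, 0,
    "pubsub.googleapis.com/subscription/num_undelivered_messages",
    "resource.type = pubsub_subscription",
    single_instance_assignment=2,
)
print(shlex.join(args))  # shell-quoted command line, ready to review
```

Passing the list to subprocess.run instead of joining it into a string avoids shell quoting problems with filters that contain spaces or quotes.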
To see a full list of available autoscaler gcloud commands and flags
that work with per-group autoscaling, see the
gcloud beta reference.
API
Note: Although autoscaling is a feature of managed instance groups, autoscalers are a separate API resource. Keep that in mind when you construct API requests for autoscaling.
Create an autoscaler for a managed instance group. You can specify how you want the autoscaler to provision instances by including one of the following parameters:
- Instance assignment: Specify the singleInstanceAssignment parameter.
- Utilization target: Specify the utilizationTarget parameter.
Instance assignment:
In the API, make a POST request to create an autoscaler.
In the request body, include the normal parameters that you would use to
create a per-instance autoscaler, but specify the
singleInstanceAssignment parameter. The parameter specifies the amount
of work that you expect each instance to handle.
POST https://www.googleapis.com/compute/beta/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/[ZONE]/instanceGroupManagers/[GROUP_NAME]",
  "autoscalingPolicy": {
    "maxNumReplicas": [MAX_INSTANCES],
    "minNumReplicas": [MIN_INSTANCES],
    "customMetricUtilizations": [
      {
        "metric": "[METRIC_URL]",
        "filter": "[METRIC_FILTER]",
        "singleInstanceAssignment": [INSTANCE_ASSIGNMENT]
      }
    ]
  }
}
where:
- [PROJECT_ID] is your project ID.
- [ZONE] is the zone where the managed instance group is located.
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
- [MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Google Cloud Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- [INSTANCE_ASSIGNMENT] is the amount of work to assign to each instance in the managed instance group. For example, specify 2 to assign two units of work to each instance, or specify 0.5 to assign half a unit of work to each instance. The autoscaler adds enough instances to the managed instance group to ensure that there are enough instances to complete the available work, which is indicated by the metric. If the metric value is 10 and you've assigned 0.5 units of work to each instance, the autoscaler provisions 20 instances in the managed instance group.
Utilization target:
In some situations, you might want to use utilization targets with
per-group metrics rather than specify a number of instances relative
to the value of the metric that your autoscaler measures. You can
still point the autoscaler to a per-group metric, but the autoscaler
attempts to maintain the specified utilization target. Specify
those targets with the utilizationTarget parameter. You must also
specify a filter for the metric using the filter parameter.
POST https://www.googleapis.com/compute/beta/projects/[PROJECT_ID]/zones/[ZONE]/autoscalers
{
  "name": "example-autoscaler",
  "target": "zones/[ZONE]/instanceGroupManagers/[GROUP_NAME]",
  "autoscalingPolicy": {
    "maxNumReplicas": [MAX_INSTANCES],
    "minNumReplicas": [MIN_INSTANCES],
    "customMetricUtilizations": [
      {
        "metric": "[METRIC_URL]",
        "filter": "[METRIC_FILTER]",
        "utilizationTarget": [TARGET_VALUE],
        "utilizationTargetType": "[TARGET_TYPE]"
      }
    ]
  }
}
where:
- [PROJECT_ID] is your project ID.
- [GROUP_NAME] is the name of the managed instance group where you want to add an autoscaler.
- [ZONE] is the zone where the managed instance group is located.
- [MAX_INSTANCES] is the limit on the number of instances that the autoscaler can add to the managed instance group.
- [MIN_INSTANCES] is the limit on the minimum number of instances that the autoscaler can have in the managed instance group.
- [METRIC_URL] is a protocol-free URL of a Google Cloud Monitoring metric.
- [METRIC_FILTER] is a Stackdriver Monitoring filter where you specify a monitoring filter with a relevant TimeSeries and a MonitoredResource. You must specify a resource.type value, but you cannot specify gce_instance if you want to scale using per-group metrics. The filter must meet the autoscaler filtering requirements.
- [TARGET_VALUE] is the metric value that the autoscaler attempts to maintain.
- [TARGET_TYPE] is the value type for the metric. You can set the autoscaler to monitor the metric as a GAUGE, by the DELTA_PER_MINUTE of the value, or by the DELTA_PER_SECOND of the value.
Example: Using instance assignment to scale based on a Pub/Sub queue
Assume the following setup:
- An active Google Cloud Pub/Sub topic receives messages from some source.
- An active Google Cloud Pub/Sub subscription is connected to the topic in a pull configuration. The subscription is named our-subscription.
- A pool of workers is pulling messages from that subscription and processing them. The pool is a single-zone managed instance group named our-instance-group and is located in zone us-central1-a. The pool must not exceed 100 workers, and should scale down to 0 workers when there are no messages in the queue.
- On average, a worker processes a single message in one minute.
To determine the optimal instance assignment value, consider several approaches:
- To process all messages in the queue as fast as possible, you can choose 1 as the instance assignment value. This creates one instance for each message in the queue (limited to the maximum number of instances in the group). However, this can cause overprovisioning. In the worst case, an instance is created to process just one message before the autoscaler shuts it down, so the instance spends much longer consuming resources than doing actual work.
  - Note that if the workers were able to process multiple messages concurrently, it would make sense to increase the value to the number of concurrent processes.
  - Note that, in this example, it does not make sense to set the value below 1 because one message cannot be processed by more than one worker.
- Alternatively, if processing latency is less important than resource utilization and overhead costs, you can calculate how many messages each instance must process within its lifetime to be considered efficiently utilized. Take into account startup and shutdown time and the fact that autoscaling does not immediately delete instances. For example, assuming that startup and shutdown take about 5 minutes in total and that autoscaling deletes instances only after a period of approximately 10 minutes, you calculate that it is efficient to create an additional instance in the group as long as it can process at least 15 messages before the autoscaler shuts it down, which results in at most 25% overhead due to the total time it takes to create, start, and shut down the instance. In this case, you can choose 15 as the instance assignment value.
- Both approaches can be balanced out, resulting in a number between 1 and 15, depending on which factor takes priority: processing latency or resource utilization.
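The 25% figure in the second approach can be reproduced with a short calculation. The sketch below is one reading of the numbers in the text: it treats the 5 minutes of startup and shutdown as overhead and the time spent processing messages as useful work, and it does not model the 10-minute deletion delay. Both function names are hypothetical.

```python
import math


def overhead_fraction(messages_per_instance, minutes_per_message, overhead_minutes):
    """Share of an instance's lifetime spent starting up and shutting down."""
    busy_minutes = messages_per_instance * minutes_per_message
    return overhead_minutes / (busy_minutes + overhead_minutes)


def min_messages_for_overhead(max_overhead, minutes_per_message, overhead_minutes):
    """Smallest number of messages that keeps overhead at or below max_overhead."""
    # overhead / (busy + overhead) <= max_overhead
    #   =>  busy >= overhead * (1 - max_overhead) / max_overhead
    busy_needed = overhead_minutes * (1 - max_overhead) / max_overhead
    return math.ceil(busy_needed / minutes_per_message)


# 15 messages at one minute each, with 5 minutes of startup and shutdown.
print(overhead_fraction(15, 1, 5))            # 0.25
# Working backwards from a 25% overhead budget gives the same assignment value.
print(min_messages_for_overhead(0.25, 1, 5))  # 15
```

With an instance assignment of 1, the same calculation gives an overhead of 5/6, which is the overprovisioning risk that the first approach describes.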
Looking at the available Pub/Sub metrics, we find a metric that represents the subscription queue length: subscription/num_undelivered_messages.
Note that this metric exports the total number of messages in the queue, including messages that are currently being processed but that are not yet acknowledged. Using a metric that does not include the messages being processed is not recommended because such a metric can drop down to 0 when there is still work being done, which prompts autoscaling to scale down and possibly interrupt the actual work.
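To see why the choice of metric matters, consider a moment when the queue has no waiting messages but workers are still processing 30 messages that they have pulled and not yet acknowledged. The sketch below compares the group size suggested by a metric that counts those messages with one that does not. The function, the variable names, and the rounding rule are illustrative assumptions rather than the autoscaler's implementation.

```python
import math


def target_size(metric_value, instance_assignment, max_instances):
    """Group size implied by a queue-length metric and an instance assignment."""
    return min(max_instances, math.ceil(metric_value / instance_assignment))


waiting = 0      # messages that no worker has pulled yet
in_flight = 30   # messages pulled by workers but not yet acknowledged

# A metric that includes unacknowledged messages keeps workers alive.
including_in_flight = target_size(waiting + in_flight, 15, 100)
# A metric that ignores them suggests removing every worker mid-task.
excluding_in_flight = target_size(waiting, 15, 100)

print(including_in_flight)  # 2
print(excluding_in_flight)  # 0
```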
You can now configure autoscaling for the queue:
gcloud beta compute instance-groups managed set-autoscaling \
our-instance-group \
--zone=us-central1-a \
--max-num-replicas=100 \
--min-num-replicas=0 \
--update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
--stackdriver-metric-filter="resource.type = pubsub_subscription AND resource.label.subscription_id = our-subscription" \
--stackdriver-metric-single-instance-assignment=15
Example: Using a utilization target to scale based on average latency
There might be a situation when the metric providing a relevant signal does not represent a total amount of available work or another resource applicable to the group, as in the previous example, but instead an average, a percentile, or some other statistical property. For this example, assume you will scale based on the group's average processing latency.
Assume the following setup:
- A managed instance group named our-instance-group is assigned to perform a particular task. The group is located in zone us-central1-a.
- You have a Stackdriver Monitoring custom metric that exports a value that you would like to maintain at a particular level. For this example, assume the metric represents the average latency of processing queries assigned to the group.
  - The custom metric is named: custom.googleapis.com/example_average_latency.
  - The custom metric has a label with a key named group_name and value equal to the instance group's name, our-instance-group.
  - The custom metric exports data for the global Monitored Resource, that is, it is not associated with any specific instance.
You have determined that when the metric value goes above some specific value,
you need to add more instances to the group to handle the load, and when it
goes below that value, you can free up some resources. Autoscaling gradually
adds or removes instances at a rate that is proportional to how much the metric is
above or below the target. For this example, assume that the calculated target
value is 100.
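The proportional behavior described above can be pictured with a simplified model in which the suggested group size grows or shrinks by the ratio of the observed metric to the target. This is an assumption made for illustration, not the autoscaler's documented algorithm: the real autoscaler changes the group gradually, and the function name here is hypothetical.

```python
import math


def recommended_size(current_size, metric_value, target_value,
                     min_instances, max_instances):
    """Scale the current size by how far the metric is from its target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # A ratio above 1 means the metric is over target and more instances are needed.
    ratio = metric_value / target_value
    desired = math.ceil(current_size * ratio)
    # Assumption: the result is kept within the configured limits.
    return max(min_instances, min(max_instances, desired))


# Metric 50% above the target of 100: grow a 10-instance group.
print(recommended_size(10, 150, 100, 0, 100))  # 15
# Metric at half the target: shrink the group.
print(recommended_size(10, 50, 100, 0, 100))   # 5
# Metric on target: no change.
print(recommended_size(10, 100, 100, 0, 100))  # 10
```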
You can now configure autoscaling for the group using a per-group utilization
target of 100, which represents the metric value that the autoscaler must
attempt to maintain:
gcloud beta compute instance-groups managed set-autoscaling \
our-instance-group \
--zone=us-central1-a \
--max-num-replicas=100 \
--min-num-replicas=0 \
--update-stackdriver-metric=custom.googleapis.com/example_average_latency \
--stackdriver-metric-filter="resource.type = global AND metric.label.group_name = our-instance-group" \
--stackdriver-metric-utilization-target=100 \
--stackdriver-metric-utilization-target-type=delta-per-second


