You can monitor your Cloud Bigtable instance visually, using the charts that are available in the Google Cloud Platform Console and Stackdriver Monitoring, or programmatically, using Stackdriver Monitoring.
The data available through the Google Cloud Platform Console and Stackdriver Monitoring provides a high-level overview of your Cloud Bigtable usage. You can also use the Key Visualizer tool to drill down into your access patterns by row key and troubleshoot specific performance issues. For details, see Getting Started with Key Visualizer.
Understanding CPU and disk usage
No matter what tools you use to monitor your instance, it's essential to monitor the CPU and disk usage for each cluster in the instance. If a cluster's CPU or disk usage exceeds certain thresholds, the cluster will not perform well, and it might return errors when you try to read or write data.
CPU usage
The nodes in your clusters use CPU resources to handle reads, writes, and administrative tasks. To learn more about how the number of nodes affects a cluster's performance, see Performance for typical workloads.
Cloud Bigtable reports the following metrics for CPU usage:
| Metric | Description |
|---|---|
| Average CPU utilization | The average CPU utilization across all nodes in the cluster. The recommended maximum values provide headroom for brief spikes in usage. If a cluster exceeds the recommended maximum value for your configuration for more than a few minutes, add nodes to the cluster. |
| CPU utilization of hottest node | CPU utilization for the busiest node in the cluster. If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data. |
The values for these metrics should not exceed the following:
| Configuration | Recommended maximum values |
|---|---|
| Single cluster | 70% average CPU utilization |
| Any number of clusters with single-cluster routing | 70% average CPU utilization |
| 2 clusters with multi-cluster routing | 35% average CPU utilization |
| 3 or more clusters with multi-cluster routing | Depends on your configuration. See the examples of replication settings for common use cases. |
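These thresholds are easy to encode in a monitoring or autoscaling script. The following is an illustrative sketch only: the function name, the configuration labels, and the way you obtain the utilization value are assumptions, not part of any Cloud Bigtable API.

```python
# Recommended maximum average CPU utilization by configuration,
# per the table above. Values are fractions (0.70 == 70%).
# The configuration labels are made up for this sketch.
RECOMMENDED_MAX_CPU = {
    "single-cluster": 0.70,
    "single-cluster-routing": 0.70,
    "two-clusters-multi-cluster-routing": 0.35,
}

def should_add_nodes(avg_cpu_utilization: float, configuration: str) -> bool:
    """Return True if average CPU utilization exceeds the recommended
    maximum for the given configuration (hypothetical helper)."""
    return avg_cpu_utilization > RECOMMENDED_MAX_CPU[configuration]

# A single cluster averaging 82% CPU should get more nodes; 55% is fine.
print(should_add_nodes(0.82, "single-cluster"))  # True
print(should_add_nodes(0.55, "single-cluster"))  # False
```

Remember that the recommendation applies to sustained load: a brief spike above the threshold is expected and is why the maximums leave headroom.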
Disk usage
For each cluster in your instance, Cloud Bigtable stores a separate copy of all of the tables in that instance.
Cloud Bigtable tracks disk usage in binary units, such as binary gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB).
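For example, converting a raw byte count into the binary gigabytes shown in the console is a simple division by 2^30:

```python
def bytes_to_binary_gb(num_bytes: int) -> float:
    """Convert a raw byte count to binary gigabytes (GiB),
    where 1 GiB = 2**30 bytes."""
    return num_bytes / 2**30

# 5 binary GB is 5,368,709,120 bytes.
print(bytes_to_binary_gb(5_368_709_120))  # 5.0
```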
Cloud Bigtable reports the following metrics for disk usage:
| Metric | Description |
|---|---|
| Storage utilization (bytes) | The amount of data stored in the cluster. This value affects your costs. Also, as described below, you might need to add nodes to each cluster as the amount of data increases. |
| Storage utilization (% max) | The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. In general, do not use more than 70% of the hard limit on total storage, so you have room to add more data. If you do not plan to add significant amounts of data to your instance, you can use up to 100% of the hard limit. If you are using more than the recommended percentage of the storage limit, add nodes to the cluster. You can also delete existing data, but deleted data takes up more space, not less, until a compaction occurs. For details about how this value is calculated, see Storage utilization per node. |
| Disk load | The percentage of the maximum possible bandwidth for HDD reads and writes that your cluster is using. Available only for HDD clusters. If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage. |
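The relationship between stored bytes, node count, and the percentage shown in the console can be sketched as below. The per-node capacities used here are illustrative assumptions for the sketch; see Storage utilization per node for the authoritative limits that apply to your cluster.

```python
# Illustrative per-node storage capacities in binary TB (1 TB = 2**40 bytes).
# These numbers are assumptions for this sketch, not authoritative limits;
# check "Storage utilization per node" for the values that apply to you.
CAPACITY_PER_NODE_TB = {"SSD": 2.5, "HDD": 8.0}

def storage_utilization_pct(stored_bytes: int, num_nodes: int,
                            storage_type: str) -> float:
    """Percentage of the cluster's storage capacity in use
    (hypothetical helper)."""
    capacity_bytes = CAPACITY_PER_NODE_TB[storage_type] * num_nodes * 2**40
    return 100.0 * stored_bytes / capacity_bytes

# Under these assumed capacities, a 4-node SSD cluster storing
# 7 binary TB would be at 70% of capacity -- at the recommended ceiling.
print(round(storage_utilization_pct(7 * 2**40, 4, "SSD")))  # 70
```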
Getting a performance overview with the GCP Console
Use your instance's overview page to understand the current health of your instance's clusters.
The overview page shows the current values of several key metrics for each cluster:
| Metric | Description |
|---|---|
| CPU utilization average | The average CPU utilization across all nodes in the cluster. |
| CPU utilization of hottest node | CPU utilization for the busiest node in the cluster. Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster. |
| Rows read | The number of rows read per second. |
| Rows written | The number of rows written per second. |
| Read throughput | The number of uncompressed bytes per second that were read. |
| Write throughput | The number of uncompressed bytes per second that were written. |
| System error rate | The percentage of all requests that failed on the Cloud Bigtable server side. |
| Replication latency for input | The average amount of time at the 99th percentile, in seconds, between a write to another cluster and the same write being replicated to this cluster. |
| Replication latency for output | The average amount of time at the 99th percentile, in seconds, between a write to this cluster and the same write being replicated to another cluster. |
To see an overview of these key metrics:

1. Open the list of Cloud Bigtable instances in the GCP Console.
2. Click the instance whose metrics you want to view. The GCP Console displays the current metrics for your instance's clusters.
Monitoring performance over time with the GCP Console
Use your instance's monitoring page to understand the past performance of your instance. You can analyze the performance of each cluster, and you can break down the metrics for different types of Cloud Bigtable resources. Charts can display a period ranging from the past 1 hour to the past 30 days.
Charts for Cloud Bigtable resources
The monitoring page provides charts for the following types of Cloud Bigtable resources:
- Instances
- Tables
- Application profiles
Charts are available for the following metrics:
| Metric | Available for | Description |
|---|---|---|
| CPU utilization | Instances | The average CPU utilization across all nodes in the cluster. |
| CPU utilization (hottest node) | Instances | CPU utilization for the busiest node in the cluster. Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster. |
| User error rate | Instances | The rate of errors caused by the content of a request, as opposed to errors on the Cloud Bigtable server side. User errors are typically caused by a configuration issue, such as a request that specifies the wrong cluster, table, or app profile. |
| System error rate | Instances, Tables, App profiles | The percentage of all requests that failed on the Cloud Bigtable server side. |
| Storage utilization (bytes) | Instances, Tables | The amount of data stored in the cluster. This metric reflects the fact that Cloud Bigtable compresses your data when it is stored. |
| Storage utilization (% max) | Instances | The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. For details about how this value is calculated, see Storage utilization per node. |
| Disk load | Instances | The percentage of the maximum possible bandwidth for HDD reads and writes that your cluster is using. Available only for HDD clusters. |
| Rows read | Instances, Tables, App profiles | The number of rows read per second. This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of read requests, because a single request can read a large number of rows. |
| Rows written | Instances, Tables, App profiles | The number of rows written per second. This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of write requests, because a single request can write a large number of rows. |
| Read requests | Instances, Tables, App profiles | The number of random reads and scan requests per second. |
| Write requests | Instances, Tables, App profiles | The number of write requests per second. |
| Read throughput | Instances, Tables, App profiles | The number of uncompressed bytes per second that were read. |
| Write throughput | Instances, Tables, App profiles | The number of uncompressed bytes per second that were written. |
| Node count | Instances | The number of nodes in the cluster. |
To view metrics for these resources:

1. Open the list of Cloud Bigtable instances in the GCP Console.
2. Click the instance whose metrics you want to view.
3. In the left pane, click Monitoring. The GCP Console displays a series of charts for the instance, as well as a tabular view of the instance's metrics. By default, the GCP Console shows metrics for the past hour, and it shows separate metrics for each cluster in the instance.

- To view all of the charts, scroll through the pane where the charts are displayed.
- To view metrics for individual tables or application profiles, click the View metrics for drop-down list, then select Tables or Application profiles.
- To view combined metrics for the instance as a whole, find the Group by section above the charts, then click Instance.
- To view metrics for a longer period of time, click one of the time scales to the upper right of the charts.
Charts for replication
The monitoring page provides a chart that shows replication latency over time. You can view the average latency for replicating writes at the 50th, 99th, and 100th percentiles.
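Stackdriver computes these percentiles server-side, but the meaning of a percentile statistic is easy to show with a short sketch. This is illustrative only; the `percentile` helper and the sample delays are made up, not part of any API.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value that is
    greater than or equal to p percent of all samples."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

# Hypothetical replication delays, in seconds. One slow outlier (6.2 s)
# dominates the 100th percentile but barely moves the 50th.
delays = [0.8, 1.1, 0.9, 1.0, 6.2, 1.2, 0.7, 1.3, 0.95, 1.05]
print(percentile(delays, 50))   # 1.0
print(percentile(delays, 100))  # 6.2
```

This is why the 100th percentile is useful for spotting worst-case replication lag, while the 50th percentile describes typical behavior.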
To view the replication latency over time:

1. Open the list of Cloud Bigtable instances in the GCP Console.
2. Click the instance whose metrics you want to view.
3. In the left pane, click Monitoring.
4. In the View metrics for drop-down list, select Replication. The GCP Console displays replication latency over time. By default, the GCP Console shows replication latency for the past hour.

You may see a gray bar covering part of the graph. The bar indicates that replication was not occurring during that period of time, either because there were no incoming writes or because of an issue with the Cloud Bigtable service. Latency metrics during these periods may not be accurate.

- To change whether the metrics are aggregated for the instance as a whole or presented separately for each cluster, click one of the buttons under Group by.
- To change which percentile to view, click one of the buttons under Percentile.
- To view metrics for a longer period of time, click one of the time scales to the upper right of the charts.
Monitoring an instance with Stackdriver Monitoring
Cloud Bigtable exports usage metrics that you can monitor programmatically using Stackdriver Monitoring. You can use the Stackdriver Monitoring API or the Metrics Explorer to track Cloud Bigtable usage metrics. In addition, you can set up alerting policies based on usage metrics, and you can add charts for Cloud Bigtable usage metrics to a custom dashboard.
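When you query metrics programmatically, the filter string is the part that is easy to get wrong, so the sketch below only builds that string. Executing the query requires the Stackdriver Monitoring client library and credentials, which are outside the scope of this sketch; `bigtable.googleapis.com/cluster/cpu_load` and the `instance`/`cluster` resource labels are assumed names here, so confirm them against the metrics list for your project.

```python
# Sketch: build a Stackdriver Monitoring filter for a Bigtable CPU
# metric. The metric type and resource label names are assumptions
# for this example; verify them in the Metrics Explorer before use.
def bigtable_cpu_filter(instance_id: str, cluster_id: str) -> str:
    return (
        'metric.type = "bigtable.googleapis.com/cluster/cpu_load" '
        'AND resource.labels.instance = "{}" '
        'AND resource.labels.cluster = "{}"'.format(instance_id, cluster_id)
    )

print(bigtable_cpu_filter("my-instance", "my-cluster"))
```

You would pass a filter like this to a time-series list call in the Monitoring API, along with a time interval covering the period you want to inspect.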
To view usage metrics in the Metrics Explorer:

1. Open the Monitoring page in the GCP Console.
2. If you are prompted to choose an account, choose the account that you use to access Google Cloud Platform.
3. Click Resources, then click Metrics Explorer.
4. Under Find resource type and metric, type `bigtable`. A list of Cloud Bigtable resources and metrics appears.
5. Click a metric to view a chart for that metric.
You can also use a graphing library, such as Matplotlib for Python, to plot and analyze the usage metrics for Cloud Bigtable. To learn more, see the tutorial on using Matplotlib with Stackdriver Monitoring and Cloud Bigtable.
For additional information about using Stackdriver Monitoring, see the Stackdriver Monitoring documentation.
What's next
- Learn how to programmatically scale your Cloud Bigtable cluster.
- Find out how to troubleshoot issues with Key Visualizer.
- Learn more about Cloud Bigtable performance.
- Read about client-side metrics for the HBase client for Java.
- Try the Stackdriver Monitoring quickstart.
- Learn about creating alerts based on Cloud Bigtable metrics.


