You can monitor your Cloud Bigtable instance visually, using the charts that are available in the Google Cloud Platform Console and Stackdriver Monitoring, or programmatically, using Stackdriver Monitoring.
The data available through the Google Cloud Platform Console and Stackdriver Monitoring provides a high-level overview of your Cloud Bigtable usage. You can also use the Key Visualizer tool to drill down into your access patterns by row key and troubleshoot specific performance issues. For details, see Getting Started with Key Visualizer.
Understanding CPU and disk usage
No matter what tools you use to monitor your instance, it's essential to monitor the CPU and disk usage for each cluster in the instance. If a cluster's CPU or disk usage exceeds certain thresholds, the cluster will not perform well, and it might return errors when you try to read or write data.
CPU usage
The nodes in your clusters use CPU resources to handle reads, writes, and administrative tasks. To learn more about how the number of nodes affects a cluster's performance, see Performance for typical workloads.
Cloud Bigtable reports the following metrics for CPU usage:
| Metric | Description |
|---|---|
| Average CPU utilization | The average CPU utilization across all nodes in the cluster. The recommended maximum values provide headroom for brief spikes in usage. If a cluster exceeds the recommended maximum value for your configuration for more than a few minutes, add nodes to the cluster. |
| CPU utilization of hottest node | CPU utilization for the busiest node in the cluster. If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data. |
The values for these metrics should not exceed the following:
| Configuration | Recommended maximum values |
|---|---|
| Single cluster | 70% average CPU utilization |
| Any number of clusters with single-cluster routing | 70% average CPU utilization |
| 2 clusters with multi-cluster routing | 35% average CPU utilization |
| 3 or more clusters with multi-cluster routing | Depends on your configuration. See the examples of replication settings for common use cases. |
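These thresholds are easy to encode in a monitoring or autoscaling script. The following is an illustrative sketch only: the function name, the configuration labels, and the way you obtain the utilization value are assumptions, not part of any Cloud Bigtable API.

```python
# Recommended maximum average CPU utilization by configuration,
# per the table above. Values are fractions (0.70 == 70%).
# The configuration labels are made up for this sketch.
RECOMMENDED_MAX_CPU = {
    "single-cluster": 0.70,
    "single-cluster-routing": 0.70,
    "two-clusters-multi-cluster-routing": 0.35,
}

def should_add_nodes(avg_cpu_utilization: float, configuration: str) -> bool:
    """Return True if average CPU utilization exceeds the recommended
    maximum for the given configuration (hypothetical helper)."""
    return avg_cpu_utilization > RECOMMENDED_MAX_CPU[configuration]

# A single cluster averaging 82% CPU should get more nodes; 55% is fine.
print(should_add_nodes(0.82, "single-cluster"))  # True
print(should_add_nodes(0.55, "single-cluster"))  # False
```

Remember that the recommendation applies to sustained load: a brief spike above the threshold is expected and is why the maximums leave headroom.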
Disk usage
For each cluster in your instance, Cloud Bigtable stores a separate copy of all of the tables in that instance.
Cloud Bigtable tracks disk usage in binary units, such as binary gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB).
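For example, converting a raw byte count into the binary gigabytes shown in the console is a simple division by 2^30:

```python
def bytes_to_binary_gb(num_bytes: int) -> float:
    """Convert a raw byte count to binary gigabytes (GiB),
    where 1 GiB = 2**30 bytes."""
    return num_bytes / 2**30

# 5 binary GB is 5,368,709,120 bytes.
print(bytes_to_binary_gb(5_368_709_120))  # 5.0
```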
Cloud Bigtable reports the following metrics for disk usage:
| Metric | Description |
|---|---|
| Storage utilization (bytes) | The amount of data stored in the cluster. This value affects your costs. Also, as described below, you might need to add nodes to each cluster as the amount of data increases. |
| Storage utilization (% max) | The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. In general, do not use more than 70% of the hard limit on total storage, so you have room to add more data. If you do not plan to add significant amounts of data to your instance, you can use up to 100% of the hard limit. If you are using more than the recommended percentage of the storage limit, add nodes to the cluster. You can also delete existing data, but deleted data takes up more space, not less, until a compaction occurs. For details about how this value is calculated, see Storage utilization per node. |
| Disk load | The percentage of the maximum possible bandwidth for HDD reads and writes that your cluster is using. Available only for HDD clusters. If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage. |
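The relationship between stored bytes, node count, and the percentage shown in the console can be sketched as below. The per-node capacities used here are illustrative assumptions for the sketch; see Storage utilization per node for the authoritative limits that apply to your cluster.

```python
# Illustrative per-node storage capacities in binary TB (1 TB = 2**40 bytes).
# These numbers are assumptions for this sketch, not authoritative limits;
# check "Storage utilization per node" for the values that apply to you.
CAPACITY_PER_NODE_TB = {"SSD": 2.5, "HDD": 8.0}

def storage_utilization_pct(stored_bytes: int, num_nodes: int,
                            storage_type: str) -> float:
    """Percentage of the cluster's storage capacity in use
    (hypothetical helper)."""
    capacity_bytes = CAPACITY_PER_NODE_TB[storage_type] * num_nodes * 2**40
    return 100.0 * stored_bytes / capacity_bytes

# Under these assumed capacities, a 4-node SSD cluster storing
# 7 binary TB would be at 70% of capacity -- at the recommended ceiling.
print(round(storage_utilization_pct(7 * 2**40, 4, "SSD")))  # 70
```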
Getting a performance overview with the GCP Console
Use your instance's overview page to understand the current health of your instance's clusters.
The overview page shows the current values of several key metrics for each cluster:
| Metric | Description |
|---|---|
| CPU utilization average | The average CPU utilization across all nodes in the cluster. |
| CPU utilization of hottest node | CPU utilization for the busiest node in the cluster. Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster. |
| Rows read | The number of rows read per second. |
| Rows written | The number of rows written per second. |
| Read throughput | The number of uncompressed bytes per second that were read. |
| Write throughput | The number of uncompressed bytes per second that were written. |
| System error rate | The percentage of all requests that failed on the Cloud Bigtable server side. |
| Replication latency for input | The average amount of time at the 99th percentile, in seconds, between a write to another cluster and the same write being replicated to this cluster. |
| Replication latency for output | The average amount of time at the 99th percentile, in seconds, between a write to this cluster and the same write being replicated to another cluster. |
To see an overview of these key metrics:

1. Open the list of Cloud Bigtable instances in the GCP Console.
2. Click the instance whose metrics you want to view. The GCP Console displays the current metrics for your instance's clusters.
Monitoring performance over time with the GCP Console
Use your instance's monitoring page to understand the past performance of your instance. You can analyze the performance of each cluster, and you can break down the metrics for different types of Cloud Bigtable resources. Charts can display a period ranging from the past 1 hour to the past 30 days.
Charts for Cloud Bigtable resources
The monitoring page provides charts for the following types of Cloud Bigtable resources:
- Instances
- Tables
- Application profiles
Charts are available for the following metrics:
| Metric | Available for | Description |
|---|---|---|
| CPU utilization | Instances | The average CPU utilization across all nodes in the cluster. |
| CPU utilization (hottest node) | Instances | CPU utilization for the busiest node in the cluster. Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster. |
| User error rate | Instances | The rate of errors caused by the content of a request, as opposed to errors on the Cloud Bigtable server side. User errors are typically caused by a configuration issue, such as a request that specifies the wrong cluster, table, or app profile. |
| System error rate | Instances, Tables, App profiles | The percentage of all requests that failed on the Cloud Bigtable server side. |
| Storage utilization (bytes) | Instances, Tables | The amount of data stored in the cluster. This metric reflects the fact that Cloud Bigtable compresses your data when it is stored. |
| Storage utilization (% max) | Instances | The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. For details about how this value is calculated, see Storage utilization per node. |
| Disk load | Instances | The percentage of the maximum possible bandwidth for HDD reads and writes that your cluster is using. Available only for HDD clusters. |
| Rows read | Instances, Tables, App profiles | The number of rows read per second. This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of read requests, because a single request can read a large number of rows. |
| Rows written | Instances, Tables, App profiles | The number of rows written per second. This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of write requests, because a single request can write a large number of rows. |
| Read requests | Instances, Tables, App profiles | The number of random reads and scan requests per second. |
| Write requests | Instances, Tables, App profiles | The number of write requests per second. |
| Read throughput | Instances, Tables, App profiles | The number of uncompressed bytes per second that were read. |
| Write throughput | Instances, Tables, App profiles | The number of uncompressed bytes per second that were written. |
| Node count | Instances | The number of nodes in the cluster. |
To view metrics for these resources:

1. Open the list of Cloud Bigtable instances in the GCP Console.
2. Click the instance whose metrics you want to view.
3. In the left pane, click Monitoring. The GCP Console displays a series of charts for the instance, as well as a tabular view of the instance's metrics. By default, the GCP Console shows metrics for the past hour, and it shows separate metrics for each cluster in the instance.

- To view all of the charts, scroll through the pane where the charts are displayed.
- To view metrics for individual tables or application profiles, click the View metrics for drop-down list, then select Tables or Application profiles.
- To view combined metrics for the instance as a whole, find the Group by section above the charts, then click Instance.
- To view metrics for a longer period of time, click one of the time scales to the upper right of the charts.
Charts for replication
The monitoring page provides a chart that shows replication latency over time. You can view the average latency for replicating writes at the 50th, 99th, and 100th percentiles.
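Stackdriver computes these percentiles server-side, but the meaning of a percentile statistic is easy to show with a short sketch. This is illustrative only; the `percentile` helper and the sample delays are made up, not part of any API.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value that is
    greater than or equal to p percent of all samples."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

# Hypothetical replication delays, in seconds. One slow outlier (6.2 s)
# dominates the 100th percentile but barely moves the 50th.
delays = [0.8, 1.1, 0.9, 1.0, 6.2, 1.2, 0.7, 1.3, 0.95, 1.05]
print(percentile(delays, 50))   # 1.0
print(percentile(delays, 100))  # 6.2
```

This is why the 100th percentile is useful for spotting worst-case replication lag, while the 50th percentile describes typical behavior.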
To view the replication latency over time:

1. Open the list of Cloud Bigtable instances in the GCP Console.
2. Click the instance whose metrics you want to view.
3. In the left pane, click Monitoring.
4. In the View metrics for drop-down list, select Replication. The GCP Console displays replication latency over time. By default, the GCP Console shows replication latency for the past hour.

You may see a gray bar covering part of the graph. The bar indicates that replication was not occurring during that period of time, either because there were no incoming writes or because of an issue with the Cloud Bigtable service. Latency metrics during these periods may not be accurate.

- To change whether the metrics are aggregated for the instance as a whole or presented separately for each cluster, click one of the buttons under Group by.
- To change which percentile to view, click one of the buttons under Percentile.
- To view metrics for a longer period of time, click one of the time scales to the upper right of the charts.
Monitoring an instance with Stackdriver Monitoring
Cloud Bigtable exports usage metrics that you can monitor programmatically using Stackdriver Monitoring. You can use the Stackdriver Monitoring API or the Metrics Explorer to track Cloud Bigtable usage metrics. In addition, you can set up alerting policies based on usage metrics, and you can add charts for Cloud Bigtable usage metrics to a custom dashboard.
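When you query metrics programmatically, the filter string is the part that is easy to get wrong, so the sketch below only builds that string. Executing the query requires the Stackdriver Monitoring client library and credentials, which are outside the scope of this sketch; `bigtable.googleapis.com/cluster/cpu_load` and the `instance`/`cluster` resource labels are assumed names here, so confirm them against the metrics list for your project.

```python
# Sketch: build a Stackdriver Monitoring filter for a Bigtable CPU
# metric. The metric type and resource label names are assumptions
# for this example; verify them in the Metrics Explorer before use.
def bigtable_cpu_filter(instance_id: str, cluster_id: str) -> str:
    return (
        'metric.type = "bigtable.googleapis.com/cluster/cpu_load" '
        'AND resource.labels.instance = "{}" '
        'AND resource.labels.cluster = "{}"'.format(instance_id, cluster_id)
    )

print(bigtable_cpu_filter("my-instance", "my-cluster"))
```

You would pass a filter like this to a time-series list call in the Monitoring API, along with a time interval covering the period you want to inspect.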
To view usage metrics in the Metrics Explorer:

1. Open the Monitoring page in the GCP Console.
2. If you are prompted to choose an account, choose the account that you use to access Google Cloud Platform.
3. Click Resources, then click Metrics Explorer.
4. Under Find resource type and metric, type `bigtable`. A list of Cloud Bigtable resources and metrics appears.
5. Click a metric to view a chart for that metric.
You can also use a graphing library, such as Matplotlib for Python, to plot and analyze the usage metrics for Cloud Bigtable. To learn more, see the tutorial on using Matplotlib with Stackdriver Monitoring and Cloud Bigtable.
For additional information about using Stackdriver Monitoring, see the Stackdriver Monitoring documentation.
What's next
- Learn how to programmatically scale your Cloud Bigtable cluster.
- Find out how to troubleshoot issues with Key Visualizer.
- Learn more about Cloud Bigtable performance.
- Read about client-side metrics for the HBase client for Java.
- Try the Stackdriver Monitoring quickstart.
- Learn about creating alerts based on Cloud Bigtable metrics.


