Stream analytics solution
Ingest, process, and analyze event streams in real time on fully managed infrastructure
Integrated and open stream analytics
Stream analytics has emerged as a simpler, faster alternative to batch ETL for getting maximum value from user-interaction events and from application and machine logs. Ingesting, processing, and analyzing these data streams quickly and efficiently is critical for use cases such as fraud detection, clickstream analysis, and online recommendations. For these workloads, Google Cloud offers an integrated and open stream analytics solution that is easy to adopt, scale, and manage.
Respond to events as they happen
Ingest millions of streaming events per second from anywhere in the world with Cloud Pub/Sub, powered by Google's unique, high-speed private network. Process the streams with Cloud Dataflow to ensure reliable, exactly-once, low-latency data transformation. Stream the transformed data into BigQuery, the cloud-native data warehousing service, for immediate analysis via SQL or popular visualization tools. Finally, bring predictive analytics to fraud detection, real-time personalization and similar use cases by integrating TensorFlow-based Cloud Machine Learning models and APIs into your streaming data pipelines.
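The ingest, transform, and load flow described above can be illustrated with a small in-memory Python sketch. It does not use the Pub/Sub, Dataflow, or BigQuery client libraries: the `Message` class, the `enrich` step with its 100.0 threshold, and the list standing in for a BigQuery table are all hypothetical. What it shows is one ingredient of exactly-once results on top of an at-least-once source: Pub/Sub may redeliver a message, so the pipeline drops any message whose ID it has already processed before writing to the sink.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a Pub/Sub message: an ID plus a payload."""
    message_id: str
    payload: Dict[str, object]


def enrich(payload: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical transform: flag purchases at or above a fixed threshold."""
    row = dict(payload)
    row["high_value"] = float(payload["amount"]) >= 100.0
    return row


def run_pipeline(messages: Iterable[Message]) -> List[Dict[str, object]]:
    """Ingest -> deduplicate -> transform -> load into an in-memory 'table'."""
    seen_ids: Set[str] = set()
    table: List[Dict[str, object]] = []  # stands in for a BigQuery table
    for msg in messages:
        # At-least-once delivery can hand us the same message twice;
        # skipping IDs we have already processed keeps each row written once.
        if msg.message_id in seen_ids:
            continue
        seen_ids.add(msg.message_id)
        table.append(enrich(msg.payload))
    return table


if __name__ == "__main__":
    stream = [
        Message("m1", {"user": "alice", "amount": 250}),
        Message("m2", {"user": "bob", "amount": 40}),
        Message("m1", {"user": "alice", "amount": 250}),  # simulated redelivery
    ]
    for row in run_pipeline(stream):
        print(row)
```

Keeping every seen ID in an unbounded set is a simplification for illustration; a real system has to bound that state (for example, by expiring old IDs), and in the managed service this bookkeeping is handled by the runner rather than by user code.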
Accelerate development without compromise
Stream analytics on GCP simplifies ETL pipelines without compromising robustness, accuracy, or functionality. Cloud Dataflow supports fast pipeline development via expressive Java and Python APIs in the Apache Beam SDK, which provides a rich set of windowing and session analysis primitives as well as an ecosystem of source and sink connectors. Plus, Beam’s unique, unified development model lets you reuse more code across streaming and batch pipelines.
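Windowing is central to those primitives. The plain-Python sketch below illustrates the idea behind two of them, fixed (tumbling) windows and gap-based session windows, on a hypothetical list of `(user_id, event_time_seconds)` tuples. It is not Beam code: in the Beam Python SDK the corresponding assignments are made with `beam.WindowInto(window.FixedWindows(size))` and `beam.WindowInto(window.Sessions(gap))`, where watermarks and triggers also govern late data, which this sketch ignores. The function names `fixed_window_counts` and `session_windows` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (user_id, event_time_in_seconds).
Event = Tuple[str, int]


def fixed_window_counts(events: List[Event], size: int) -> Dict[int, int]:
    """Count events per fixed (tumbling) window, keyed by window start time."""
    counts: Dict[int, int] = {}
    for _, ts in events:
        start = ts - (ts % size)  # align the timestamp to its window boundary
        counts[start] = counts.get(start, 0) + 1
    return counts


def session_windows(events: List[Event], gap: int) -> Dict[str, List[Tuple[int, int]]]:
    """Group each user's events into gap-based sessions.

    Each event opens a proto-window [ts, ts + gap); proto-windows that overlap are
    merged, so a session ends once a user is idle for at least `gap` seconds.
    Returns user -> list of (session_start, session_end) pairs.
    """
    by_user: Dict[str, List[int]] = {}
    for user, ts in events:
        by_user.setdefault(user, []).append(ts)

    sessions: Dict[str, List[Tuple[int, int]]] = {}
    for user, times in by_user.items():
        times.sort()  # merge in event-time order, even if events arrived out of order
        merged: List[Tuple[int, int]] = []
        for ts in times:
            if merged and ts < merged[-1][1]:
                # Overlaps the current session: extend its end to ts + gap.
                merged[-1] = (merged[-1][0], ts + gap)
            else:
                merged.append((ts, ts + gap))  # idle gap reached: start a new session
        sessions[user] = merged
    return sessions


if __name__ == "__main__":
    events = [("alice", 0), ("alice", 20), ("bob", 5), ("alice", 200), ("alice", 65)]
    print(fixed_window_counts(events, size=60))  # {0: 3, 180: 1, 60: 1}
    print(session_windows(events, gap=60))
    # {'alice': [(0, 125), (200, 260)], 'bob': [(5, 65)]}
```

The event at `t=65` arrives after the one at `t=200`, yet it still extends Alice's first session, because session merging works in event time rather than arrival order.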
Simplify operations and management
Once your streaming data processing pipelines are deployed, GCP’s serverless approach removes operational overhead by handling performance, scaling, availability, security, and compliance automatically. Integration with Stackdriver, GCP’s unified logging and monitoring solution, lets you monitor and troubleshoot your pipelines while they run. Rich visualization, logging, and advanced alerting help you identify and respond to potential issues.
Keep your favorite tools and systems
Stream analytics on GCP is open and interoperable by design. Cloud Pub/Sub’s open API and multiple clients enable multi-cloud and hybrid deployments. For Apache Kafka users, Confluent is Google’s recommended way to run managed Kafka, and a Cloud Dataflow connector makes do-it-yourself integration with GCP easy. BigQuery works seamlessly with the ETL and BI tools you know and love via standard SQL. Data processing pipelines written with the Beam-based Cloud Dataflow 2.x SDK are portable across Cloud Dataflow, Apache Spark, and Apache Flink. Finally, Spark support is available via Cloud Dataproc for streaming and batch workloads.
SOLUTION COMPONENTS
| Service | Use Case for Stream Analytics |
|---|---|
| Cloud Pub/Sub | For large-scale ingestion of streaming data originating anywhere in the world. (Open source alternative in this solution: Apache Kafka) |
| Cloud Dataflow | For transforming and enriching ingested data in streaming and batch modes with equal reliability and expressiveness. (Open source alternative in this solution: Spark on Cloud Dataproc) |
| BigQuery | Fully-managed data warehouse service that supports 100,000 streaming row inserts per second and allows ad hoc analysis on real-time data with standard SQL. |
| Apache Beam | Unified development framework for programming streaming and batch pipelines. Shipped by Google as Cloud Dataflow SDK 2.x. |
| Cloud Machine Learning | Add an extra layer of intelligence to your pipeline by running the event streams through custom (Cloud Machine Learning Engine) or pre-built (Cloud APIs) TensorFlow-based machine-learning models. |
| Cloud Bigtable | Low-latency wide-column key-value store, ideal for high-volume time series and read latency-sensitive applications. |
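How well Cloud Bigtable serves high-volume time series depends largely on row-key design, since rows are stored sorted by key. The sketch below follows the general advice in Bigtable's schema-design guidance: lead with a high-cardinality identifier rather than a timestamp, and optionally reverse the timestamp so the newest rows sort first. The `row_key` function, the `#` delimiter, and the `MAX_TS_MS` bound are hypothetical choices for illustration; no Bigtable client call is made.

```python
# Hypothetical upper bound for millisecond timestamps (13 digits, roughly year 2286).
MAX_TS_MS = 10**13 - 1


def row_key(device_id: str, event_ms: int, newest_first: bool = True) -> bytes:
    """Build a hypothetical time-series row key of the form '<device_id>#<timestamp>'.

    Leading with the device ID spreads writes across the key space instead of
    sending every new event to the same end of it, as a timestamp-first key would.
    With newest_first=True the timestamp is reversed, so a prefix scan over one
    device returns its most recent events first.
    """
    if "#" in device_id:
        raise ValueError("device_id must not contain the '#' delimiter")
    if not 0 <= event_ms <= MAX_TS_MS:
        raise ValueError("event_ms is out of range")
    ts = MAX_TS_MS - event_ms if newest_first else event_ms
    # Zero-padding to a fixed width keeps byte order identical to numeric order.
    return f"{device_id}#{ts:013d}".encode("utf-8")


if __name__ == "__main__":
    print(row_key("sensor-42", 1_700_000_000_000))  # b'sensor-42#8299999999999'
```

Fixed-width padding matters because Bigtable compares row keys as byte strings: without it, `"#9"` would sort after `"#10"`.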
Additional Resources
Exactly-once Processing
Learn the meaning of “exactly once” processing in Cloud Dataflow.
Cloud Dataflow: Sample Pipelines
Understand how pipelines work through mobile gaming examples.
Codelab: NYC Taxi Tycoon
Work through a guided, hands-on coding exercise that shows how to process streaming data with Dataflow and Pub/Sub.
Financial Services Solution
Build a near real-time analytics system that can scale to thousands of simultaneous data streams.
Architecture Diagram
Review the architecture for optimizing large-scale analytics ingestion on Google Cloud Platform.

