The AI Platform training service manages computing resources in the cloud to train your models. This page describes the process to train a model with scikit-learn and XGBoost using AI Platform.
Overview
In this tutorial, you train a simple model to predict the species of flowers, using the Iris dataset. After you adjust your model training code to download data from Cloud Storage and upload your saved model file to Cloud Storage, you create a training application package and use it to run training on AI Platform.
This tutorial uses Python 2.7.
How to train your model on AI Platform
After you complete the initial setup process, you can train your model on AI Platform in three steps:
- Create your Python training module
- Add code to download your data from Cloud Storage so that AI Platform can use it
- Add code to export and save the model to Cloud Storage after AI Platform finishes training the model
- Prepare a training application package
- Submit the training job
The initial setup process includes creating a Google Cloud Platform project, enabling billing and APIs, setting up a Cloud Storage bucket to use with AI Platform, and installing scikit-learn or XGBoost locally. If you already have everything set up and installed, skip to creating your model training code.
Before you begin
Complete the following steps to set up a GCP account, activate the AI Platform API, and install and activate the Cloud SDK.
Set up your GCP project
- Sign in to your Google Account. If you don't already have one, sign up for a new account.

- Select or create a GCP project.

- Make sure that billing is enabled for your Google Cloud Platform project.
- Enable the AI Platform ("Cloud Machine Learning Engine") and Compute Engine APIs.
- Install and initialize the Cloud SDK.
Set up your environment
Choose one of the options below to set up your environment locally on macOS or in a remote environment on Cloud Shell.
For macOS users, we recommend that you set up your environment using the MACOS tab below. Cloud Shell, shown on the CLOUD SHELL tab, is available on macOS, Linux, and Windows. Cloud Shell provides a quick way to try AI Platform, but isn’t suitable for ongoing development work.
macOS
- Check your Python installation

  Confirm that you have Python installed and, if necessary, install it:

  python -V

- Check your pip installation

  pip is Python's package manager, included with current versions of Python. Check whether you already have pip installed by running pip --version. If not, see how to install pip.

  You can upgrade pip using the following command:

  pip install -U pip

  See the pip documentation for more details.

- Install virtualenv

  virtualenv is a tool to create isolated Python environments. Check whether you already have virtualenv installed by running virtualenv --version. If not, install virtualenv:

  pip install --user --upgrade virtualenv

  To create an isolated development environment for this guide, create and activate a new virtual environment. For example, the following commands create and activate an environment named cmle-env:

  virtualenv cmle-env
  source cmle-env/bin/activate

- For the purposes of this tutorial, run the rest of the commands within your virtual environment.

  See more information about using virtualenv. To exit your virtual environment, run deactivate.
Cloud Shell
- Open the Google Cloud Platform Console.

- Click the Activate Google Cloud Shell button at the top of the console window.

  A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. It can take a few seconds for the shell session to be initialized. Your Cloud Shell session is then ready to use.

- Configure the gcloud command-line tool to use your selected project:

  gcloud config set project [selected-project-id]

  where [selected-project-id] is your project ID. (Omit the enclosing brackets.)
Verify the Google Cloud SDK components
To verify that the Google Cloud SDK components are installed:
- List your models:

  gcloud ai-platform models list

- If you have not created any models before, the command returns an empty list:

  Listed 0 items.

  After you start creating models, you can see them listed by using this command.

- If you have installed gcloud previously, update gcloud:

  gcloud components update
Install frameworks
macOS
Within your virtual environment, run the following command to install scikit-learn, XGBoost, and pandas:
(cmle-env)$ pip install scikit-learn xgboost pandas
For more details, installation options, and troubleshooting information, refer to the installation instructions for each framework.
Cloud Shell
Run the following command to install scikit-learn, XGBoost, and pandas:
pip install --user scikit-learn xgboost pandas
For more details, installation options, and troubleshooting information, refer to the installation instructions for each framework.
Set up your Cloud Storage bucket
You'll need a Cloud Storage bucket to store your training code and dependencies. For the purposes of this tutorial, it is easiest to use a dedicated Cloud Storage bucket in the same project you're using for AI Platform.
If you're using a bucket in a different project, you must ensure that your AI Platform service account can access your training code and dependencies in Cloud Storage. Without the appropriate permissions, your training job fails. See how to grant permissions for storage.
Make sure to use or set up a bucket in the same region you're using to run training jobs. See the available regions for AI Platform services.
This section shows you how to create a new bucket. You can use an existing bucket, but if it is not part of the project you are using to run AI Platform, you must explicitly grant access to the AI Platform service accounts.
- Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage:

  BUCKET_NAME="your_bucket_name"

  For example, use your project name with -mlengine appended:

  PROJECT_ID=$(gcloud config list project --format "value(core.project)")
  BUCKET_NAME=${PROJECT_ID}-mlengine

- Check the bucket name that you created:

  echo $BUCKET_NAME

- Select a region for your bucket and set a REGION environment variable. For example, the following command sets REGION to us-central1:

  REGION=us-central1

- Create the new bucket:

  gsutil mb -l $REGION gs://$BUCKET_NAME

  Note: Use the same region where you plan on running AI Platform jobs. The example uses us-central1 because that is the region used in the getting-started instructions.
Create your Python training module
Create a file, iris_training.py, that contains the code to train your model.
This section provides an explanation of what each part of the training code
does:
- Setup and imports
- Download the data from Cloud Storage
- Load data into pandas
- Train and save your model
- Upload your saved model file to Cloud Storage
For your convenience, the full code for iris_training.py is hosted on GitHub so you can use it for this tutorial.
Setup
Import the following libraries from Python and scikit-learn or XGBoost. Set a variable for the name of your Cloud Storage bucket.
scikit-learn
XGBoost
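As a minimal sketch of what this setup section might look like for the scikit-learn variant (the bucket name here is a placeholder you would replace with your own, and joblib is imported directly because newer scikit-learn releases no longer bundle it as sklearn.externals.joblib):

```python
# Sketch of the setup portion of iris_training.py (scikit-learn variant).
# BUCKET_NAME is a placeholder; replace it with your own bucket's name.
import datetime
import subprocess

import joblib        # for exporting the trained model
import pandas as pd  # for loading the CSV data
from sklearn import svm

BUCKET_NAME = 'your-bucket-name'
```

The XGBoost variant is the same except that it imports xgboost instead of the scikit-learn estimator and joblib.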
Download data from Cloud Storage
During the typical development process, you upload your own data to
Cloud Storage so that AI Platform can access it. The data
for this tutorial is hosted in a public Cloud Storage bucket:
gs://cloud-samples-data/ml-engine/iris/
The following code downloads the data using gsutil, diverting gsutil's status output to stdout:
scikit-learn
XGBoost
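A sketch of that download step, assuming the Cloud SDK's gsutil command is on your PATH. The IRIS_DATA_DIR name and the download_training_files helper are illustrative choices, not the sample's exact identifiers, and the function is only defined here, not run:

```python
import subprocess
import sys

# Public bucket hosting the tutorial's Iris CSV files.
IRIS_DATA_DIR = 'gs://cloud-samples-data/ml-engine/iris'
iris_data_filename = 'iris_data.csv'
iris_target_filename = 'iris_target.csv'

def download_training_files():
    """Copy the Iris CSVs from Cloud Storage to the local working directory."""
    for filename in (iris_data_filename, iris_target_filename):
        # gsutil writes its progress messages to stderr; divert them to
        # stdout so AI Platform captures them in the job logs.
        subprocess.check_call(
            ['gsutil', 'cp', '%s/%s' % (IRIS_DATA_DIR, filename), filename],
            stderr=sys.stdout)
```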
Load data into pandas
Use pandas to load your data into NumPy arrays for training with scikit-learn or XGBoost.
scikit-learn
XGBoost
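A sketch of the loading step. To keep the snippet self-contained, tiny stand-in CSVs are written first; in the tutorial, these files come from the download step above:

```python
import pandas as pd

# Stand-in files so this snippet runs on its own; in the tutorial,
# iris_data.csv and iris_target.csv are downloaded from Cloud Storage.
with open('iris_data.csv', 'w') as f:
    f.write('5.1,3.5,1.4,0.2\n4.9,3.0,1.4,0.2\n')
with open('iris_target.csv', 'w') as f:
    f.write('0\n0\n')

# Load the CSVs into DataFrames, then extract NumPy arrays for training.
iris_data = pd.read_csv('iris_data.csv', header=None).values
iris_target = pd.read_csv('iris_target.csv', header=None).values

# scikit-learn and XGBoost expect the labels as a 1-D array.
iris_target = iris_target.reshape((iris_target.size,))
```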
Train and save a model
Create a training module for AI Platform to run. In this example, the
training module trains a model on the Iris training data (iris_data and
iris_target) and saves your trained model by exporting it to a file. If you
want to use AI Platform to get online predictions after training, you
must name your model file according to the library you use to export it. See
more about the naming requirements for your
model file.
scikit-learn
Following the scikit-learn example on model persistence, you can train and export a model as shown below:
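A sketch of that train-and-export step. To keep it self-contained, it uses scikit-learn's bundled Iris data in place of the iris_data and iris_target arrays produced by the pandas step, and imports joblib directly rather than from sklearn.externals:

```python
import joblib
from sklearn import datasets, svm

# Stand-in for the arrays produced by the pandas loading step.
iris = datasets.load_iris()
iris_data, iris_target = iris.data, iris.target

# Train a support vector classifier, following the scikit-learn
# model-persistence example.
classifier = svm.SVC(gamma='auto')
classifier.fit(iris_data, iris_target)

# Export the trained model. The file must be named model.joblib
# for AI Platform online prediction.
joblib.dump(classifier, 'model.joblib')
```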
To export the model, you also have the option to use the pickle library as follows:
import pickle
with open('model.pkl', 'wb') as model_file:
    pickle.dump(classifier, model_file)
XGBoost
You can export the model by using the "save_model" method of the Booster object.
To export the model, you also have the option to use the pickle library as follows:
import pickle
with open('model.pkl', 'wb') as model_file:
    pickle.dump(bst, model_file)
Model file naming requirements
The saved model file that you upload to Cloud Storage must be named one
of: model.pkl, model.joblib, or model.bst, depending on which library you
used. This restriction ensures that AI Platform uses the same pattern
to reconstruct the model on import as was used during export.
This requirement does not apply if you create a custom prediction routine (beta).
scikit-learn
| Library used to export model | Correct model name |
|---|---|
| pickle | model.pkl |
| joblib | model.joblib |
XGBoost
| Library used to export model | Correct model name |
|---|---|
| pickle | model.pkl |
| joblib | model.joblib |
| xgboost.Booster | model.bst |
For future iterations of your model, organize your Cloud Storage bucket so that each new model has a dedicated directory.
Upload your saved model to Cloud Storage
If you're using a Cloud Storage bucket outside of the Google Cloud Platform project you're using to run AI Platform, make sure that AI Platform has access to your bucket.
scikit-learn
XGBoost
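A sketch of the upload step, mirroring the sample's approach of shelling out to gsutil. The upload_model helper and the timestamped iris_* directory layout are illustrative (the output shown later in this tutorial follows the same pattern); the function is only defined here, not run, and BUCKET_NAME is a placeholder:

```python
import datetime
import subprocess

BUCKET_NAME = 'your-bucket-name'  # placeholder; set to your own bucket

def upload_model(model_filename='model.joblib'):
    """Copy the exported model file into a timestamped bucket directory."""
    gcs_model_path = 'gs://{}/iris_{}/{}'.format(
        BUCKET_NAME,
        datetime.datetime.now().strftime('%Y%m%d_%H%M%S'),
        model_filename)
    subprocess.check_call(['gsutil', 'cp', model_filename, gcs_model_path])
```

For the XGBoost variant, pass 'model.bst' (or 'model.pkl') as the filename instead.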
Create training application package
With iris_training.py created from the above snippets, create a training
application package that includes iris_training.py as its main module.
The easiest (and recommended) way to create a training application package uses
gcloud to package and upload the application when you submit your training
job. This method requires you to create a very simple file structure with two
files:
scikit-learn
For this tutorial, the file structure of your training application package should appear similar to the following:
iris_sklearn_trainer/
__init__.py
iris_training.py
- In the command line, create a directory locally:

  mkdir iris_sklearn_trainer

- Create an empty file named __init__.py:

  touch iris_sklearn_trainer/__init__.py

- Save your training code as iris_training.py within your iris_sklearn_trainer directory. Alternatively, use cURL to download and save the file from GitHub:

  curl https://raw.githubusercontent.com/GoogleCloudPlatform/cloudml-samples/master/sklearn/iris_training.py > iris_sklearn_trainer/iris_training.py

  View the full source code on GitHub.

- Confirm that your training application package is set up correctly:

  ls ./iris_sklearn_trainer
    __init__.py  iris_training.py
XGBoost
For this tutorial, the file structure of your training application package should appear similar to the following:
iris_xgboost_trainer/
__init__.py
iris_training.py
- In the command line, create a directory locally:

  mkdir iris_xgboost_trainer

- Create an empty file named __init__.py:

  touch iris_xgboost_trainer/__init__.py

- Save your training code as iris_training.py within your iris_xgboost_trainer directory. Alternatively, use cURL to download and save the file from GitHub:

  curl https://raw.githubusercontent.com/GoogleCloudPlatform/cloudml-samples/master/xgboost/iris_training.py > iris_xgboost_trainer/iris_training.py

  View the full source code on GitHub.

- Confirm that your training application package is set up correctly:

  ls ./iris_xgboost_trainer
    __init__.py  iris_training.py
Learn more about packaging a training application.
Run trainer locally
You can test your training application locally using the
gcloud ai-platform local train
command. This step is optional, but it is helpful for debugging purposes.
scikit-learn
In the command line, set the following environment variables,
replacing [VALUES-IN-BRACKETS] with the appropriate values:
TRAINING_PACKAGE_PATH="./iris_sklearn_trainer/"
MAIN_TRAINER_MODULE="iris_sklearn_trainer.iris_training"
Test your training job locally:
gcloud ai-platform local train \
--package-path $TRAINING_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE
XGBoost
In the command line, set the following environment variables,
replacing [VALUES-IN-BRACKETS] with the appropriate values:
TRAINING_PACKAGE_PATH="./iris_xgboost_trainer/"
MAIN_TRAINER_MODULE="iris_xgboost_trainer.iris_training"
Test your training job locally:
gcloud ai-platform local train \
--package-path $TRAINING_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE
Submit training job
In this section, you use
gcloud ai-platform jobs submit training to submit
your training job.
Specify training job parameters
Set the following environment variables for each parameter in your training job request:
- BUCKET_NAME - The name of your Cloud Storage bucket.
- JOB_NAME - A name to use for the job (mixed-case letters, numbers, and underscores only, starting with a letter). For example, iris_scikit_learn_$(date +"%Y%m%d_%H%M%S") or iris_xgboost_$(date +"%Y%m%d_%H%M%S").
- JOB_DIR - The path to a Cloud Storage location to use for your training job's output files. For example, gs://$BUCKET_NAME/scikit_learn_job_dir or gs://$BUCKET_NAME/xgboost_job_dir.
- TRAINING_PACKAGE_PATH - The local path to the root directory of your training application. For example, ./iris_sklearn_trainer/ or ./iris_xgboost_trainer/.
- MAIN_TRAINER_MODULE - Specifies which file the AI Platform training service should run. This is formatted as [YOUR_FOLDER_NAME.YOUR_PYTHON_FILE_NAME]. For example, iris_sklearn_trainer.iris_training or iris_xgboost_trainer.iris_training.
- REGION - The name of the region you're using to run your training job. Use one of the available regions for the AI Platform training service. Make sure your Cloud Storage bucket is in the same region.
- RUNTIME_VERSION - You must specify an AI Platform runtime version that supports scikit-learn. In this example, 1.14.
- PYTHON_VERSION - The Python version to use for the job. Python 3.5 is available with runtime version 1.4 or greater. For this tutorial, specify Python 2.7.
- SCALE_TIER - A predefined cluster specification for machines to run your training job. In this case, BASIC. You can also use custom scale tiers to define your own cluster configuration for training.
For your convenience, the environment variables for this tutorial are below.
scikit-learn
Replace [VALUES-IN-BRACKETS] with the appropriate values:
BUCKET_NAME=[YOUR-BUCKET-NAME]
JOB_NAME="iris_scikit_learn_$(date +"%Y%m%d_%H%M%S")"
JOB_DIR=gs://$BUCKET_NAME/scikit_learn_job_dir
TRAINING_PACKAGE_PATH="./iris_sklearn_trainer/"
MAIN_TRAINER_MODULE="iris_sklearn_trainer.iris_training"
REGION=us-central1
RUNTIME_VERSION=1.14
PYTHON_VERSION=2.7
SCALE_TIER=BASIC
XGBoost
Replace [VALUES-IN-BRACKETS] with the appropriate values:
BUCKET_NAME=[YOUR-BUCKET-NAME]
JOB_NAME="iris_xgboost_$(date +"%Y%m%d_%H%M%S")"
JOB_DIR=gs://$BUCKET_NAME/xgboost_job_dir
TRAINING_PACKAGE_PATH="./iris_xgboost_trainer/"
MAIN_TRAINER_MODULE="iris_xgboost_trainer.iris_training"
REGION=us-central1
RUNTIME_VERSION=1.14
PYTHON_VERSION=2.7
SCALE_TIER=BASIC
Submit the training job request:
gcloud ai-platform jobs submit training $JOB_NAME \
--job-dir $JOB_DIR \
--package-path $TRAINING_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--region $REGION \
--runtime-version=$RUNTIME_VERSION \
--python-version=$PYTHON_VERSION \
--scale-tier $SCALE_TIER
You should see output similar to the following:
Job [iris_scikit_learn_[DATE]_[TIME]] submitted successfully.
Your job is still active. You may view the status of your job with the command
$ gcloud ai-platform jobs describe iris_scikit_learn_[DATE]_[TIME]
or continue streaming the logs with the command
$ gcloud ai-platform jobs stream-logs iris_scikit_learn_[DATE]_[TIME]
jobId: iris_scikit_learn_[DATE]_[TIME]
state: QUEUED
Viewing your training logs (optional)
AI Platform captures all stdout and stderr streams and logging
statements. These logs are stored in Logging; they are visible
both during and after execution.
To view the logs for your training job:
Console
Open your AI Platform Jobs page.
Select the name of the training job to inspect. This brings you to the Job details page for your selected training job.
Within the job details, select the View logs link. This brings you to the Logging page where you can search and filter logs for your selected training job.
gcloud
You can view logs in your terminal with
gcloud ai-platform jobs stream-logs.
gcloud ai-platform jobs stream-logs $JOB_NAME
Verify your model file in Cloud Storage
View the contents of the destination model folder to verify that your saved model file has been uploaded to Cloud Storage.
gsutil ls gs://$BUCKET_NAME/iris_*
Example output:
gs://bucket-name/iris_20180518_123815/:
gs://bucket-name/iris_20180518_123815/model.joblib
Deploy your model to AI Platform for online predictions
To deploy your model and return predictions, follow the instructions to
deploy models and versions.
You can use the $RUNTIME_VERSION and $PYTHON_VERSION variables you defined
earlier in this tutorial to deploy the model using
gcloud ai-platform versions create.
What's next
- Get online predictions with scikit-learn on AI Platform.
- See how to use custom scale tiers to define your own cluster configuration for training.


