diff --git a/on_demand/kfp-caip-sklearn/README.md b/on_demand/kfp-caip-sklearn/README.md index 58150390..76e3c854 100644 --- a/on_demand/kfp-caip-sklearn/README.md +++ b/on_demand/kfp-caip-sklearn/README.md @@ -56,7 +56,7 @@ ml.googleapis.com \ dataflow.googleapis.com ``` -The **Cloud Build** service account needs the Editor permissions in your GCP project to upload the pipeline package to an **AI Platform Pipelines** instance. +The **Cloud Build** service account needs Editor permissions in your Google Cloud project to upload the pipeline package to an **AI Platform Pipelines** instance. ``` PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)") @@ -185,10 +185,10 @@ In this lab, you will develop, package as a docker image, and run on AI Platform ### Lab-02 - Implementing continuous training pipeline with Kubeflow Pipelines and Cloud AI Platform -In this lab, you will author, deploy, and run a **Kubeflow Pipelines (KFP)** pipeline that automates ML workflow steps you experminted with in the first lab. +In this lab, you will author, deploy, and run a **Kubeflow Pipelines (KFP)** pipeline that automates the ML workflow steps you experimented with in the first lab. ### Lab-03 - CI/CD for the continuous training pipeline -In this lab, you will author a **Cloud Build** CI/CD workflow that automates the process of building and deploying of the KFP pipeline authored in the second lab. You will also integrate the **Cloud Build** workflow with **GitHub**. +In this lab, you will author a **Cloud Build** CI/CD workflow that automates the process of building and deploying the KFP pipeline authored in the second lab. You will also integrate the **Cloud Build** workflow with **GitHub**. diff --git a/on_demand/kfp-caip-sklearn/lab-01-caip-containers/exercises/lab-01.ipynb b/on_demand/kfp-caip-sklearn/lab-01-caip-containers/exercises/lab-01.ipynb index fe7ea088..9245c88e 100644 --- a/on_demand/kfp-caip-sklearn/lab-01-caip-containers/exercises/lab-01.ipynb +++ b/on_demand/kfp-caip-sklearn/lab-01-caip-containers/exercises/lab-01.ipynb @@ -7,14 +7,16 @@ "# Using custom containers with AI Platform Training\n", "\n", "**Learning Objectives:**\n", - "1. Learn how to create a train and a validation split with Big Query\n", - "1. Learn how to wrap a machine learning model into a Docker container and train in on CAIP\n", - "1. Learn how to use the hyperparameter tunning engine on GCP to find the best hyperparameters\n", - "1. Learn how to deploy a trained machine learning model GCP as a rest API and query it.\n", + "1. Learn how to create a train and a validation split with BigQuery\n", + "1. Learn how to wrap a machine learning model into a Docker container and train it on AI Platform\n", + "1. Learn how to use the hyperparameter tuning engine on Google Cloud to find the best hyperparameters\n", + "1. Learn how to deploy a trained machine learning model on Google Cloud as a REST API and query it\n", "\n", - "In this lab, you develop, package as a docker image, and run on **AI Platform Training** a training application that trains a multi-class classification model that predicts the type of forest cover from cartographic data. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on **Covertype Data Set** from UCI Machine Learning Repository.\n", + "In this lab, you develop a multi-class classification model, package the model as a Docker image, and run it on **AI Platform Training** as a training application. 
The training application trains a multi-class classification model that predicts the type of forest cover from cartographic data. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on **Covertype Data Set** from UCI Machine Learning Repository.\n", "\n", - "The training code uses `scikit-learn` for data pre-processing and modeling. The code has been instrumented using the `hypertune` package so it can be used with **AI Platform** hyperparameter tuning.\n" + "Scikit-learn is one of the most useful libraries for machine learning in Python. The training code uses `scikit-learn` for data pre-processing and modeling. \n", + "\n", + "The code is instrumented using the `hypertune` package so it can be used with an **AI Platform** hyperparameter tuning job to search for the best combination of hyperparameter values by optimizing the metric you specify." ] }, { @@ -48,6 +50,63 @@ "from sklearn.compose import ColumnTransformer" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare lab dataset\n", + "\n", + "Set environment variables so that we can use them throughout the entire lab.\n", + "\n", + "The pipeline ingests data from BigQuery. The cell below uploads the Covertype dataset to BigQuery.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT_ID=!(gcloud config get-value core/project)\n", + "PROJECT_ID=PROJECT_ID[0]\n", + "DATASET_ID='covertype_dataset'\n", + "DATASET_LOCATION='US'\n", + "TABLE_ID='covertype'\n", + "DATA_SOURCE='gs://workshop-datasets/covertype/small/dataset.csv'\n", + "SCHEMA='Elevation:INTEGER,Aspect:INTEGER,Slope:INTEGER,Horizontal_Distance_To_Hydrology:INTEGER,Vertical_Distance_To_Hydrology:INTEGER,Horizontal_Distance_To_Roadways:INTEGER,Hillshade_9am:INTEGER,Hillshade_Noon:INTEGER,Hillshade_3pm:INTEGER,Horizontal_Distance_To_Fire_Points:INTEGER,Wilderness_Area:STRING,Soil_Type:STRING,Cover_Type:INTEGER'\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, create the BigQuery dataset and upload the Covertype CSV data into a table.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!bq --location=$DATASET_LOCATION --project_id=$PROJECT_ID mk --dataset $DATASET_ID\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!bq --project_id=$PROJECT_ID --dataset_id=$DATASET_ID load \\\n", + "--source_format=CSV \\\n", + "--skip_leading_rows=1 \\\n", + "--replace \\\n", + "$TABLE_ID \\\n", + "$DATA_SOURCE \\\n", + "$SCHEMA\n" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -62,7 +121,9 @@ "Set location paths, connections strings, and other environment settings. Make sure to update `REGION`, and `ARTIFACT_STORE` with the settings reflecting your lab environment. \n", "\n", "- `REGION` - the compute region for AI Platform Training and Prediction\n", - "- `ARTIFACT_STORE` - the GCS bucket created during installation of AI Platform Pipelines. The bucket name starts with the `hostedkfp-default-` prefix." + "- `ARTIFACT_STORE` - the Cloud Storage bucket created during installation of AI Platform Pipelines. The bucket name starts with the `qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default` prefix.\n", + "\n", + "Run `gsutil ls` without URLs to list all of the Cloud Storage buckets under your default project ID."
] }, { @@ -74,6 +135,15 @@ "!gsutil ls" ] }, + { + "source": [ + "**HINT:** For ARTIFACT_STORE, copy the bucket name which starts with the qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default prefix from the previous cell output.\n", + "\n", + "Your copied value should look like 'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default'." + ], + "cell_type": "markdown", + "metadata": {} + }, { "cell_type": "code", "execution_count": null, @@ -81,7 +151,7 @@ "outputs": [], "source": [ "REGION = 'us-central1'\n", - "ARTIFACT_STORE = 'gs://hostedkfp-default-l2iv13wnek'\n", + "ARTIFACT_STORE = 'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default' # TO DO: REPLACE WITH YOUR ARTIFACT_STORE NAME\n", "\n", "PROJECT_ID = !(gcloud config get-value core/project)\n", "PROJECT_ID = PROJECT_ID[0]\n", @@ -95,7 +165,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Explore the Covertype dataset " + "## Explore the Covertype dataset \n", + "\n", + "Run the query below to scan the `covertype_dataset.covertype` table in BigQuery and return the computed result rows." ] }, { @@ -115,15 +187,14 @@ "source": [ "## Create training and validation splits\n", "\n", - "Use BigQuery to sample training and validation splits and save them to GCS storage\n", - "### Create a training split" + "Use BigQuery to sample training and validation splits and save them to Cloud Storage.\n", + "\n", + "### Create a training split\n", + "\n", + "Run the query below in order to have repeatable sampling of the data in BigQuery. Note that `FARM_FINGERPRINT()` is used on the field that you are going to use to split your data. It creates a training split that takes 80% of the data using the `bq` command and exports this split into the BigQuery table `covertype_dataset.training`." ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "!bq query \\\n", "-n 0 \\\n", "--destination_table covertype_dataset.training \\\n", "--replace \\\n", "--use_legacy_sql=false \\\n", "'SELECT * \\\n", "FROM `covertype_dataset.covertype` AS cover \\\n", "WHERE \\\n", "MOD(ABS(FARM_FINGERPRINT(TO_JSON_STRING(cover))), 10) IN (1, 2, 3, 4)' " - ] + ], + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [] + }, + { + "source": [ + "Use the `bq` extract command to export the BigQuery training table to GCS at `$TRAINING_FILE_PATH`." + ], + "cell_type": "markdown", + "metadata": {} }, { "cell_type": "code", @@ -165,7 +247,10 @@ "a validation split that takes 10% of the data using the `bq` command and\n", "export this split into the BigQuery table `covertype_dataset.validation`.\n", "\n", - "In the second cell, use the `bq` command to export that BigQuery validation table to GCS at `$VALIDATION_FILE_PATH`." + "In the second cell, use the `bq` command to export that BigQuery validation table to GCS at `$VALIDATION_FILE_PATH`.\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# TODO: You code to create the BQ table validation split" + "# TO DO: Your code goes here to create the BQ table validation split." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# TODO: Your code to export the validation table to GCS" + "# TO DO: Your code goes here to export the validation table to the Cloud Storage bucket."
] }, { @@ -329,7 +414,10 @@ "### Exercise\n", "\n", "Complete the code below to capture the metric that the hyper parameter tunning engine will use to optimize\n", - "the hyper parameter. " + "the hyper parameter. \n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "" ] }, { @@ -407,7 +495,7 @@ " pipeline.fit(X_train, y_train)\n", "\n", " if hptune:\n", - " # TODO: Score the model with the validation data and capture the result\n", + " # TO DO: Your code goes here to score the model with the validation data and capture the result\n", " # with the hypertune library\n", "\n", " # Save the model\n", @@ -442,7 +530,10 @@ "### Exercise\n", "\n", "Complete the Dockerfile below so that it copies the 'train.py' file into the container\n", - "at `/app` and runs it when the container is started. " + "at `/app` and runs it when the container is started. \n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "" ] }, { @@ -456,7 +547,7 @@ "FROM gcr.io/deeplearning-platform-release/base-cpu\n", "RUN pip install -U fire cloudml-hypertune scikit-learn==0.20.4 pandas==0.24.2\n", "\n", - "# TODO" + "# TO DO: Your code goes here" ] }, { @@ -516,7 +607,10 @@ "Complete the `hptuning_config.yaml` file below so that the hyperparameter\n", "tunning engine try for parameter values\n", "* `max_iter` the two values 200 and 300\n", - "* `alpha` a linear range of values between 0.00001 and 0.001" + "* `alpha` a linear range of values between 0.00001 and 0.001\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "" ] }, { @@ -549,7 +643,7 @@ " hyperparameterMetricTag: accuracy\n", " enableTrialEarlyStopping: TRUE \n", " params:\n", - " # TODO: Your code goes here\n", + " # TO DO: Your code goes here\n", " " ] }, @@ -561,7 +655,10 @@ "\n", "\n", "### Exercise\n", - "Use the `gcloud` command to start the hyperparameter tuning job." + "Use the `gcloud` command to start the hyperparameter tuning job.\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "\n" ] }, { @@ -575,13 +672,13 @@ "SCALE_TIER = \"BASIC\"\n", "\n", "!gcloud ai-platform jobs submit training $JOB_NAME \\\n", - "--region=# TODO\\\n", - "--job-dir=# TODO \\\n", - "--master-image-uri=# TODO \\\n", - "--scale-tier=# TODO \\\n", - "--config # TODO \\\n", + "--region=# TO DO: ADD YOUR REGION \\\n", + "--job-dir=# TO DO: ADD YOUR JOB-DIR \\\n", + "--master-image-uri=# TO DO: ADD YOUR IMAGE-URI \\\n", + "--scale-tier=# TO DO: ADD YOUR SCALE-TIER \\\n", + "--config # TO DO: ADD YOUR CONFIG PATH \\\n", "-- \\\n", - "# TODO" + "# TO DO: Complete the command" ] }, { @@ -590,7 +687,7 @@ "source": [ "### Monitor the job.\n", "\n", - "You can monitor the job using GCP console or from within the notebook using `gcloud` commands." + "You can monitor the job using Google Cloud console or from within the notebook using `gcloud` commands." 
] }, { @@ -611,6 +708,13 @@ "!gcloud ai-platform jobs stream-logs $JOB_NAME" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**NOTE: The AI Platform job stream logs above will take approximately 5-10 minutes to display.**" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ @@ -622,7 +726,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "After the job completes you can review the results using GCP Console or programatically by calling the AI Platform Training REST end-point." + "After the job completes, you can review the results using the Google Cloud Console or programmatically by calling the AI Platform Training REST endpoint." ] }, { @@ -720,13 +824,20 @@ "!gcloud ai-platform jobs stream-logs $JOB_NAME" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**NOTE: The AI Platform job stream logs above will take approximately 5-10 minutes to display.**" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examine the training output\n", "\n", - "The training script saved the trained model as the 'model.pkl' in the `JOB_DIR` folder on GCS." + "The training script saved the trained model as the 'model.pkl' in the `JOB_DIR` folder on Cloud Storage." ] }, { @@ -759,7 +870,10 @@ "### Exercise\n", "\n", "Complete the `gcloud` command below to create a model with\n", - "`model_name` in `$REGION` tagged with `labels`:" + "`model_name` in `$REGION` tagged with `labels`:\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "" ] }, { @@ -771,7 +885,7 @@ "model_name = 'forest_cover_classifier'\n", "labels = \"task=classifier,domain=forestry\"\n", "\n", - "!gcloud # TODO: You code goes here" + "!gcloud # TO DO: Your code goes here" ] }, { @@ -792,7 +906,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Complete the `gcloud` command below to create a version of the model:" + "Complete the `gcloud` command below to create a version of the model:\n", + "\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "" ] }, { @@ -803,12 +921,13 @@ "source": [ "model_version = 'v01'\n", "\n", - "!gcloud # TODO \\\n", - "--model=# TODO \\\n", - "--origin=# TODO \\\n", - "--runtime-version=# TODO \\\n", - "--framework=# TODO \\\n", - "--python-version=# TODO" + "!gcloud # TO DO: Complete the command \\\n", + "--model=# TO DO: ADD YOUR MODEL NAME \\\n", + "--origin=# TO DO: ADD YOUR PATH \\\n", + "--runtime-version=# TO DO: ADD YOUR RUNTIME \\\n", + "--framework=# TO DO: ADD YOUR FRAMEWORK \\\n", + "--python-version=# TO DO: ADD YOUR PYTHON VERSION \\\n", + "--region # TO DO: ADD YOUR REGION" ] }, { @@ -856,7 +975,10 @@ "### Exercise\n", "\n", "Using the `gcloud` command send the data in `$input_file` to \n", - "your model deployed as a REST API:" + "your model deployed as a REST API:\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-01-caip-containers** and opening **lab-01.ipynb**.\n", + "" ] }, { @@ -865,7 +987,7 @@ "metadata": {}, "outputs": [], "source": [ - "!gcloud # TODO: Complete the command" + "!gcloud # TO DO: Complete the command" ] }, { @@ -901,4 +1023,4 }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of 
file diff --git a/on_demand/kfp-caip-sklearn/lab-01-caip-containers/lab-01.ipynb b/on_demand/kfp-caip-sklearn/lab-01-caip-containers/lab-01.ipynb index 2f33ba75..34580ce6 100644 --- a/on_demand/kfp-caip-sklearn/lab-01-caip-containers/lab-01.ipynb +++ b/on_demand/kfp-caip-sklearn/lab-01-caip-containers/lab-01.ipynb @@ -7,14 +7,16 @@ "# Using custom containers with AI Platform Training\n", "\n", "**Learning Objectives:**\n", - "1. Learn how to create a train and a validation split with Big Query\n", - "1. Learn how to wrap a machine learning model into a Docker container and train in on CAIP\n", - "1. Learn how to use the hyperparameter tunning engine on GCP to find the best hyperparameters\n", - "1. Learn how to deploy a trained machine learning model GCP as a rest API and query it\n", + "1. Learn how to create a train and a validation split with BigQuery\n", + "1. Learn how to wrap a machine learning model into a Docker container and train it on AI Platform\n", + "1. Learn how to use the hyperparameter tuning engine on Google Cloud to find the best hyperparameters\n", + "1. Learn how to deploy a trained machine learning model on Google Cloud as a REST API and query it\n", "\n", - "In this lab, you develop, package as a docker image, and run on **AI Platform Training** a training application that trains a multi-class classification model that predicts the type of forest cover from cartographic data. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on **Covertype Data Set** from UCI Machine Learning Repository.\n", + "In this lab, you develop a multi-class classification model, package the model as a Docker image, and run it on **AI Platform Training** as a training application. The training application trains a multi-class classification model that predicts the type of forest cover from cartographic data. The [dataset](../../../datasets/covertype/README.md) used in the lab is based on **Covertype Data Set** from UCI Machine Learning Repository.\n", "\n", - "The training code uses `scikit-learn` for data pre-processing and modeling. The code has been instrumented using the `hypertune` package so it can be used with **AI Platform** hyperparameter tuning.\n" + "Scikit-learn is one of the most useful libraries for machine learning in Python. The training code uses `scikit-learn` for data pre-processing and modeling. \n", + "\n", + "The code is instrumented using the `hypertune` package so it can be used with an **AI Platform** hyperparameter tuning job to search for the best combination of hyperparameter values by optimizing the metric you specify.\n" ] }, { @@ -48,6 +50,63 @@ "from sklearn.compose import ColumnTransformer" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prepare lab dataset\n", + "\n", + "Set environment variables so that we can use them throughout the entire lab.\n", + "\n", + "The pipeline ingests data from BigQuery. 
The cell below uploads the Covertype dataset to BigQuery.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT_ID = !(gcloud config get-value core/project)\n", + "PROJECT_ID = PROJECT_ID[0]\n", + "DATASET_ID='covertype_dataset'\n", + "DATASET_LOCATION='US'\n", + "TABLE_ID='covertype'\n", + "DATA_SOURCE='gs://workshop-datasets/covertype/small/dataset.csv'\n", + "SCHEMA='Elevation:INTEGER,Aspect:INTEGER,Slope:INTEGER,Horizontal_Distance_To_Hydrology:INTEGER,Vertical_Distance_To_Hydrology:INTEGER,Horizontal_Distance_To_Roadways:INTEGER,Hillshade_9am:INTEGER,Hillshade_Noon:INTEGER,Hillshade_3pm:INTEGER,Horizontal_Distance_To_Fire_Points:INTEGER,Wilderness_Area:STRING,Soil_Type:STRING,Cover_Type:INTEGER'\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, create the BigQuery dataset and upload the Covertype CSV data into a table.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!bq --location=$DATASET_LOCATION --project_id=$PROJECT_ID mk --dataset $DATASET_ID\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!bq --project_id=$PROJECT_ID --dataset_id=$DATASET_ID load \\\n", + "--source_format=CSV \\\n", + "--skip_leading_rows=1 \\\n", + "--replace \\\n", + "$TABLE_ID \\\n", + "$DATA_SOURCE \\\n", + "$SCHEMA\n" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -62,7 +121,9 @@ "Set location paths, connections strings, and other environment settings. Make sure to update `REGION`, and `ARTIFACT_STORE` with the settings reflecting your lab environment. \n", "\n", "- `REGION` - the compute region for AI Platform Training and Prediction\n", - "- `ARTIFACT_STORE` - the GCS bucket created during installation of AI Platform Pipelines. The bucket name starts with the `hostedkfp-default-` prefix." + "- `ARTIFACT_STORE` - the Cloud Storage bucket created during installation of AI Platform Pipelines. The bucket name starts with the `qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default` prefix.\n", + "\n", + "Run `gsutil ls` without URLs to list all of the Cloud Storage buckets under your default project ID." ] }, { @@ -74,6 +135,15 @@ "!gsutil ls" ] }, + { + "source": [ + "**HINT:** For ARTIFACT_STORE, copy the bucket name which starts with the qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default prefix from the previous cell output. \n", + "\n", + "Your copied value should look like 'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default'." + ], + "cell_type": "markdown", + "metadata": {} + }, { "cell_type": "code", "execution_count": null, @@ -81,7 +151,7 @@ "outputs": [], "source": [ "REGION = 'us-central1'\n", - "ARTIFACT_STORE = 'gs://hostedkfp-default-l2iv13wnek'\n", + "ARTIFACT_STORE = 'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default' # TO DO: REPLACE WITH YOUR ARTIFACT_STORE NAME\n", "\n", "PROJECT_ID = !(gcloud config get-value core/project)\n", "PROJECT_ID = PROJECT_ID[0]\n", @@ -95,7 +165,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Explore the Covertype dataset " + "## Explore the Covertype dataset \n", + "\n", + "Run the query below to scan the `covertype_dataset.covertype` table in BigQuery and return the computed result rows."
] }, { @@ -115,8 +187,10 @@ "source": [ "## Create training and validation splits\n", "\n", - "Use BigQuery to sample training and validation splits and save them to GCS storage\n", - "### Create a training split" + "Use BigQuery to sample training and validation splits and save them to Cloud Storage.\n", + "### Create a training split\n", + "\n", + "Run the query below in order to have repeatable sampling of the data in BigQuery. Note that `FARM_FINGERPRINT()` is used on the field that you are going to use to split your data. It creates a training split that takes 80% of the data using the `bq` command and exports this split into the BigQuery table `covertype_dataset.training`." ] }, { @@ -136,6 +210,13 @@ "MOD(ABS(FARM_FINGERPRINT(TO_JSON_STRING(cover))), 10) IN (1, 2, 3, 4)' " ] }, + { + "source": [ + "Use the `bq` extract command to export the BigQuery training table to GCS at `$TRAINING_FILE_PATH`." + ], + "cell_type": "markdown", + "metadata": {} + }, { "cell_type": "code", "execution_count": null, @@ -152,7 +233,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a validation split" + "### Create a validation split\n", + "\n", + "Run the query below to create a validation split that takes 10% of the data using the `bq` command and export this split into the BigQuery table `covertype_dataset.validation`.\n", + "\n", + "In the second cell, use the `bq` command to export that BigQuery validation table to GCS at `$VALIDATION_FILE_PATH`." ] }, { @@ -572,7 +657,7 @@ "source": [ "### Monitor the job.\n", "\n", - "You can monitor the job using GCP console or from within the notebook using `gcloud` commands." + "You can monitor the job using Google Cloud console or from within the notebook using `gcloud` commands." ] }, { @@ -593,6 +678,13 @@ "!gcloud ai-platform jobs stream-logs $JOB_NAME" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**NOTE: The AI Platform job stream logs above will take approximately 5-10 minutes to display.**" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ @@ -604,7 +696,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "After the job completes you can review the results using GCP Console or programatically by calling the AI Platform Training REST end-point." + "After the job completes, you can review the results using the Google Cloud Console or programmatically by calling the AI Platform Training REST endpoint." ] }, { @@ -702,13 +794,20 @@ "!gcloud ai-platform jobs stream-logs $JOB_NAME" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**NOTE: The AI Platform job stream logs above will take approximately 5-10 minutes to display.**" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examine the training output\n", "\n", - "The training script saved the trained model as the 'model.pkl' in the `JOB_DIR` folder on GCS." + "The training script saved the trained model as the 'model.pkl' in the `JOB_DIR` folder on Cloud Storage." ] }, { @@ -731,7 +830,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a model resource" + "### Create a model resource\n", + "\n", + "Use the `gcloud` command to create a model with `model_name` in `$REGION` tagged with `labels`."
] }, { @@ -752,7 +853,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a model version" + "### Create a model version\n", + "\n", + "Use the `gcloud` command to create a version of the model.\n", + "\n" ] }, { @@ -768,7 +872,8 @@ " --origin=$JOB_DIR \\\n", " --runtime-version=1.15 \\\n", " --framework=scikit-learn \\\n", - " --python-version=3.7" + " --python-version=3.7\\\n", + " --region global" ] }, { @@ -806,7 +911,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Invoke the model" + "#### Invoke the model\n", + "\n", + "Use the `gcloud` command to send the data in `$input_file` to your model deployed as a REST API." ] }, { @@ -818,7 +925,8 @@ "!gcloud ai-platform predict \\\n", "--model $model_name \\\n", "--version $model_version \\\n", - "--json-instances $input_file" + "--json-instances $input_file\\\n", + "--region global" ] }, { @@ -834,11 +942,6 @@ } ], "metadata": { - "environment": { - "name": "tf2-gpu.2-1.m59", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-1:m59" - }, "kernelspec": { "display_name": "Python 3", "language": "python", @@ -854,9 +957,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.8" + "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/README.md b/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/README.md index 37dbe8a3..df37c454 100644 --- a/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/README.md +++ b/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/README.md @@ -1,6 +1,6 @@ -# Implementing cntinuous training pipeline with KFP and Cloud AI Platform +# Implementing continuous training pipeline with Kubeflow Pipelines and AI Platform -In this lab, you will build, deploy, and run a KFP pipeline that orchestrates **BigQuery** and **Cloud AI Platform** services to train a **scikit-learn** model. +In this lab, you will build, deploy, and run a Kubeflow Pipelines (KFP) pipeline that orchestrates **BigQuery** and **AI Platform** services to train a **scikit-learn** model. ## Lab instructions diff --git a/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/exercises/lab-02.ipynb b/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/exercises/lab-02.ipynb index cdfe43e4..41e9b635 100644 --- a/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/exercises/lab-02.ipynb +++ b/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/exercises/lab-02.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Continuous training pipeline with KFP and Cloud AI Platform" + "# Continuous training pipeline with Kubeflow Pipelines and AI Platform" ] }, { @@ -12,13 +12,13 @@ "metadata": {}, "source": [ "**Learning Objectives:**\n", - "1. Learn how to use KF pre-build components (BiqQuery, CAIP training and predictions)\n", - "1. Learn how to use KF lightweight python components\n", - "1. Learn how to build a KF pipeline with these components\n", - "1. Learn how to compile, upload, and run a KF pipeline with the command line\n", + "1. Learn how to use Kubeflow Pipelines (KFP) pre-built components (BigQuery, AI Platform training and predictions)\n", + "1. Learn how to use KFP lightweight Python components\n", + "1. Learn how to build a KFP pipeline with these components\n", + "1. 
Learn how to compile, upload, and run a KFP pipeline with the command line\n", "\n", "\n", - "In this lab, you will build, deploy, and run a KFP pipeline that orchestrates **BigQuery** and **Cloud AI Platform** services to train, tune, and deploy a **scikit-learn** model.\n" + "In this lab, you will build, deploy, and run a KFP pipeline that orchestrates **BigQuery** and **AI Platform** services to train, tune, and deploy a **scikit-learn** model.\n" ] }, { @@ -34,7 +34,9 @@ "source": [ "The workflow implemented by the pipeline is defined using a Python based Domain Specific Language (DSL). The pipeline's DSL is in the `covertype_training_pipeline.py` file that we will generate below.\n", "\n", - "The pipeline's DSL has been designed to avoid hardcoding any environment specific settings like file paths or connection strings. These settings are provided to the pipeline code through a set of environment variables.\n" + "The pipeline's DSL has been designed to avoid hardcoding any environment specific settings like file paths or connection strings. These settings are provided to the pipeline code through a set of environment variables.\n", + "\n", + "\n" ] }, { @@ -68,7 +70,10 @@ "source": [ "### Exercise\n", "\n", - "Complete the TODOs the pipeline file below." + "Complete the TO DOs in the pipeline file below.\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-02-kfp-pipeline** and opening **lab-02.ipynb**.\n", + "" ] }, { @@ -91,7 +96,7 @@ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", - "\"\"\"KFP pipeline orchestrating BigQuery and Cloud AI Platform services.\"\"\"\n", + "\"\"\"KFP pipeline orchestrating BigQuery and AI Platform services.\"\"\"\n", "\n", "import os\n", "\n", @@ -166,13 +171,13 @@ "\n", "\n", "# Create component factories\n", - "component_store = # TODO\n", + "component_store = # TO DO: Complete the command\n", "\n", - "bigquery_query_op = # TODO - use the pre-built bigquery/query component\n", - "mlengine_train_op = # TODO - use the pre-built ml_engine/train\n", - "mlengine_deploy_op = # TODO - use the pre-built ml_engine/deploy component\n", - "retrieve_best_run_op = # TODO - package the retrieve_best_run function into a lightweight component\n", - "evaluate_model_op = # TODO - package the evaluate_model function into a lightweight component\n", + "bigquery_query_op = # TO DO: Use the pre-built bigquery/query component\n", + "mlengine_train_op = # TO DO: Use the pre-built ml_engine/train\n", + "mlengine_deploy_op = # TO DO: Use the pre-built ml_engine/deploy component\n", + "retrieve_best_run_op = # TO DO: Package the retrieve_best_run function into a lightweight component\n", + "evaluate_model_op = # TO DO: Package the evaluate_model function into a lightweight component\n", "\n", "\n", "@kfp.dsl.pipeline(\n", @@ -221,7 +226,7 @@ "\n", " testing_file_path = '{}/{}'.format(gcs_root, TESTING_FILE_PATH)\n", "\n", - " create_testing_split = # TODO - use the bigquery_query_op\n", + " create_testing_split = # TO DO: Use the bigquery_query_op\n", " \n", "\n", " # Tune hyperparameters\n", @@ -235,7 +240,7 @@ " job_dir = '{}/{}/{}'.format(gcs_root, 'jobdir/hypertune',\n", " kfp.dsl.RUN_ID_PLACEHOLDER)\n", "\n", - " hypertune = # TODO - use the mlengine_train_op\n", + " hypertune = # TO DO: Use the mlengine_train_op\n", "\n", " # Retrieve the best 
trial\n", " get_best_trial = retrieve_best_run_op(\n", @@ -253,7 +258,7 @@ " get_best_trial.outputs['max_iter'], '--hptune', 'False'\n", " ]\n", "\n", - " train_model = # TODO - use the mlengine_train_op\n", + " train_model = # TO DO: Use the mlengine_train_op\n", "\n", " # Evaluate the model on the testing split\n", " eval_model = evaluate_model_op(\n", @@ -329,13 +334,35 @@ "Update the below constants with the settings reflecting your lab environment. \n", "\n", "- `REGION` - the compute region for AI Platform Training and Prediction\n", - "- `ARTIFACT_STORE` - the GCS bucket created during installation of AI Platform Pipelines. The bucket name starts with the `hostedkfp-default-` prefix.\n", + "- `ARTIFACT_STORE` - the GCS bucket created during installation of AI Platform Pipelines. The bucket name will be similar to `qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default`.\n", "- `ENDPOINT` - set the `ENDPOINT` constant to the endpoint to your AI Platform Pipelines instance. Then endpoint to the AI Platform Pipelines instance can be found on the [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console.\n", "\n", - "1. Open the *SETTINGS* for your instance\n", - "2. Use the value of the `host` variable in the *Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SKD* section of the *SETTINGS* window." + "1. Open the **SETTINGS** for your instance\n", + "2. Use the value of the `host` variable in the **Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK** section of the **SETTINGS** window.\n", + "\n", + "Run `gsutil ls` without URLs to list all of the Cloud Storage buckets under your default project ID." ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil ls" + ] + }, + { + "source": [ + "**HINT:** \n", + "\n", + "For **ENDPOINT**, use the value of the `host` variable in the **Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK** section of the **SETTINGS** window.\n", + "\n", + "For **ARTIFACT_STORE_URI**, copy the bucket name which starts with the qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default prefix from the previous cell output. Your copied value should look like **'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default'**\n" ], + "cell_type": "markdown", + "metadata": {} + }, { "cell_type": "code", "execution_count": null, @@ -343,8 +370,8 @@ "outputs": [], "source": [ "REGION = 'us-central1'\n", - "ENDPOINT = '337dd39580cbcbd2-dot-us-central2.pipelines.googleusercontent.com'\n", - "ARTIFACT_STORE_URI = 'gs://hostedkfp-default-e8c59nl4zo'\n", + "ENDPOINT = '337dd39580cbcbd2-dot-us-central2.pipelines.googleusercontent.com' # TO DO: REPLACE WITH YOUR ENDPOINT\n", + "ARTIFACT_STORE_URI = 'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default' # TO DO: REPLACE WITH YOUR ARTIFACT_STORE NAME \n", "PROJECT_ID = !(gcloud config get-value core/project)\n", "PROJECT_ID = PROJECT_ID[0]" ] @@ -367,6 +394,13 @@ "TRAINER_IMAGE='gcr.io/{}/{}:{}'.format(PROJECT_ID, IMAGE_NAME, TAG)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### **Note**: Please ignore any **incompatibility ERROR** that may appear for the packages visions as it will not affect the lab's functionality."
+ ] + }, { "cell_type": "code", "execution_count": null, @@ -420,7 +454,7 @@ "source": [ "#### Set the pipeline's compile time settings\n", "\n", - "The pipeline can run using a security context of the GKE default node pool's service account or the service account defined in the `user-gcp-sa` secret of the Kubernetes namespace hosting Kubeflow Pipelines. If you want to use the `user-gcp-sa` service account you change the value of `USE_KFP_SA` to `True`.\n", + "The pipeline can run using a security context of the GKE default node pool's service account or the service account defined in the `user-gcp-sa` secret of the Kubernetes namespace hosting KFP. If you want to use the `user-gcp-sa` service account you change the value of `USE_KFP_SA` to `True`.\n", "\n", "Note that the default AI Platform Pipelines configuration does not define the `user-gcp-sa` secret." ] @@ -458,7 +492,10 @@ "source": [ "### Exercise\n", "\n", - "Compile the `covertype_training_pipeline.py` with the `dsl-compile` command line:" + "Compile the `covertype_training_pipeline.py` with the `dsl-compile` command line:\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-02-kfp-pipeline** and opening **lab-02.ipynb**.\n", + "" ] }, { @@ -467,7 +504,7 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO" + "# TO DO: Your code goes here" ] }, { @@ -499,7 +536,10 @@ "source": [ "### Exercise\n", "\n", - "Upload the pipeline to the Kubeflow cluster using the `kfp` command line:" + "Upload the pipeline to the Kubeflow cluster using the `kfp` command line:\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-02-kfp-pipeline** and opening **lab-02.ipynb**.\n", + "" ] }, { @@ -510,7 +550,7 @@ "source": [ "PIPELINE_NAME='covertype_continuous_training'\n", "\n", - "# TODO" + "# TO DO: Your code goes here" ] }, { @@ -539,7 +579,9 @@ "source": [ "### Submit a run\n", "\n", - "Find the ID of the `covertype_continuous_training` pipeline you uploaded in the previous step and update the value of `PIPELINE_ID` .\n" + "Find the ID of the `covertype_continuous_training` pipeline you uploaded in the previous step and update the value of `PIPELINE_ID` .\n", + "\n", + "\n" ] }, { @@ -548,7 +590,7 @@ "metadata": {}, "outputs": [], "source": [ - "PIPELINE_ID='0918568d-758c-46cf-9752-e04a4403cd84'" + "PIPELINE_ID='0918568d-758c-46cf-9752-e04a4403cd84' # TO DO: REPLACE WITH YOUR PIPELINE ID " ] }, { @@ -582,8 +624,12 @@ "- EXPERIMENT_NAME is set to the experiment used to run the pipeline. You can choose any name you want. If the experiment does not exist it will be created by the command\n", "- RUN_ID is the name of the run. You can use an arbitrary name\n", "- PIPELINE_ID is the id of your pipeline. Use the value retrieved by the `kfp pipeline list` command\n", - "- GCS_STAGING_PATH is the URI to the GCS location used by the pipeline to store intermediate files. By default, it is set to the `staging` folder in your artifact store.\n", - "- REGION is a compute region for AI Platform Training and Prediction." + "- GCS_STAGING_PATH is the URI to the Cloud Storage location used by the pipeline to store intermediate files. 
By default, it is set to the `staging` folder in your artifact store.\n", + "- REGION is a compute region for AI Platform Training and Prediction.\n", + "\n", + "\n", + "NOTE: If you need help, you may take a look at the complete solution by navigating to **mlops-on-gcp > workshops > kfp-caip-sklearn > lab-02-kfp-pipeline** and opening **lab-02.ipynb**.\n", + "" ] }, { @@ -592,7 +638,7 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO" + "# TO DO: Your code goes here" ] }, { @@ -608,7 +654,7 @@ "https://[ENDPOINT]\n", "\n", "\n", - "**NOTE that your pipeline run may fail due to the bug in a BigQuery component that does not handle certain race conditions. If you observe the pipeline failure, retry the run from the KFP UI**\n" + "**NOTE that your pipeline run may fail due to the bug in a BigQuery component that does not handle certain race conditions. If you observe the pipeline failure, re-run the last cell of the notebook to submit another pipeline run or retry the run from the KFP UI**\n" ] }, { @@ -644,4 +690,4 }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/lab-02.ipynb b/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/lab-02.ipynb index 6ad6a6bb..86a8bc3e 100644 --- a/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/lab-02.ipynb +++ b/on_demand/kfp-caip-sklearn/lab-02-kfp-pipeline/lab-02.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Continuous training pipeline with KFP and Cloud AI Platform" + "# Continuous training pipeline with Kubeflow Pipelines and AI Platform" ] }, { @@ -12,13 +12,13 @@ "metadata": {}, "source": [ "**Learning Objectives:**\n", - "1. Learn how to use KF pre-build components (BiqQuery, CAIP training and predictions)\n", - "1. Learn how to use KF lightweight python components\n", - "1. Learn how to build a KF pipeline with these components\n", - "1. Learn how to compile, upload, and run a KF pipeline with the command line\n", + "1. Learn how to use Kubeflow Pipelines (KFP) pre-built components (BigQuery, AI Platform training and predictions)\n", + "1. Learn how to use KFP lightweight Python components\n", + "1. Learn how to build a KFP pipeline with these components\n", + "1. Learn how to compile, upload, and run a KFP pipeline with the command line\n", "\n", "\n", - "In this lab, you will build, deploy, and run a KFP pipeline that orchestrates **BigQuery** and **Cloud AI Platform** services to train, tune, and deploy a **scikit-learn** model.\n" + "In this lab, you will build, deploy, and run a KFP pipeline that orchestrates **BigQuery** and **AI Platform** services to train, tune, and deploy a **scikit-learn** model." ] }, { @@ -82,7 +82,7 @@ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", - "\"\"\"KFP pipeline orchestrating BigQuery and Cloud AI Platform services.\"\"\"\n", + "\"\"\"KFP pipeline orchestrating BigQuery and AI Platform services.\"\"\"\n", "\n", "import os\n", "\n", @@ -344,13 +344,35 @@ "Update the below constants with the settings reflecting your lab environment. \n", "\n", "- `REGION` - the compute region for AI Platform Training and Prediction\n", - "- `ARTIFACT_STORE` - the GCS bucket created during installation of AI Platform Pipelines. 
The bucket name will be similar to `qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default`.\n", "- `ENDPOINT` - set the `ENDPOINT` constant to the endpoint to your AI Platform Pipelines instance. Then endpoint to the AI Platform Pipelines instance can be found on the [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console.\n", "\n", - "1. Open the *SETTINGS* for your instance\n", - "2. Use the value of the `host` variable in the *Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SKD* section of the *SETTINGS* window." + "1. Open the **SETTINGS** for your instance\n", + "2. Use the value of the `host` variable in the **Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK** section of the **SETTINGS** window.\n", + "\n", + "Run `gsutil ls` without URLs to list all of the Cloud Storage buckets under your default project ID." ] }, { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gsutil ls" + ] + }, + { + "source": [ + "**HINT:** \n", + "\n", + "For **ENDPOINT**, use the value of the `host` variable in the **Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK** section of the **SETTINGS** window.\n", + "\n", + "For **ARTIFACT_STORE_URI**, copy the bucket name which starts with the qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default prefix from the previous cell output. Your copied value should look like **'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default'**" + ], + "cell_type": "markdown", + "metadata": {} + }, { "cell_type": "code", "execution_count": null, @@ -358,8 +380,8 @@ "outputs": [], "source": [ "REGION = 'us-central1'\n", - "ENDPOINT = '337dd39580cbcbd2-dot-us-central2.pipelines.googleusercontent.com'\n", - "ARTIFACT_STORE_URI = 'gs://hostedkfp-default-e8c59nl4zo'\n", + "ENDPOINT = '337dd39580cbcbd2-dot-us-central2.pipelines.googleusercontent.com' # TO DO: REPLACE WITH YOUR ENDPOINT\n", + "ARTIFACT_STORE_URI = 'gs://qwiklabs-gcp-xx-xxxxxxx-kubeflowpipelines-default' # TO DO: REPLACE WITH YOUR ARTIFACT_STORE NAME \n", "PROJECT_ID = !(gcloud config get-value core/project)\n", "PROJECT_ID = PROJECT_ID[0]" ] @@ -382,6 +404,13 @@ "TRAINER_IMAGE='gcr.io/{}/{}:{}'.format(PROJECT_ID, IMAGE_NAME, TAG)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### **Note**: Please ignore any **incompatibility ERROR** that may appear for the packages visions as it will not affect the lab's functionality." + ] + }, { "cell_type": "code", "execution_count": null, @@ -435,7 +464,7 @@ "source": [ "#### Set the pipeline's compile time settings\n", "\n", - "The pipeline can run using a security context of the GKE default node pool's service account or the service account defined in the `user-gcp-sa` secret of the Kubernetes namespace hosting Kubeflow Pipelines. If you want to use the `user-gcp-sa` service account you change the value of `USE_KFP_SA` to `True`.\n", + "The pipeline can run using a security context of the GKE default node pool's service account or the service account defined in the `user-gcp-sa` secret of the Kubernetes namespace hosting KFP. If you want to use the `user-gcp-sa` service account you change the value of `USE_KFP_SA` to `True`.\n", "\n", "Note that the default AI Platform Pipelines configuration does not define the `user-gcp-sa` secret."
] @@ -538,7 +567,8 @@ "source": [ "### Submit a run\n", "\n", - "Find the ID of the `covertype_continuous_training` pipeline you uploaded in the previous step and update the value of `PIPELINE_ID` .\n" + "Find the ID of the `covertype_continuous_training` pipeline you uploaded in the previous step and update the value of `PIPELINE_ID` .\n", + "\n" ] }, { @@ -547,7 +577,7 @@ "metadata": {}, "outputs": [], "source": [ - "PIPELINE_ID='0918568d-758c-46cf-9752-e04a4403cd84'" + "PIPELINE_ID='0918568d-758c-46cf-9752-e04a4403cd84' # TO DO: REPLACE WITH YOUR PIPELINE ID " ] }, { @@ -569,6 +599,21 @@ "GCS_STAGING_PATH = '{}/staging'.format(ARTIFACT_STORE_URI)" ] }, + { + "source": [ + "Run the pipeline using the `kfp` command line by retrieving the variables from the environment to pass to the pipeline where:\n", + "\n", + "- EXPERIMENT_NAME is set to the experiment used to run the pipeline. You can choose any name you want. If the experiment does not exist it will be created by the command\n", + "- RUN_ID is the name of the run. You can use an arbitrary name\n", + "- PIPELINE_ID is the id of your pipeline. Use the value retrieved by the `kfp pipeline list` command\n", + "- GCS_STAGING_PATH is the URI to the Cloud Storage location used by the pipeline to store intermediate files. By default, it is set to the `staging` folder in your artifact store.\n", + "- REGION is a compute region for AI Platform Training and Prediction. \n", + "\n", + "You should be already familiar with these and other parameters passed to the command. If not go back and review the pipeline code." + ], + "cell_type": "markdown", + "metadata": {} + }, { "cell_type": "code", "execution_count": null, @@ -591,21 +636,6 @@ "replace_existing_version=$REPLACE_EXISTING_VERSION" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "where\n", - "\n", - "- EXPERIMENT_NAME is set to the experiment used to run the pipeline. You can choose any name you want. If the experiment does not exist it will be created by the command\n", - "- RUN_ID is the name of the run. You can use an arbitrary name\n", - "- PIPELINE_ID is the id of your pipeline. Use the value retrieved by the `kfp pipeline list` command\n", - "- GCS_STAGING_PATH is the URI to the GCS location used by the pipeline to store intermediate files. By default, it is set to the `staging` folder in your artifact store.\n", - "- REGION is a compute region for AI Platform Training and Prediction. \n", - "\n", - "You should be already familiar with these and other parameters passed to the command. If not go back and review the pipeline code.\n" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -619,7 +649,7 @@ "https://[ENDPOINT]\n", "\n", "\n", - "**NOTE that your pipeline run may fail due to the bug in a BigQuery component that does not handle certain race conditions. If you observe the pipeline failure, retry the run from the KFP UI**\n" + "**NOTE that your pipeline run may fail due to the bug in a BigQuery component that does not handle certain race conditions. 
If you observe the pipeline failure, re-run the last cell of the notebook to submit another pipeline run or retry the run from the KFP UI**\n" ] }, { @@ -655,4 +685,4 }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/on_demand/kfp-caip-sklearn/lab-03-kfp-cicd/README.md b/on_demand/kfp-caip-sklearn/lab-03-kfp-cicd/README.md index 585e6648..44765328 100644 --- a/on_demand/kfp-caip-sklearn/lab-03-kfp-cicd/README.md +++ b/on_demand/kfp-caip-sklearn/lab-03-kfp-cicd/README.md @@ -1,6 +1,6 @@ -# CI/CD for a KFP pipeline +# CI/CD for a Kubeflow Pipeline -In this lab you will walk through authoring of a **Cloud Build** CI/CD workflow that automatically builds and deploys a KFP pipeline. You will also integrate your workflow with **GitHub** by setting up a trigger that starts the workflow when a new tag is applied to the **GitHub** repo hosting the pipeline's code. +In this lab, you will walk through authoring a **Cloud Build** CI/CD workflow that automatically builds and deploys a Kubeflow Pipelines (KFP) pipeline. You will also integrate your workflow with **GitHub** by setting up a trigger that starts the workflow when a new tag is applied to the **GitHub** repo hosting the pipeline's code. ## Lab instructions