Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback PR #10

Open
wants to merge 21 commits into
base: candidate
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -135,3 +135,15 @@ mlpipeline-ui-metadata.json
*.csv
*.sqllite
model.png

# development files
/venv
/.venv
/bin-tmp

# Visual Studio Code
.vscode

# development app
# TODO(axelmagn): delete after automating tests more
/simple_app
179 changes: 179 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
# Vertex MLOps Template

This project helps ML Engineers and Data Scientists accelerate the creation of
AI Applications on the [Vertex AI](https://cloud.google.com/vertex-ai) platform.
It is made available as a self-contained source repository, which users are
encouraged to fork and customize to fit their own workflows.

## Getting Started

When using the Vertex MLOps Template, code is organized into "apps". Each app
is packaged into its own container, which provides a self-contained environment
for execution on Vertex services. To get started, this section will walk you
through the creation and deployment of a simple app that performs
classifications on the
[fashion-mnist](https://www.tensorflow.org/datasets/catalog/fashion_mnist)
dataset.

### Prerequisites

In order to complete this tutorial, you will need access to a Google Cloud
Platform account and project, as well as a few basic resources within the
project.

Before starting, take a moment to set up and note the name of the following
resources:

- gcp-project: the GCP project to deploy the app within.
([docs](https://cloud.google.com/resource-manager/docs/creating-managing-projects))
- gcp-region: the region to use for regionalized resources (when in doubt, use
us-central1). ([docs](https://cloud.google.com/compute/docs/regions-zones))
- gcp-storage-root: the Google Cloud Storage path to contain application
objects. The bucket region *must* be set to the same region as `gcp-region`
for some Vertex functionality to work.
([docs](https://cloud.google.com/storage/docs/creating-buckets))

It is recommended that you obtain the Editor or Owner IAM roles within
your project. While more granular IAM roles may be utilized, granular IAM
permissions are beyond the scope of this quickstart.

It will also be necessary to enable the following Cloud Services:
([docs](https://cloud.google.com/service-usage/docs/enable-disable))

```
gcloud services enable \
aiplatform.googleapis.com
appengine.googleapis.com
cloudbuild.googleapis.com
cloudfunctions.googleapis.com
cloudscheduler.googleapis.com
```

### Install Manager Dependencies

```
pip install --user -r requirements.txt
```

or

```
python -m virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```

### Create an App

```
./bin/manage.sh start app \
--name first_app \
--gcp-project {PROJECT_ID} \
--gcp-region {REGION} \
--gcp-storage-root gs://{BUCKET}/{PATH}

tree first_app
```

### Create a Pipeline

```
./bin/manage.sh start pipeline \
--name first_pipeline \
--app first_app

tree first_app
```

Take a moment to look through the code that was generated in
`first_app.pipelines.first_pipeline.pipeline`. It contains
the boilerplate for a simple pipeline, heavily commented.

Also notice that a new configuration file was created:
`config/pipeline_first_pipeline.yaml`

### Create a Trainer

```
./bin/manage.sh start trainer \
--name first_trainer \
--app first_app
```

Similarly, take a moment to take a look through the trainer generated in
`first_app.trainers.first_trainer.task`.

In your pipeline definition, configure the training component to use
`first_app.trainers.first_trainer.task`. (Search for "PLACEHOLDER")
### Update Configurations

```
cd first_app
cat config/pipeline_first_pipeline.yaml >> config/base.yaml
```

modify the `deploy` section of `config/base.yaml` to contain `first_pipeline`

```
deploy:
pipelines:
first_app: once
```

### Submit Pipeline

```
bash bin/run-local.sh build_pipeline first_pipeline
bash bin/run-local.sh run_pipeline first_pipeline
```

### Automate with Cloud Build

```
gcloud builds submit --config cicd/build_app.yaml
gcloud builds submit --config cicd/release_app.yaml
gcloud builds submit --config cicd/deploy_app.yaml
```

## Resources

- [Best practices for implementing machine learning on Google Cloud](https://cloud.google.com/architecture/ml-on-gcp-best-practices)
- [Practitioners Guide to MLOps: A framework for continuous delivery and automation of machine learning](https://cloud.google.com/resources/mlops-whitepaper)

## Q&A

### How is code in this project organized?

This repository can be used as the beginning of an ML *Project* which
holds your team's ML source code. *Projects* are made up of one or more
*Apps*, which are self-contained code bases that solve a particular ML problem.
In order to solve that problem, *Apps* need to perform a variety of *Tasks*.
*Tasks* can be hierarchical: for example one task may be to launch a pipeline
job, while another may be to perform the contents of a pipeline component, and
that pipeline component may itself be launching a training task on Vertex
Training. For this reason, task should be stateless, and rely on Vertex AI for
managing the state of the App. When tasks need to be automated or executed as
part of a workflow, Cloud Build is leveraged as a deployment tool.

### Why use YAML configs rather than environment variables?

ML applications often involve multiple tasks running on multiple services.
Environment variables, while commonly used to configure services in the [twelve
factor app](https://12factor.net/) methodology, become brittle when required to
be propagated to subtasks. By consolidating configuration into yaml files, we
can obviate the need to propagate environment variables.

### Why use YAML configs rather than {gflags,protobuf,jsonnet,TOML,...}?

There are many tools and languages that can be used to configure an application.
YAML was chosen due to its already widespread use for configuration on google
cloud platform. When authoring the configuration module, we prioritized
simplicity over scalability. If these choices are unsuitable for your use case,
we strongly encourage you to modify your app's configuration code to meet your
needs.

### Why not publish this as a PyPi package?

This project is intended to be customized to fit your needs. We therefore
recommend forking this repository, so that you can start to develop the
customizations and templates that fit your workflow.
20 changes: 20 additions & 0 deletions bin/manage.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
#!/bin/bash
#
# invoke the manager CLI

readonly PROJECT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." &> /dev/null && pwd )"

PYTHON_CMD="python"
if ! command -v "${PYTHON_CMD}" > /dev/null
then
PYTHON_CMD="python3"
if ! command -v "${PYTHON_CMD}" > /dev/null
then
echo "ERROR: Could not find a python interpreter."
exit
fi
fi

pushd "${PROJECT_DIR}" > /dev/null
"${PYTHON_CMD}" -m mlops_manager "${@}"
popd > /dev/null
Empty file added mlops_manager/__init__.py
Empty file.
25 changes: 25 additions & 0 deletions mlops_manager/__main__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
import logging

from . import cli, commands


def main():
# parse arguments
args, unknown = cli._PARSER.parse_known_args()

# configure logging
log_level_num = getattr(logging, args.log_level.upper(), None)
logging.basicConfig(level=log_level_num)

# handle command
if args.command is None:
cli._PARSER.print_help()
elif args.func.strict:
args = cli._PARSER.parse_args()
args.func(args)
else:
args.func(args, unknown)


if __name__ == "__main__":
main()
107 changes: 107 additions & 0 deletions mlops_manager/cli.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
from .templating import get_templates_dir
from argparse import ArgumentParser
import os
import yaml

_PARSER = ArgumentParser()
_PARSER.add_argument("--log_level", help="Specify log level", type=str,
choices=["debug", "info", "warning", "error", "critical"],
default="info")
_COMMANDS_PARSER = _PARSER.add_subparsers(title="commands", dest="command")


def command(args=[], parent=_COMMANDS_PARSER, strict=True):
"""
Decorator for CLI commands.

see commands.py for examples.
"""
def decorator(func):
func.strict = strict
parser = parent.add_parser(func.__name__, description=func.__doc__)
for arg in args:
parser.add_argument(*arg[0], **arg[1])
parser.set_defaults(func=func)
return decorator


def template_command(args=[], parent=_COMMANDS_PARSER, strict=True):
"""
Decorator for a function that handles a subcommand for each template
"""

def decorator(func):
func.strict = strict

name = func.__name__
if name.endswith("_template"):
name = name[:-len("_template")]
parser = parent.add_parser(name, description=func.__doc__)
for arg in args:
parser.add_argument(*arg[0], **arg[1])
parser.set_defaults(func=func)
templates_subparser = parser.add_subparsers(
title="templates", dest="template")

with os.scandir(get_templates_dir()) as scan:
for entry in scan:
# load each template directory as a subcommand
if entry.is_dir():
template_parser = templates_subparser.add_parser(
entry.name)

with os.scandir(entry.path) as scan:
variants = [
variant.name for variant in scan
if variant.is_dir()
]

if "examples" in variants:
variants.remove("examples")
examples_path = os.path.join(entry.path, "examples")
with os.scandir(examples_path) as scan:
examples = [
example.name for example in scan
if example.is_dir()
]
template_parser.add_argument(
"--example",
help="example to included",
action='append',
choices=examples,
dest='examples',
default=[])

template_parser.add_argument(
"--variant",
help="template variant to use",
default='default',
choices=variants,
)

variables_path = os.path.join(entry.path, "variables.yaml")
with open(variables_path, 'r') as f:
variables = yaml.safe_load(f)
add_template_variables_to_parser(
template_parser, variables)

return decorator


def add_template_variables_to_parser(parser, variables):
args = variables.get('args', None)
if args is None:
args = {}
for arg in args:
long_option = arg.lower().replace("_", "-")
long_option = f"--{long_option}"
kwargs = variables['args'].get(arg, None)
if kwargs is None: # guard against explicit Nones in config
kwargs = {}
kwargs['required'] = kwargs.get('required', True)
parser.add_argument(long_option, **kwargs)


def arg(*args, **kwargs):
"""Argument packaging function for command decorator"""
return ([*args], kwargs)
Loading