This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Feature: spark init docker repo customization #358

Merged
7 commits merged on Feb 7, 2018
Changes from 6 commits
6 changes: 4 additions & 2 deletions aztk/utils/constants.py
@@ -3,8 +3,10 @@
"""
DOCKER
"""
-DEFAULT_DOCKER_REPO = "aztk/base:spark2.2.0"
-DEFAULT_DOCKER_REPO_GPU = "aztk/gpu:spark2.2.0"
+DEFAULT_DOCKER_REPO = "aztk/base:latest"
+DEFAULT_DOCKER_REPO_GPU = "aztk/gpu:latest"
+DEFAULT_SPARK_PYTHON_DOCKER_REPO = "aztk/python:latest"
+DEFAULT_SPARK_R_BASE_DOCKER_REPO = "aztk/r-base:latest"
DOCKER_SPARK_CONTAINER_NAME = "spark"

# DOCKER SPARK
34 changes: 29 additions & 5 deletions cli/spark/endpoints/init.py
@@ -1,4 +1,4 @@
import os
import argparse
import typing
from distutils.dir_util import copy_tree
@@ -8,14 +8,29 @@
def setup_parser(parser: argparse.ArgumentParser):
parser.add_argument('--global', dest='global_flag', action='store_true',
help="Create a .aztk/ folder in your home directory for global configurations.")
software_parser = parser.add_mutually_exclusive_group()
software_parser.add_argument('--python', action="store_true", required=False)
software_parser.add_argument('--r', '--R', action="store_true", required=False)
software_parser.add_argument('--java', action="store_true", required=False)
software_parser.add_argument('--scala', action="store_true", required=False)

Contributor:
Will it be confusing to users to split out java & scala? They're essentially the same thing and may leave customers wondering why we have split them out. Thoughts?


Member Author:
No preference, just not sure how to word the flag if we combine them.

Under the hood, --java, --scala and no flag all do the same thing.


Contributor:
yeah.. I just can't think of a word that combines them both, like .NET but for java... I was thinking 'default' but that doesn't seem right either.


Contributor:
maybe just 'base'? and add some verbosity to the CLI and docs that this will use the out-of-the-box java/scala environment?



def execute(args: typing.NamedTuple):
# software_specific init
if args.python:

Contributor:
minor: good candidate for a switch statement. No need to change.

docker_repo = constants.DEFAULT_SPARK_PYTHON_DOCKER_REPO
elif args.r:

Contributor:
does this need to change to args.r || args.R?


Member Author:
Both --r and --R go to args.r

docker_repo = constants.DEFAULT_SPARK_R_BASE_DOCKER_REPO
else:
docker_repo = constants.DEFAULT_DOCKER_REPO

if args.global_flag:
-    create_directory(constants.GLOBAL_INIT_DIRECTORY_DEST)
+    create_directory(constants.GLOBAL_INIT_DIRECTORY_DEST, docker_repo)
else:
-    create_directory(constants.LOCAL_INIT_DIRECTORY_DEST)
+    create_directory(constants.LOCAL_INIT_DIRECTORY_DEST, docker_repo)

-def create_directory(dest_path: str):
+def create_directory(dest_path: str, docker_repo: str):
config_src_path = constants.INIT_DIRECTORY_SOURCE
config_dest_path = dest_path

@@ -29,3 +44,12 @@ def create_directory(dest_path: str):

if os.path.isfile(secrets_template_path) and not os.path.isfile(secrets_path):
os.rename(secrets_template_path, secrets_path)

cluster_path = os.path.join(dest_path, 'cluster.yaml')

if os.path.isfile(cluster_path):
with open(cluster_path, 'r') as stream:
cluster_yaml = stream.read()
cluster_yaml = cluster_yaml.replace("docker_repo: \n", "docker_repo: {}\n".format(docker_repo))
with open(cluster_path, 'w') as file:
file.write(cluster_yaml)
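A reviewer notes above that the if/elif chain in execute() is a candidate for a switch statement. Python has no switch, but the same flag-to-image mapping could be written as a lookup table. The sketch below is purely illustrative and not part of this PR; the `pick_docker_repo` helper name is invented, and the `from aztk.utils import constants` import path is assumed from the repo layout shown in this diff:

```python
# Hypothetical alternative to the if/elif chain (not part of this PR):
# map each CLI software flag to its default Docker repo, falling back to the base image.
from aztk.utils import constants  # assumed import path (aztk/utils/constants.py)

SOFTWARE_TO_REPO = {
    'python': constants.DEFAULT_SPARK_PYTHON_DOCKER_REPO,
    'r': constants.DEFAULT_SPARK_R_BASE_DOCKER_REPO,
}

def pick_docker_repo(args) -> str:
    # return the repo for the first flag that is set
    for flag, repo in SOFTWARE_TO_REPO.items():
        if getattr(args, flag, False):
            return repo
    # --java, --scala and no flag all fall back to the default base image
    return constants.DEFAULT_DOCKER_REPO
```

As in the merged code, adding a new toolset would then only require a new constant and one dictionary entry.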
2 changes: 1 addition & 1 deletion config/cluster.yaml
@@ -15,7 +15,7 @@ size: 2
username: spark

# docker_repo: <name of docker image repo (for more information, see https://github.com/Azure/aztk/blob/master/docs/12-docker-image.md)>
-docker_repo: aztk/base:spark2.2.0
+docker_repo:

# # optional custom scripts to run on the Spark master, Spark worker or all nodes in the cluster
# custom_scripts:
8 changes: 8 additions & 0 deletions docs/00-getting-started.md
@@ -30,6 +30,14 @@ The minimum requirements to get started with this package are:
```
This will create a *.aztk* folder with preset configuration files in your current working directory.

If you would like to initialize your AZTK clusters with a specific development toolset, please pass one of the following flags:
```bash
aztk spark init --python
aztk spark init --R
aztk spark init --scala
aztk spark init --java
```
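For reference (not part of the documented text): based on the defaults introduced in this PR, the chosen flag determines which image is written into the generated .aztk/cluster.yaml. For example, after `aztk spark init --python` the file would be expected to contain:

```yaml
# excerpt of the generated .aztk/cluster.yaml (expected, given this PR's defaults)
docker_repo: aztk/python:latest
```

Likewise, --R maps to aztk/r-base:latest, while --java, --scala, or no flag leave the cluster on the default aztk/base:latest image.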

If you wish to have global configuration files that will be read regardless of your current working directory, run:
```bash
aztk spark init --global