This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Update VNET feature branch #284

Merged 30 commits into Azure:feature/vnet on Jan 9, 2018

Conversation

emlyn
Contributor

@emlyn emlyn commented Dec 15, 2017

I think there's still a bit of cleaning up and error checking remaining, but it would be good to get feedback on what's here so far.

paselem and others added 21 commits November 30, 2017 12:17
* initial take on installing azure files

* fix cluster.yaml parsing of files shares

* remove test code

* add docs for Azure Files
* initial refactor

* rename cli_fe to cli

* add docs for sdk client

* typo

* remove conflict

* fix zip node scripts bug, add sdk_example program

* start models docs

* add ClusterConfiguration docs, fix merge bug

* Application docs update

* added Application and SparkConfiguration docs

* whitespace

* rename cli.py and spark/cli

* add docstring for load_spark_client
* conditionally install and use nvidia-docker

* status statements, and -y flag for install

* add example, remove unnecessary ppa

* rename custom script, remove print statement, update example

* add Dockerfile

* fix path in Dockerfile

* update Docker images to use service account

* updated docs, changed default docker repo for gpu skus

* make timing statements more verbose

* remove unnecessary script

* added gpu docs

* fix up docs and numba example
* update docker-image readme with new images

* update docs
* Update 60-gpu.md

make sure it is available in the region

* Update 60-gpu.md
* Added rstudio server script

* Added rstudio server port to aztk sdk

* Added R dockerfiles

* Added new line on dockerfiles

* Pointing dockerfiles to new aztk-base

* allow any user or application in the server to write to the history server log directory
* Retry asking for password when it doesn't match or is empty

* Limit to 3 retries and let user know of add-user command on failure

* Throw error on failure
* fix wrong path for global secrets

* load spark_conf files correctly

* docker-image docs fix

* docker-image docs fix

* move load_aztk_spark_config function to config.py
* add default filesystem master ha

* move settings to spark-defaults.conf

* whitespace
* Update README.md

streamline and update main readme.md

* Update README.md

* Update README.md

* Update 13-configuration.md

* Update 12-docker-image.md

* Update 12-docker-image.md

* Update README.md

* Create README.md

* Update README.md

* Update 10-clusters.md
* add feedback for cluster create wait

* whitespace

* alphasort imports
Member

@jafreck jafreck left a comment


This looks good to me, per the discussion in #256. I think you are right about adding validation (specifically to the SecretsConfig object in config.py) and error checking throughout. But this is definitely on track.

@emlyn
Contributor Author

emlyn commented Dec 20, 2017

I've added some checks so that it fails if both service principal and shared key auth are provided for either batch or storage. What other checks/validation do you think are needed? (For easy reference, this is how it compares to master: 89f0e54...emlyn:vnet.)
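For reference, a minimal sketch of the kind of mutual-exclusivity check described here (the class and field names `SecretsConfig`, `service_principal`, and `shared_key` are assumptions based on this thread, not the actual aztk source):

```python
# Hypothetical sketch of the validation discussed above; names are
# assumptions, not the real aztk config.py implementation.

class AztkError(Exception):
    """Raised when the secrets configuration is invalid."""


class SecretsConfig:
    def __init__(self, service_principal=None, shared_key=None):
        self.service_principal = service_principal
        self.shared_key = shared_key

    def validate(self):
        # Reject configurations that supply both auth methods at once.
        if self.service_principal and self.shared_key:
            raise AztkError(
                "Both service_principal and shared_key credentials were "
                "provided; please specify only one.")
        # Reject configurations that supply neither auth method.
        if not self.service_principal and not self.shared_key:
            raise AztkError(
                "No credentials provided: specify either service_principal "
                "or shared_key.")
```

The same check would be applied separately to the batch and storage sections of the secrets file.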

@emlyn
Contributor Author

emlyn commented Dec 22, 2017

I've also added some checks that the correct servicePrincipal keys are present.
One issue I'm not sure about is that I currently hard-code the resource URLs for service principal credentials ('https://management.core.windows.net/' / 'https://batch.core.windows.net/'). Is there a way to get the correct values programmatically, or should there be a way for the user to specify these?
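One way to let users specify these values would be optional keys in the secrets file; a hypothetical sketch (the `*_resource_url` key names are illustrative, not actual aztk settings):

```yaml
service_principal:
  tenant_id: <tenant-id>
  client_id: <client-id>
  credential: <client-secret>
  # Hypothetical optional overrides for the hard-coded resource URLs,
  # defaulting to the public Azure cloud endpoints:
  management_resource_url: https://management.core.windows.net/
  batch_resource_url: https://batch.core.windows.net/
```

Defaulting to the public-cloud endpoints while allowing an override would also cover sovereign clouds, where these URLs differ.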

@emlyn
Contributor Author

emlyn commented Jan 5, 2018

@jafreck @paselem Happy New Year! Is this OK to merge into the feature branch now?

@jafreck
Member

jafreck commented Jan 8, 2018

@emlyn yeah this is fine to merge. We will do a final review before merging into master which might include some small changes like variable/yaml key renames. But everything else looks great!

@emlyn
Contributor Author

emlyn commented Jan 8, 2018

@jafreck Thanks. I don't think I have permission to merge; could you (or someone else with permission) do it, please?

@jafreck
Member

jafreck commented Jan 9, 2018

@emlyn Sorry -- didn't realize that. I'll take care of it.

@jafreck jafreck merged commit 4b7cb1f into Azure:feature/vnet Jan 9, 2018
@emlyn emlyn deleted the vnet branch January 9, 2018 10:25
@emlyn
Contributor Author

emlyn commented Jan 9, 2018

What's the next step, shall I open a PR from the feature branch onto master? It's already a bit behind again, so there are other things showing up in the diff - maybe it's better if someone with appropriate access merges master in first?

@paselem
Contributor

paselem commented Jan 9, 2018

@jafreck - can you handle the merge when you get a chance? Probably worth doing it now. I wanted to take some time to test this branch out too since it's a pretty big change.

@emlyn - have you had a chance to do much testing on this yet?

@emlyn
Contributor Author

emlyn commented Jan 9, 2018

@paselem I've successfully used it to start a couple of clusters and run a few spark jobs, but haven't done any thorough testing.

@paselem
Contributor

paselem commented Jan 10, 2018

Thanks for the update @emlyn. We have another huge change being worked on in parallel for 'serverless jobs', and we are kind of in the same position. Lots of ad-hoc testing, but not enough thorough testing. I think we need to start to seriously think about having good automated tests especially for very large changes like this one.

We have a sync up today to discuss immediate next steps for this project - specifically how do we get the 'serverless jobs' and 'vnet' branches merged into master and released. My main concern is around the amount of churn and testing, so we'll need to get that addressed before we can be confident about the quality of the feature going in. I'm hopeful that by the end of the day we'll have a plan down to getting this merged in and can move forward. I'll update this thread once we know more.

@paselem
Contributor

paselem commented Jan 11, 2018

@emlyn - we discussed these changes today. I did some quick testing and so far things look good. I have a local change w/ master merged into it and will probably push that back into this branch soon. We agreed that we want to push this into master by the end of the week. @jafreck has signed off and @timotheeguerin will do some additional testing and reviews. We need to figure out which change will go into master first (this or the serverless jobs). If we sign off on this quickly my guess is that this will make it in sooner.
