This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Add Spark Shuffle Service Support #370

Closed
jafreck opened this issue Feb 6, 2018 · 4 comments

@jafreck
Member

jafreck commented Feb 6, 2018

Enable the shuffle service by default by setting spark.shuffle.service.enabled to true in spark-defaults.conf.

Also, open port 7337 (the default value of spark.shuffle.service.port) in the container.

https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior
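As a rough illustration of the change proposed above, here is a minimal Python sketch that renders the two shuffle-service properties in the whitespace-separated "key value" form that spark-defaults.conf uses. The helper name `render_spark_defaults` and the `SHUFFLE_SERVICE_DEFAULTS` dictionary are hypothetical and are not part of AZTK; only the property names and the default port 7337 come from Spark's configuration documentation.

```python
# Hypothetical helper, not part of AZTK: renders Spark properties in the
# whitespace-separated "key value" format read from spark-defaults.conf.

# Assumed defaults for illustration; 7337 is Spark's default shuffle service port.
SHUFFLE_SERVICE_DEFAULTS = {
    "spark.shuffle.service.enabled": "true",
    "spark.shuffle.service.port": "7337",
}


def render_spark_defaults(settings):
    """Return the settings as aligned 'key  value' lines, one property per line."""
    width = max(len(key) for key in settings)
    return "\n".join(
        f"{key.ljust(width)}  {value}" for key, value in settings.items()
    )


if __name__ == "__main__":
    print(render_spark_defaults(SHUFFLE_SERVICE_DEFAULTS))
```

The rendered lines could be appended to an existing spark-defaults.conf. Opening the port in the container is a separate step that this sketch does not cover.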

jafreck added the feature label Feb 6, 2018
jafreck self-assigned this Feb 7, 2018
jafreck added this to the v0.5.2 milestone Feb 7, 2018
@jiata
Contributor

jiata commented Feb 8, 2018

Why are we setting non-default configurations for Spark? Shouldn't this be something we allow users to enable if they wish?

@paselem
Contributor

paselem commented Feb 8, 2018

This is similar to the decision we made regarding the chunk size for data. It's a 'better' default for distributed Spark clusters since it makes them more resilient to nodes leaving the cluster. In this case, we can rely on Spark's shuffle system to move data around and not force a recompute if a node goes away. With our focus on low-priority nodes, I feel this is probably a good default for AZTK.

@jafreck
Member Author

jafreck commented Feb 8, 2018

Additionally, enabling the shuffle service is required for dynamic allocation, and dynamic allocation allows the cluster to be rescaled while a Spark application is running.

So without the shuffle service and dynamic allocation enabled by default, resizing your cluster will only affect subsequent jobs.

This is a very big deal for job submission mode, where the spark-submit commands are executed immediately after the master is elected, and potentially before all workers in the cluster are up.
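The dependency described in this comment (dynamic allocation needs the shuffle service) can be sketched as a small configuration check. This is an illustrative Python sketch only: `is_true` and `check_dynamic_allocation` are hypothetical names, not AZTK or Spark APIs, and the check reflects the requirement as stated in this thread. Newer Spark releases also offer shuffle tracking as an alternative, which this sketch does not model.

```python
# Hypothetical validation sketch, not part of AZTK or Spark.
# It encodes the rule stated in this thread: dynamic allocation
# requires the external shuffle service to be enabled.


def is_true(settings, key):
    """Treat a missing property as false, and compare case-insensitively."""
    return str(settings.get(key, "false")).strip().lower() == "true"


def check_dynamic_allocation(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    dynamic = is_true(settings, "spark.dynamicAllocation.enabled")
    shuffle = is_true(settings, "spark.shuffle.service.enabled")
    if dynamic and not shuffle:
        problems.append(
            "spark.dynamicAllocation.enabled is true but "
            "spark.shuffle.service.enabled is not"
        )
    return problems


if __name__ == "__main__":
    example = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",
    }
    print(check_dynamic_allocation(example) or "configuration looks consistent")
```

A check like this could run before a job is submitted, so that a cluster expected to pick up late-arriving workers is not started with dynamic allocation silently unusable.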

@jiata
Copy link
Contributor

jiata commented Feb 9, 2018

Makes sense. Thanks, guys.
