add shard size to dataframe_to_xshards #3491

shanyu-sys · 2021-02-10T02:31:25Z

No description provided.

shanyu-sys · 2021-02-19T09:01:51Z

Interface change:
Add shard_size to fit, predict and evaluate of PyTorchRayEstimator and TensorFlow2Estimator.
shard_size applies to Spark DataFrame input and specifies the number of Rows to transform into one shard of SparkXShards. It defaults to None, in which case all Rows in a partition are transformed into a single shard.
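The shard_size semantics described above can be illustrated with a minimal pure-Python sketch of how the Rows of one partition might be grouped. It is not the code from this PR: `rows_to_shards` and `_iter_shards` are hypothetical names, plain Python objects stand in for Spark Rows, and skipping empty partitions is an assumption. It only groups Rows and does not reproduce any further conversion that `dataframe_to_xshards` performs.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def rows_to_shards(rows: Iterable[T], shard_size: Optional[int] = None) -> Iterator[List[T]]:
    """Group the Rows of one partition into shards (hypothetical helper).

    shard_size=None -> the whole partition becomes a single shard.
    shard_size=n    -> every n consecutive Rows form one shard; the last
                       shard may hold fewer than n Rows.
    """
    # Validate eagerly so a bad value fails at call time, not on first iteration.
    if shard_size is not None and (
        isinstance(shard_size, bool) or not isinstance(shard_size, int) or shard_size <= 0
    ):
        raise ValueError("shard_size should be None or a positive integer")
    return _iter_shards(iter(rows), shard_size)


def _iter_shards(it: Iterator[T], shard_size: Optional[int]) -> Iterator[List[T]]:
    if shard_size is None:
        shard = list(it)
        if shard:  # assumption: an empty partition yields no shard
            yield shard
        return
    while True:
        # islice pulls at most shard_size Rows without materializing the rest.
        shard = list(islice(it, shard_size))
        if not shard:
            return
        yield shard


if __name__ == "__main__":
    partition = list(range(7))  # stand-in for the Rows of one partition
    print(list(rows_to_shards(partition, 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
    print(list(rows_to_shards(partition)))     # [[0, 1, 2, 3, 4, 5, 6]]
```

With Spark, a function like this could plausibly be applied per partition through `RDD.mapPartitions`, e.g. `df.rdd.mapPartitions(lambda rows: rows_to_shards(rows, 1000))`; that wiring is likewise an assumption, not the PR's implementation.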

jason-dai · 2021-02-19T11:56:11Z

> Interface change:
> Add shard_size to fit, predict and evaluate of PyTorchRayEstimator and TensorFlow2Estimator.
> shard_size applies to Spark DataFrame input and specifies the number of Rows to transform into one shard of SparkXShards. It defaults to None, in which case all Rows in a partition are transformed into a single shard.

I don't think it's reasonable to expose shard_size to the user; we can add an internal property in OrcaContext for testing purposes.

yangw1234 · 2021-02-22T13:32:37Z

LGTM

shanyu-sys · 2021-02-22T13:37:56Z

> I don't think it's reasonable to expose shard_size to the user; we can add an internal property in OrcaContext for testing purposes.

shard_size has been added to OrcaContext.

pyzoo/zoo/orca/common.py
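The suggestion of an internal OrcaContext property, rather than a user-facing argument, can be sketched as follows. This assumes one common Python pattern: a property defined on a metaclass, so the value can be read and assigned on the class itself (for example from a unit test). `_ContextMeta`, `OrcaContextSketch`, `_shard_size`, `_shard_size_value` and `run_with_shard_size` are all hypothetical names, and this is not the code in pyzoo/zoo/orca/common.py.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class _ContextMeta(type):
    """Metaclass: properties defined here are read/written on the class itself."""

    _shard_size_value: Optional[int] = None  # default: one shard per partition

    @property
    def _shard_size(cls) -> Optional[int]:
        return cls._shard_size_value

    @_shard_size.setter
    def _shard_size(cls, value: Optional[int]) -> None:
        # Validate once here so every consumer can trust the stored value.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value <= 0
        ):
            raise ValueError("shard_size should be None or a positive integer")
        cls._shard_size_value = value


class OrcaContextSketch(metaclass=_ContextMeta):
    """Hypothetical stand-in for OrcaContext; not the real class."""


def run_with_shard_size(size: Optional[int], fn: Callable[[], T]) -> T:
    """Temporarily override the setting, as a unit test might, then restore it."""
    previous = OrcaContextSketch._shard_size
    OrcaContextSketch._shard_size = size
    try:
        return fn()
    finally:
        OrcaContextSketch._shard_size = previous
```

The leading underscore signals that the setting is not part of the public interface, validating in the setter keeps the check in one place, and the save/restore helper keeps one test's override from leaking into the next.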

shanyu-sys · 2021-03-01T03:24:10Z

Jenkins:
http://10.239.47.210:18888/view/ZOO-PR/job/ZOO-PR-Validation/5106/
http://10.239.47.210:18888/job/ZOO-PR-Python-Spark-2.4-py36-ray/1321/

* add shard size to dataframe_to_xshards
* add ut and change default shard_size as None
* add shard size in orca context and fix style
* add ut in pytorch estimator and tf estimator
* move shard_size to internal use
* fix

* Orca PyTorch Estimator load data once (#2669)
  * load once
  * fix
  * fix
  * remove num_steps
  * remove iter
  * meet comments
  * wrap once
  * style fix
* Add Orca Overview and Context Doc (#2748)
  * add docs
  * minor
* Add init_orca_context (#2774)
  * initial imple
  * update
  * meet review
  * review and style
  * remove stopped
  * add doc
  * minor
  * move import
  * fix mxnet
  * remove
* Update UTs and examples with init_orca_context (#2787)
  * update unit tests
  * minor
  * update
  * update mxnet
  * move barrier
  * fix mxnet
  * update
  * bug fix
  * update
  * update test
  * update mxnet example
  * update mxnet
  * minor
  * minor
  * minor
  * update examples
  * move ray import dependencies
  * readme
  * minor
  * bug fix
  * remove default
* Add website doc for init_orca_context (#2822)
* Support numa binding in init_spark_standalone (#2847)
  * support numa binding in init_spark_standalone
  * add doc and add to orca context
  * address comments
  * address comments
  * update scripts
  * hyperthreading
  * fix
* shutdown hook (#2853)
* Support RayOnSpark for k8s and add docs (#2836)
  * support ray on k8s
  * add to init orca context
  * style
  * minor
  * minor
  * ut
* Fix stop_orca_context being called twice (#2878)
* Update website doc of init orca context (#2879)
  * update doc
  * update
* update Torch example (#3022)
  * update example
  * Update README.md
  * update resnet_finetune.py
  * delete some file
  * Create README.md
  * Update README.md
  * Update README.md
  * Update README.md
  * add detect conda env name
  * some change
  * update run example scripte
  * update init orca context
* attempt to fix ray memory (#3205)
  * attempt to fix ray memory
  * exclude webui
* Update doc for orca context (#3287)
  * update
  * add back
  * update
  * update
  * minor
  * meet review
  * fix typo
* Add memory type support for Orca tf estimator (#3280)
  * add mem type for dataframe dataset
  * add ZooTrigger support
  * update mem type
  * add test
  * update
  * update mem type and orca context
  * update orca context docs
  * update orca context with comments
  * fix style
* Add get_spark_session in OrcaContext (#3520)
  * add and modify
  * style
* add shard size to dataframe_to_xshards (#3491)
  * add shard size to dataframe_to_xshards
  * add ut and change default shard_size as None
  * add shard size in orca context and fix style
  * add ut in pytorch estimator and tf estimator
  * move shard_size to internal use
  * fix
* Add support for non-barrier mode to launch ray (#4014)
  * add support for non-barrier mode
  * fix style
  * meet review
  * meet review
  * move barrier mode to zoocontext
  * bug fix
  * modify
  * update
* remove driver cores (#4169)
* Add OrcaContext and make spark default read file backend (#2593)
  * orca context
  * handle error
  * enrich error msg
  * meet review
  * move log output
  * style
  * meet review
  * add zoocontext
  * fix ut
  * change import

Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Kai Huang <huangkaivision@gmail.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>

shanyu-sys requested a review from yangw1234 February 10, 2021 02:31

yangw1234 changed the title ~~add shard size to dataframe_to_xshards~~ [WIP] add shard size to dataframe_to_xshards Feb 18, 2021

shanyu-sys added 2 commits February 18, 2021 22:04

add shard size to dataframe_to_xshards

256e1b6

add ut and change default shard_size as None

5b04049

shanyu-sys force-pushed the refactor_dataframe_to_xshards branch from 360a8f0 to 5b04049 February 19, 2021 02:04

add shard size in orca context and fix style

0851ec1

shanyu-sys force-pushed the refactor_dataframe_to_xshards branch from 363383d to e42ece0 February 22, 2021 02:34

add ut in pytorch estimator and tf estimator

e42ece0

shanyu-sys changed the title ~~[WIP] add shard size to dataframe_to_xshards~~ add shard size to dataframe_to_xshards Feb 22, 2021

jason-dai reviewed Feb 22, 2021


pyzoo/zoo/orca/common.py (outdated, resolved)

shanyu-sys added 2 commits February 22, 2021 22:02

move shard_size to internal use

3503dde

fix

5b2708f

shanyu-sys merged commit f1ea33a into intel-analytics:master Mar 1, 2021

shanyu-sys deleted the refactor_dataframe_to_xshards branch March 17, 2021 08:09
