add shard size to dataframe_to_xshards #3491

Merged

Conversation

shanyu-sys
Contributor

No description provided.

@yangw1234 changed the title from "add shard size to dataframe_to_xshards" to "[WIP] add shard size to dataframe_to_xshards" on Feb 18, 2021
@shanyu-sys
Contributor Author

Interface change:
Add shard_size to fit, predict and evaluate of PyTorchRayEstimator and TensorFlow2Estimator.
shard_size applies to Spark DataFrame input and specifies the number of Rows to transform into one shard of SparkXShards. It defaults to None, in which case all Rows in a partition are transformed into a single shard.
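
To make the described semantics concrete, here is a minimal pure-Python sketch of how the Rows of one partition might be grouped into shards. It is not the code from this PR: `rows_to_shards` is a hypothetical helper, and two details are assumptions that the comment above does not confirm, namely that the last shard may hold fewer than shard_size Rows and that an empty partition produces no shard.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def rows_to_shards(rows: Iterable[T],
                   shard_size: Optional[int] = None) -> Iterator[List[T]]:
    """Group the rows of a single partition into shards (hypothetical helper).

    shard_size=None -> all rows of the partition become one shard.
    shard_size=n    -> every n consecutive rows become one shard; the final
                       shard may be smaller (an assumption of this sketch).
    """
    it = iter(rows)
    if shard_size is None:
        shard = list(it)
        if shard:  # assumption: an empty partition yields no shard
            yield shard
        return
    if shard_size <= 0:
        raise ValueError("shard_size must be None or a positive integer")
    while True:
        # Take up to shard_size rows from the partition iterator.
        shard = list(islice(it, shard_size))
        if not shard:
            return
        yield shard


partition = list(range(7))  # stand-in for the Rows of one partition
default_shards = list(rows_to_shards(partition))
sized_shards = list(rows_to_shards(partition, shard_size=3))
```

With the default, the whole partition ends up in a single shard; with shard_size=3, the same seven rows are split into three shards.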

@jason-dai
Collaborator

Interface change:
Add shard_size to fit, predict and evaluate of PyTorchRayEstimator and TensorFlow2Estimator.
shard_size applies to Spark DataFrame input and specifies the number of Rows to transform into one shard of SparkXShards. It defaults to None, in which case all Rows in a partition are transformed into a single shard.

I don't think it's reasonable to expose shard_size to the user; we can add an internal property in OrcaContext for testing purposes.

@shanyu-sys changed the title from "[WIP] add shard size to dataframe_to_xshards" to "add shard size to dataframe_to_xshards" on Feb 22, 2021
@yangw1234
Contributor

LGTM

@shanyu-sys
Contributor Author

I don't think it's reasonable to expose shard_size to the user; we can add an internal property in OrcaContext for testing purposes.

shard_size has been added in OrcaContext
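
For readers unfamiliar with the pattern, the following is a rough sketch of what an internal, class-level setting on a context object could look like. It is an illustration only: `SketchContext`, `_SketchContextMeta` and the underscore-prefixed `_shard_size` name are hypothetical stand-ins, and the validation rule is an assumption; the actual change lives in pyzoo/zoo/orca/common.py and may differ.

```python
from typing import Optional


class _SketchContextMeta(type):
    """Metaclass so the setting is read and written on the class itself."""

    # Backing storage; None means "one shard per partition".
    _shard_size_value: Optional[int] = None

    @property
    def _shard_size(cls) -> Optional[int]:
        return cls._shard_size_value

    @_shard_size.setter
    def _shard_size(cls, value: Optional[int]) -> None:
        # Assumed validation: only None or a positive integer is accepted.
        if value is not None:
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError("shard_size must be None or a positive integer")
        cls._shard_size_value = value


class SketchContext(metaclass=_SketchContextMeta):
    """Hypothetical stand-in for a global context object."""


# A test could set the value, exercise the conversion, then restore the default.
SketchContext._shard_size = 3
value_during_test = SketchContext._shard_size
SketchContext._shard_size = None
```

Keeping the setting behind an underscore-prefixed name signals that it is meant for tests rather than for end users, which matches the reviewer's request not to expose it in the estimator methods.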

Review thread on pyzoo/zoo/orca/common.py (outdated, resolved)
@shanyu-sys merged commit f1ea33a into intel-analytics:master on Mar 1, 2021
@shanyu-sys deleted the refactor_dataframe_to_xshards branch on March 17, 2021 08:09
shanyu-sys added a commit to shanyu-sys/analytics-zoo that referenced this pull request Sep 16, 2021
* add shard size to dataframe_to_xshards

* add ut and change default shard_size as None

* add shard size in orca context and fix style

* add ut in pytorch estimator and tf estimator

* move shard_size to internal use

* fix
shanyu-sys added a commit that referenced this pull request Sep 22, 2021
* Orca PyTorch Estimator load data once (#2669)

* load once

* fix

* fix

* remove num_steps

* remove iter

* meet comments

* wrap once

* style fix

* Add Orca Overview and Context Doc (#2748)

* add docs

* minor

* Add init_orca_context (#2774)

* initial imple

* update

* meet review

* review and style

* remove stopped

* add doc

* minor

* move import

* fix mxnet

* remove

* Update UTs and examples with init_orca_context (#2787)

* update unit tests

* minor

* update

* update mxnet

* move barrier

* fix mxnet

* update

* bug fix

* update

* update test

* update mxnet example

* update mxnet

* minor

* minor

* minor

* update examples

* move ray import dependencies

* readme

* minor

* bug fix

* remove default

* Add website doc for init_orca_context (#2822)

* Support numa binding in init_spark_standalone (#2847)

* support numa binding in init_spark_standalone

* add doc and add to orca context

* address comments

* address comments

* update scripts

* hyperthreading

* fix

* shutdown hook (#2853)

* Support RayOnSpark for k8s and add docs (#2836)

* support ray on k8s

* add to init orca context

* style

* minor

* minor

* ut

* Fix stop_orca_context being called twice (#2878)

* Update website doc of init orca context (#2879)

* update doc

* update

* update Torch example (#3022)

* update example

* Update README.md

* update resnet_finetune.py

* delete some file

* Create README.md

* Update README.md

* Update README.md

* Update README.md

* add detect conda env name

* some change

* update run example scripte

* update init orca context

* attempt to fix ray memory (#3205)

* attempt to fix ray memory

* exclude webui

* Update doc for orca context (#3287)

* update

* add back

* update

* update

* minor

* meet review

* fix typo

* Add memory type support for Orca tf estimator (#3280)

* add mem type for dataframe dataset

* add ZooTrigger support

* update mem type

* add test

* update

* update mem type and orca context

* update orca context docs

* update orca context with comments

* fix style

* Add get_spark_session in OrcaContext (#3520)

* add and modify

* style

* add shard size to dataframe_to_xshards (#3491)

* add shard size to dataframe_to_xshards

* add ut and change default shard_size as None

* add shard size in orca context and fix style

* add ut in pytorch estimator and tf estimator

* move shard_size to internal use

* fix

* Add support for non-barrier mode to launch ray (#4014)

* add support for non-barrier mode

* fix style

* meet review

* meet review

* move barrier mode to zoocontext

* bug fix

* modify

* update

* remove driver cores (#4169)

* Add OrcaContext and make spark default read file backend (#2593)

* orca context

* handle error

* enrich error msg

* meet review

* move log output

* style

* meet review

* add zoocontext

* fix ut

* change import

Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: Kai Huang <huangkaivision@gmail.com>
Co-authored-by: Yang Wang <yang3.wang@intel.com>
Co-authored-by: Xin Qiu <qiuxin2012@users.noreply.github.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
Co-authored-by: dding3 <ding.ding@intel.com>
Co-authored-by: Le-Zheng <30695225+Le-Zheng@users.noreply.github.com>
Le-Zheng pushed a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
Le-Zheng added a commit to Le-Zheng/analytics-zoo that referenced this pull request Sep 22, 2021
yangw1234 pushed a commit to yangw1234/analytics-zoo that referenced this pull request Sep 23, 2021
yangw1234 pushed a commit to yangw1234/analytics-zoo that referenced this pull request Sep 23, 2021
yangw1234 pushed a commit to yangw1234/analytics-zoo that referenced this pull request Sep 26, 2021
yangw1234 pushed a commit to yangw1234/analytics-zoo that referenced this pull request Sep 26, 2021
yangw1234 pushed a commit that referenced this pull request Sep 27, 2021
dding3 added a commit to dding3/analytics-zoo that referenced this pull request Oct 4, 2021
dding3 pushed a commit to dding3/analytics-zoo that referenced this pull request Oct 4, 2021