ScriptExecutor improvements (#2820)
* script executor improvements

* move ScriptExecutor to job_config

* rename ScriptExecutor to ScriptRunner, add TF versions of the in-process and ex-process executors

* fix dead links

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
SYangster and chesterxgchen authored Aug 22, 2024
1 parent bf836c5 commit 16e0c27
Showing 38 changed files with 394 additions and 294 deletions.
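In practice, the migration for a downstream job script is an import move plus two renamed parameters: `ScriptExecutor` from `nvflare.app_common.executors.script_executor` becomes `ScriptRunner` from `nvflare.job_config.script_runner`, and `task_script_path`/`task_script_args` become `script`/`script_args`. A minimal before/after sketch assembled from the diffs below (the job name and script path are illustrative, and the workflow setup is omitted):

```python
# Before this commit: the executor lived under app_common.executors
# from nvflare.app_common.executors.script_executor import ScriptExecutor
# executor = ScriptExecutor(task_script_path="src/cifar10_fl.py", task_script_args="")

# After this commit: the runner lives under job_config, with renamed arguments
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

job = FedJob(name="cifar10_fedavg")  # controller/workflow configuration omitted for brevity

for i in range(2):
    executor = ScriptRunner(script="src/cifar10_fl.py", script_args="")
    job.to(executor, f"site-{i}")
```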
16 changes: 8 additions & 8 deletions examples/advanced/job_api/pt/README.md
@@ -19,27 +19,27 @@ python "script_name.py"
```

```commandline
python fedavg_script_executor_lightning_cifar10.py
python fedavg_script_runner_lightning_cifar10.py
```
### 1. [Federated averaging using the script executor](./fedavg_script_executor_cifar10.py)
### 1. [Federated averaging using the script executor](./fedavg_script_runner_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).


### 2. [Federated averaging using script executor and differential privacy filter](./fedavg_script_executor_dp_filter_cifar10.py)
### 2. [Federated averaging using script executor and differential privacy filter](./fedavg_script_runner_dp_filter_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
with additional [differential privacy filters](https://arxiv.org/abs/1910.00962) on the client side.
```commandline
python fedavg_script_executor_dp_filter_cifar10.py
python fedavg_script_runner_dp_filter_cifar10.py
```
### 3. [Swarm learning using script executor](./swarm_script_executor_cifar10.py)
### 3. [Swarm learning using script executor](./swarm_script_runner_cifar10.py)
Implementation of [swarm learning](https://www.nature.com/articles/s41586-021-03583-3) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python swarm_script_executor_cifar10.py
python swarm_script_runner_cifar10.py
```
### 4. [Cyclic weight transfer using script executor](./cyclic_cc_script_executor_cifar10.py)
### 4. [Cyclic weight transfer using script executor](./cyclic_cc_script_runner_cifar10.py)
Implementation of [cyclic weight transfer](https://arxiv.org/abs/1709.05929) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python cyclic_cc_script_executor_cifar10.py
python cyclic_cc_script_runner_cifar10.py
```
### 5. [Federated averaging using model learning](./fedavg_model_learner_xsite_val_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [model learner class](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/model_learner.html),
@@ -16,8 +16,8 @@

from nvflare.app_common.ccwf.ccwf_job import CCWFJob, CyclicClientConfig, CyclicServerConfig
from nvflare.app_common.ccwf.comps.simple_model_shareable_generator import SimpleModelShareableGenerator
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.pt.file_model_persistor import PTFileModelPersistor
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -29,7 +29,7 @@
job.add_cyclic(
server_config=CyclicServerConfig(num_rounds=num_rounds, max_status_report_interval=300),
client_config=CyclicClientConfig(
executor=ScriptExecutor(task_script_path=train_script),
executor=ScriptRunner(script=train_script),
persistor=PTFileModelPersistor(model=Net()),
shareable_generator=SimpleModelShareableGenerator(),
),
@@ -14,12 +14,12 @@

from src.net import Net

from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.app_opt.pt.job_config.model import PTModel

# from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -44,8 +44,8 @@

# Add clients
for i in range(n_clients):
executor = ScriptExecutor(
task_script_path=train_script, task_script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
executor = ScriptRunner(
script=train_script, script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
)
job.to(executor, target=f"site-{i}")
# job.to_clients(executor)
@@ -15,9 +15,9 @@
from src.net import Net

from nvflare import FilterType
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_common.filters.percentile_privacy import PercentilePrivacy
from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -27,7 +27,7 @@
job = FedAvgJob(name="cifar10_fedavg_privacy", num_rounds=num_rounds, n_clients=n_clients, initial_model=Net())

for i in range(n_clients):
executor = ScriptExecutor(task_script_path=train_script, task_script_args="")
executor = ScriptRunner(script=train_script, script_args="")
job.to(executor, f"site-{i}", tasks=["train"])

# add privacy filter.
@@ -14,8 +14,8 @@

from src.lit_net import LitNet

from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -26,8 +26,8 @@

# Add clients
for i in range(n_clients):
executor = ScriptExecutor(
task_script_path=train_script, task_script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
executor = ScriptRunner(
script=train_script, script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
)
job.to(executor, f"site-{i}")

@@ -18,8 +18,8 @@
from nvflare.app_common.aggregators.intime_accumulate_model_aggregator import InTimeAccumulateWeightedAggregator
from nvflare.app_common.ccwf.ccwf_job import CCWFJob, CrossSiteEvalConfig, SwarmClientConfig, SwarmServerConfig
from nvflare.app_common.ccwf.comps.simple_model_shareable_generator import SimpleModelShareableGenerator
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.pt.file_model_persistor import PTFileModelPersistor
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -31,7 +31,7 @@
job.add_swarm(
server_config=SwarmServerConfig(num_rounds=num_rounds),
client_config=SwarmClientConfig(
executor=ScriptExecutor(task_script_path=train_script, evaluate_task_name="validate"),
executor=ScriptRunner(script=train_script, evaluate_task_name="validate"),
aggregator=aggregator,
persistor=PTFileModelPersistor(model=Net()),
shareable_generator=SimpleModelShareableGenerator(),
4 changes: 2 additions & 2 deletions examples/advanced/job_api/sklearn/README.md
@@ -15,10 +15,10 @@ You can also run any of the below scripts directly using
```commandline
python "script_name.py"
```
### 1. [Federated K-Means Clustering](./kmeans_script_executor_higgs.py)
### 1. [Federated K-Means Clustering](./kmeans_script_runner_higgs.py)
Implementation of [K-Means](https://arxiv.org/abs/1602.05629). For more details see this [example](../../../advanced/sklearn-kmeans/README.md)
```commandline
python kmeans_script_executor_higgs.py
python kmeans_script_runner_higgs.py
```

> [!NOTE]
@@ -20,11 +20,11 @@

from nvflare import FedJob
from nvflare.app_common.aggregators.collect_and_assemble_aggregator import CollectAndAssembleAggregator
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_common.shareablegenerators.full_model_shareable_generator import FullModelShareableGenerator
from nvflare.app_common.workflows.scatter_and_gather import ScatterAndGather
from nvflare.app_opt.sklearn.joblib_model_param_persistor import JoblibModelParamPersistor
from nvflare.client.config import ExchangeFormat
from nvflare.job_config.script_runner import ScriptRunner

preprocess = True # if False, assume data is already preprocessed and split

@@ -137,9 +137,9 @@ def split_higgs(input_data_path, input_header_path, output_dir, site_num, sample

# Add clients
for i in range(n_clients):
executor = ScriptExecutor(
task_script_path=train_script,
task_script_args=f"--data_root_dir {data_output_dir}",
executor = ScriptRunner(
script=train_script,
script_args=f"--data_root_dir {data_output_dir}",
params_exchange_format=ExchangeFormat.RAW, # kmeans requires raw values only rather than PyTorch Tensors (the default)
)
job.to(executor, f"site-{i+1}") # HIGGs data splitter assumes site names start from 1
18 changes: 9 additions & 9 deletions examples/advanced/job_api/tf/README.md
@@ -19,10 +19,10 @@ In this example, the latest Client APIs were used to implement
client-side training logic (details in the file
[`cifar10_tf_fl_alpha_split.py`](src/cifar10_tf_fl_alpha_split.py)),
and the new
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/fed_job.py#L106)
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/api.py)
APIs were used to programmatically set up an
`nvflare` job to be exported or run by the simulator (details in the file
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)),
[`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)),
alleviating the need to write job config files and simplifying the
development process.
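The commit message also notes TF variants of the in-process and ex-process executors. A rough sketch of how the renamed `ScriptRunner` slots into this TF job script follows; the `launch_external_process` flag is an assumption based on the commit description rather than on this diff, and the split path is illustrative:

```python
from nvflare import FedJob
from nvflare.job_config.script_runner import ScriptRunner

job = FedJob(name="cifar10_tf_fedavg")  # illustrative job name

for i in range(8):
    executor = ScriptRunner(
        script="src/cifar10_tf_fl_alpha_split.py",
        script_args=f"--train_idx_path /tmp/cifar10_splits/site-{i+1}.npy",  # illustrative path
        # launch_external_process=True,  # assumed option for the ex-process variant
    )
    job.to(executor, f"site-{i+1}")
```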

@@ -41,7 +41,7 @@ pip install -r ./requirements.txt
## 2. Run experiments

This example uses the simulator to run all experiments. The script
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)
[`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)
is the main script to be used to launch different experiments with
different arguments (see sections below for details). A script
[`run_jobs.sh`](run_jobs.sh) is also provided to run all experiments
@@ -55,7 +55,7 @@ any experiment, and you can use `Tensorboard` to visualize the
training and validation process as the experiment runs. Data split
files, summary logs and results will be saved in a workspace
directory, which defaults to `/tmp` and can be configured by setting
`--workspace` argument of the `tf_fl_script_executor_cifar10.py`
`--workspace` argument of the `tf_fl_script_runner_cifar10.py`
script.

> [!WARNING]
@@ -82,7 +82,7 @@ To simulate a centralized training baseline, we run FedAvg algorithm
with 1 client for 25 rounds, where each round consists of one single epoch.

```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo centralized \
--n_clients 1 \
--num_rounds 25 \
@@ -101,7 +101,7 @@ in the centralized baseline above (50*4 divided by 8 clients is 25):
```
for alpha in 1.0 0.5 0.3 0.1; do
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo fedavg \
--n_clients 8 \
--num_rounds 50 \
@@ -120,7 +120,7 @@ Next, let's try some different FL algorithms on a more heterogeneous split:
side to update the global model from client-side gradients. Here we
use SGD with momentum and cosine learning rate decay:
```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo fedopt \
--n_clients 8 \
--num_rounds 50 \
@@ -130,7 +130,7 @@
```
[FedProx](https://arxiv.org/abs/1812.06127) adds a regularizer to the loss:
```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo fedprox \
--n_clients 8 \
--num_rounds 50 \
@@ -145,7 +145,7 @@ during local training following the
described in [Li et al.](https://arxiv.org/abs/2102.02079)

```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo scaffold \
--n_clients 8 \
--num_rounds 50 \
@@ -21,8 +21,8 @@
from src.tf_net import ModerateTFNet

from nvflare import FedJob
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.tf.job_config.model import TFModel
from nvflare.job_config.script_runner import ScriptRunner

gpu_devices = tf.config.experimental.list_physical_devices("GPU")
for device in gpu_devices:
@@ -156,7 +156,7 @@
# Add clients
for i, train_idx_path in enumerate(train_idx_paths):
curr_task_script_args = task_script_args + f" --train_idx_path {train_idx_path}"
executor = ScriptExecutor(task_script_path=train_script, task_script_args=curr_task_script_args)
executor = ScriptRunner(script=train_script, script_args=curr_task_script_args)
job.to(executor, f"site-{i+1}")

# Can export current job to folder.
62 changes: 0 additions & 62 deletions examples/getting_started/pt/README.md
@@ -4,65 +4,3 @@
We provide several examples to quickly get you started using NVFlare's Job API.
All examples in this folder are based on using [PyTorch](https://pytorch.org/) as the model training framework.
Furthermore, we support [PyTorch Lightning](https://lightning.ai).

## Setup environment
First, install nvflare and dependencies:
```commandline
pip install -r requirements.txt
```

## Tutorials
A good starting point for understanding the Job API scripts and NVFlare components are the following tutorials.
### 1. [Federated averaging using script executor](./nvflare_pt_getting_started.ipynb)
Tutorial on [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).

### 2. [Federated averaging using script executor with Lightning API](./nvflare_lightning_getting_started.ipynb)
Tutorial on [FedAvg](https://arxiv.org/abs/1602.05629) using the [Lightning Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html#id4)

## Examples
You can also run any of the below scripts directly using
```commandline
python "script_name.py"
```
### 1. [Federated averaging using script executor](./fedavg_script_executor_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).
```commandline
python fedavg_script_executor_cifar10.py
```
### 2. [Federated averaging using script executor with Lightning API](./fedavg_script_executor_lightning_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Lightning Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html#id4)
```commandline
python fedavg_script_executor_lightning_cifar10.py
```
### 3. [Federated averaging using the script executor for all clients](./fedavg_script_executor_cifar10_all.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).
Here, we deploy the same configuration to all clients.
```commandline
python fedavg_script_executor_cifar10_all.py
```
### 4. [Federated averaging using script executor and differential privacy filter](./fedavg_script_executor_dp_filter_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
with additional [differential privacy filters](https://arxiv.org/abs/1910.00962) on the client side.
```commandline
python fedavg_script_executor_dp_filter_cifar10.py
```
### 5. [Swarm learning using script executor](./swarm_script_executor_cifar10.py)
Implementation of [swarm learning](https://www.nature.com/articles/s41586-021-03583-3) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python swarm_script_executor_cifar10.py
```
### 6. [Cyclic weight transfer using script executor](./cyclic_cc_script_executor_cifar10.py)
Implementation of [cyclic weight transfer](https://arxiv.org/abs/1709.05929) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python cyclic_cc_script_executor_cifar10.py
```
### 7. [Federated averaging using model learning](./fedavg_model_learner_xsite_val_cifar10.py))
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [model learner class](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/model_learner.html),
followed by [cross site validation](https://nvflare.readthedocs.io/en/main/programming_guide/controllers/cross_site_model_evaluation.html)
for federated model evaluation.
```commandline
python fedavg_model_learner_xsite_val_cifar10.py
```

> [!NOTE]
> More examples can be found at https://nvidia.github.io/NVFlare.
@@ -330,7 +330,7 @@
"outputs": [],
"source": [
"from nvflare import FedJob\n",
"from nvflare.app_common.executors.script_executor import ScriptExecutor\n",
"from nvflare.job_config.script_runner import ScriptRunner\n",
"from nvflare.app_common.workflows.fedavg import FedAvg\n",
"\n",
"job = FedJob(name=\"cifar10_fedavg_lightning\")"
@@ -411,8 +411,8 @@
"outputs": [],
"source": [
"for i in range(n_clients):\n",
" executor = ScriptExecutor(\n",
" task_script_path=\"src/cifar10_lightning_fl.py\", task_script_args=\"\" # f\"--batch_size 32 --data_path /tmp/data/site-{i}\"\n",
" executor = ScriptRunner(\n",
" script=\"src/cifar10_lightning_fl.py\", script_args=\"\" # f\"--batch_size 32 --data_path /tmp/data/site-{i}\"\n",
" )\n",
" job.to(executor, f\"site-{i+1}\")"
]