ScriptExecutor improvements (#2820)
* script executor improvements

* move ScriptExecutor to job_config

* rename ScriptExecutor to ScriptRunner, add TF versions of the in-process and ex-process executors

* fix dead links

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
SYangster and chesterxgchen authored Aug 22, 2024
1 parent bf836c5 commit 16e0c27
Showing 38 changed files with 394 additions and 294 deletions.
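In practice, the migration for a downstream job script is an import move plus two renamed parameters: `ScriptExecutor` from `nvflare.app_common.executors.script_executor` becomes `ScriptRunner` from `nvflare.job_config.script_runner`, and `task_script_path`/`task_script_args` become `script`/`script_args`. A minimal before/after sketch assembled from the diffs below (the job name and script path are illustrative, and the workflow setup is omitted):

```python
# Before this commit: the executor lived under app_common.executors
# from nvflare.app_common.executors.script_executor import ScriptExecutor
# executor = ScriptExecutor(task_script_path="src/cifar10_fl.py", task_script_args="")

# After this commit: the runner lives under job_config, with renamed arguments
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

job = FedJob(name="cifar10_fedavg")  # controller/workflow configuration omitted for brevity

for i in range(2):
    executor = ScriptRunner(script="src/cifar10_fl.py", script_args="")
    job.to(executor, f"site-{i}")
```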
16 changes: 8 additions & 8 deletions examples/advanced/job_api/pt/README.md
@@ -19,27 +19,27 @@ python "script_name.py"
```

```commandline
python fedavg_script_executor_lightning_cifar10.py
python fedavg_script_runner_lightning_cifar10.py
```
### 1. [Federated averaging using the script executor](./fedavg_script_executor_cifar10.py)
### 1. [Federated averaging using the script executor](./fedavg_script_runner_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).


### 2. [Federated averaging using script executor and differential privacy filter](./fedavg_script_executor_dp_filter_cifar10.py)
### 2. [Federated averaging using script executor and differential privacy filter](./fedavg_script_runner_dp_filter_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
with additional [differential privacy filters](https://arxiv.org/abs/1910.00962) on the client side.
```commandline
python fedavg_script_executor_dp_filter_cifar10.py
python fedavg_script_runner_dp_filter_cifar10.py
```
### 3. [Swarm learning using script executor](./swarm_script_executor_cifar10.py)
### 3. [Swarm learning using script executor](./swarm_script_runner_cifar10.py)
Implementation of [swarm learning](https://www.nature.com/articles/s41586-021-03583-3) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python swarm_script_executor_cifar10.py
python swarm_script_runner_cifar10.py
```
### 4. [Cyclic weight transfer using script executor](./cyclic_cc_script_executor_cifar10.py)
### 4. [Cyclic weight transfer using script executor](./cyclic_cc_script_runner_cifar10.py)
Implementation of [cyclic weight transfer](https://arxiv.org/abs/1709.05929) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python cyclic_cc_script_executor_cifar10.py
python cyclic_cc_script_runner_cifar10.py
```
### 5. [Federated averaging using model learning](./fedavg_model_learner_xsite_val_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [model learner class](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/model_learner.html),
@@ -16,8 +16,8 @@

from nvflare.app_common.ccwf.ccwf_job import CCWFJob, CyclicClientConfig, CyclicServerConfig
from nvflare.app_common.ccwf.comps.simple_model_shareable_generator import SimpleModelShareableGenerator
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.pt.file_model_persistor import PTFileModelPersistor
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -29,7 +29,7 @@
job.add_cyclic(
server_config=CyclicServerConfig(num_rounds=num_rounds, max_status_report_interval=300),
client_config=CyclicClientConfig(
executor=ScriptExecutor(task_script_path=train_script),
executor=ScriptRunner(script=train_script),
persistor=PTFileModelPersistor(model=Net()),
shareable_generator=SimpleModelShareableGenerator(),
),
@@ -14,12 +14,12 @@

from src.net import Net

from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.app_opt.pt.job_config.model import PTModel

# from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -44,8 +44,8 @@

# Add clients
for i in range(n_clients):
executor = ScriptExecutor(
task_script_path=train_script, task_script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
executor = ScriptRunner(
script=train_script, script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
)
job.to(executor, target=f"site-{i}")
# job.to_clients(executor)
@@ -15,9 +15,9 @@
from src.net import Net

from nvflare import FilterType
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_common.filters.percentile_privacy import PercentilePrivacy
from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -27,7 +27,7 @@
job = FedAvgJob(name="cifar10_fedavg_privacy", num_rounds=num_rounds, n_clients=n_clients, initial_model=Net())

for i in range(n_clients):
executor = ScriptExecutor(task_script_path=train_script, task_script_args="")
executor = ScriptRunner(script=train_script, script_args="")
job.to(executor, f"site-{i}", tasks=["train"])

# add privacy filter.
@@ -14,8 +14,8 @@

from src.lit_net import LitNet

from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -26,8 +26,8 @@

# Add clients
for i in range(n_clients):
executor = ScriptExecutor(
task_script_path=train_script, task_script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
executor = ScriptRunner(
script=train_script, script_args="" # f"--batch_size 32 --data_path /tmp/data/site-{i}"
)
job.to(executor, f"site-{i}")

@@ -18,8 +18,8 @@
from nvflare.app_common.aggregators.intime_accumulate_model_aggregator import InTimeAccumulateWeightedAggregator
from nvflare.app_common.ccwf.ccwf_job import CCWFJob, CrossSiteEvalConfig, SwarmClientConfig, SwarmServerConfig
from nvflare.app_common.ccwf.comps.simple_model_shareable_generator import SimpleModelShareableGenerator
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.pt.file_model_persistor import PTFileModelPersistor
from nvflare.job_config.script_runner import ScriptRunner

if __name__ == "__main__":
n_clients = 2
@@ -31,7 +31,7 @@
job.add_swarm(
server_config=SwarmServerConfig(num_rounds=num_rounds),
client_config=SwarmClientConfig(
executor=ScriptExecutor(task_script_path=train_script, evaluate_task_name="validate"),
executor=ScriptRunner(script=train_script, evaluate_task_name="validate"),
aggregator=aggregator,
persistor=PTFileModelPersistor(model=Net()),
shareable_generator=SimpleModelShareableGenerator(),
4 changes: 2 additions & 2 deletions examples/advanced/job_api/sklearn/README.md
@@ -15,10 +15,10 @@ You can also run any of the below scripts directly using
```commandline
python "script_name.py"
```
### 1. [Federated K-Means Clustering](./kmeans_script_executor_higgs.py)
### 1. [Federated K-Means Clustering](./kmeans_script_runner_higgs.py)
Implementation of [K-Means](https://arxiv.org/abs/1602.05629). For more details see this [example](../../../advanced/sklearn-kmeans/README.md)
```commandline
python kmeans_script_executor_higgs.py
python kmeans_script_runner_higgs.py
```

> [!NOTE]
@@ -20,11 +20,11 @@

from nvflare import FedJob
from nvflare.app_common.aggregators.collect_and_assemble_aggregator import CollectAndAssembleAggregator
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_common.shareablegenerators.full_model_shareable_generator import FullModelShareableGenerator
from nvflare.app_common.workflows.scatter_and_gather import ScatterAndGather
from nvflare.app_opt.sklearn.joblib_model_param_persistor import JoblibModelParamPersistor
from nvflare.client.config import ExchangeFormat
from nvflare.job_config.script_runner import ScriptRunner

preprocess = True # if False, assume data is already preprocessed and split

@@ -137,9 +137,9 @@ def split_higgs(input_data_path, input_header_path, output_dir, site_num, sample

# Add clients
for i in range(n_clients):
executor = ScriptExecutor(
task_script_path=train_script,
task_script_args=f"--data_root_dir {data_output_dir}",
executor = ScriptRunner(
script=train_script,
script_args=f"--data_root_dir {data_output_dir}",
params_exchange_format=ExchangeFormat.RAW, # kmeans requires raw values only rather than PyTorch Tensors (the default)
)
job.to(executor, f"site-{i+1}") # HIGGs data splitter assumes site names start from 1
18 changes: 9 additions & 9 deletions examples/advanced/job_api/tf/README.md
@@ -19,10 +19,10 @@ In this example, the latest Client APIs were used to implement
client-side training logic (details in the file
[`cifar10_tf_fl_alpha_split.py`](src/cifar10_tf_fl_alpha_split.py)),
and the new
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/fed_job.py#L106)
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/api.py)
APIs were used to programmatically set up an
`nvflare` job to be exported or run by the simulator (details in the file
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)),
[`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)),
alleviating the need to write job config files and simplifying the
development process.
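The commit message also notes TF variants of the in-process and ex-process executors. A rough sketch of how the renamed `ScriptRunner` slots into this TF job script follows; the `launch_external_process` flag is an assumption based on the commit description rather than on this diff, and the split path is illustrative:

```python
from nvflare import FedJob
from nvflare.job_config.script_runner import ScriptRunner

job = FedJob(name="cifar10_tf_fedavg")  # illustrative job name

for i in range(8):
    executor = ScriptRunner(
        script="src/cifar10_tf_fl_alpha_split.py",
        script_args=f"--train_idx_path /tmp/cifar10_splits/site-{i+1}.npy",  # illustrative path
        # launch_external_process=True,  # assumed option for the ex-process variant
    )
    job.to(executor, f"site-{i+1}")
```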

@@ -41,7 +41,7 @@ pip install -r ./requirements.txt
## 2. Run experiments

This example uses the simulator to run all experiments. The script
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)
[`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)
is the main script to be used to launch different experiments with
different arguments (see sections below for details). A script
[`run_jobs.sh`](run_jobs.sh) is also provided to run all experiments
@@ -55,7 +55,7 @@ any experiment, and you can use `Tensorboard` to visualize the
training and validation process as the experiment runs. Data split
files, summary logs and results will be saved in a workspace
directory, which defaults to `/tmp` and can be configured by setting
`--workspace` argument of the `tf_fl_script_executor_cifar10.py`
`--workspace` argument of the `tf_fl_script_runner_cifar10.py`
script.

> [!WARNING]
@@ -82,7 +82,7 @@ To simulate a centralized training baseline, we run FedAvg algorithm
with 1 client for 25 rounds, where each round consists of one single epoch.

```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo centralized \
--n_clients 1 \
--num_rounds 25 \
@@ -101,7 +101,7 @@ in the centralized baseline above (50*4 divided by 8 clients is 25):
```
for alpha in 1.0 0.5 0.3 0.1; do
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo fedavg \
--n_clients 8 \
--num_rounds 50 \
@@ -120,7 +120,7 @@ Next, let's try some different FL algorithms on a more heterogeneous split:
side to update the global model from client-side gradients. Here we
use SGD with momentum and cosine learning rate decay:
```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo fedopt \
--n_clients 8 \
--num_rounds 50 \
@@ -130,7 +130,7 @@
```
[FedProx](https://arxiv.org/abs/1812.06127) adds a regularizer to the loss:
```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo fedprox \
--n_clients 8 \
--num_rounds 50 \
@@ -145,7 +145,7 @@ during local training following the
described in [Li et al.](https://arxiv.org/abs/2102.02079)

```
python ./tf_fl_script_executor_cifar10.py \
python ./tf_fl_script_runner_cifar10.py \
--algo scaffold \
--n_clients 8 \
--num_rounds 50 \
@@ -21,8 +21,8 @@
from src.tf_net import ModerateTFNet

from nvflare import FedJob
from nvflare.app_common.executors.script_executor import ScriptExecutor
from nvflare.app_opt.tf.job_config.model import TFModel
from nvflare.job_config.script_runner import ScriptRunner

gpu_devices = tf.config.experimental.list_physical_devices("GPU")
for device in gpu_devices:
@@ -156,7 +156,7 @@
# Add clients
for i, train_idx_path in enumerate(train_idx_paths):
curr_task_script_args = task_script_args + f" --train_idx_path {train_idx_path}"
executor = ScriptExecutor(task_script_path=train_script, task_script_args=curr_task_script_args)
executor = ScriptRunner(script=train_script, script_args=curr_task_script_args)
job.to(executor, f"site-{i+1}")

# Can export current job to folder.
62 changes: 0 additions & 62 deletions examples/getting_started/pt/README.md
@@ -4,65 +4,3 @@
We provide several examples to quickly get you started using NVFlare's Job API.
All examples in this folder are based on using [PyTorch](https://pytorch.org/) as the model training framework.
Furthermore, we support [PyTorch Lightning](https://lightning.ai).

## Setup environment
First, install nvflare and dependencies:
```commandline
pip install -r requirements.txt
```

## Tutorials
A good starting point for understanding the Job API scripts and NVFlare components are the following tutorials.
### 1. [Federated averaging using script executor](./nvflare_pt_getting_started.ipynb)
Tutorial on [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).

### 2. [Federated averaging using script executor with Lightning API](./nvflare_lightning_getting_started.ipynb)
Tutorial on [FedAvg](https://arxiv.org/abs/1602.05629) using the [Lightning Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html#id4)

## Examples
You can also run any of the below scripts directly using
```commandline
python "script_name.py"
```
### 1. [Federated averaging using script executor](./fedavg_script_executor_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).
```commandline
python fedavg_script_executor_cifar10.py
```
### 2. [Federated averaging using script executor with Lightning API](./fedavg_script_executor_lightning_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Lightning Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html#id4)
```commandline
python fedavg_script_executor_lightning_cifar10.py
```
### 3. [Federated averaging using the script executor for all clients](./fedavg_script_executor_cifar10_all.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html).
Here, we deploy the same configuration to all clients.
```commandline
python fedavg_script_executor_cifar10_all.py
```
### 4. [Federated averaging using script executor and differential privacy filter](./fedavg_script_executor_dp_filter_cifar10.py)
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
with additional [differential privacy filters](https://arxiv.org/abs/1910.00962) on the client side.
```commandline
python fedavg_script_executor_dp_filter_cifar10.py
```
### 5. [Swarm learning using script executor](./swarm_script_executor_cifar10.py)
Implementation of [swarm learning](https://www.nature.com/articles/s41586-021-03583-3) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python swarm_script_executor_cifar10.py
```
### 6. [Cyclic weight transfer using script executor](./cyclic_cc_script_executor_cifar10.py)
Implementation of [cyclic weight transfer](https://arxiv.org/abs/1709.05929) using the [Client API](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/client_api.html)
```commandline
python cyclic_cc_script_executor_cifar10.py
```
### 7. [Federated averaging using model learning](./fedavg_model_learner_xsite_val_cifar10.py))
Implementation of [FedAvg](https://arxiv.org/abs/1602.05629) using the [model learner class](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type/model_learner.html),
followed by [cross site validation](https://nvflare.readthedocs.io/en/main/programming_guide/controllers/cross_site_model_evaluation.html)
for federated model evaluation.
```commandline
python fedavg_model_learner_xsite_val_cifar10.py
```

> [!NOTE]
> More examples can be found at https://nvidia.github.io/NVFlare.
@@ -330,7 +330,7 @@
"outputs": [],
"source": [
"from nvflare import FedJob\n",
"from nvflare.app_common.executors.script_executor import ScriptExecutor\n",
"from nvflare.job_config.script_runner import ScriptRunner\n",
"from nvflare.app_common.workflows.fedavg import FedAvg\n",
"\n",
"job = FedJob(name=\"cifar10_fedavg_lightning\")"
@@ -411,8 +411,8 @@
"outputs": [],
"source": [
"for i in range(n_clients):\n",
" executor = ScriptExecutor(\n",
" task_script_path=\"src/cifar10_lightning_fl.py\", task_script_args=\"\" # f\"--batch_size 32 --data_path /tmp/data/site-{i}\"\n",
" executor = ScriptRunner(\n",
" script=\"src/cifar10_lightning_fl.py\", script_args=\"\" # f\"--batch_size 32 --data_path /tmp/data/site-{i}\"\n",
" )\n",
" job.to(executor, f\"site-{i+1}\")"
]