Re-factor hello-numpy-cse example (#2880)

NVIDIA · Aug 30, 2024 · fc088c1
1 parent 4be97a5

Showing 27 changed files with 251 additions and 400 deletions.
diff --git a/examples/hello-world/hello-cross-val/README.md b/examples/hello-world/hello-cross-val/README.md
diff --git a/examples/hello-world/hello-cross-val/requirements.txt b/examples/hello-world/hello-cross-val/requirements.txt
diff --git a/examples/hello-world/hello-numpy-cross-val/README.md b/examples/hello-world/hello-numpy-cross-val/README.md
@@ -2,53 +2,87 @@
 
 The cross-site model evaluation workflow uses the data from clients to run evaluation with the models of other clients. Data is not shared. Rather the collection of models is distributed to each client site to run local validation. The server collects the results of local validation to construct an all-to-all matrix of model performance vs. client dataset. It uses the [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow.
 
-> **_NOTE:_** This example uses a Numpy-based trainer and will generate its data within the code.
 
-You can follow the [hello_world notebook](../hello_world.ipynb) or the following:
-
-### 1. Install NVIDIA FLARE
+## Installation
 
 Follow the [Installation](../../getting_started/README.md) instructions.
 
-### 2. Run the experiment
+# Run training followed by cross-site validation
 
-Use nvflare simulator to run the hello-examples:
+This example uses a NumPy-based trainer to simulate the training
+steps.
 
-```
-nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 hello-numpy-cross-val/jobs/hello-numpy-cross-val
+We first perform FedAvg training and then conduct cross-site validation.
+
+As a result, two workflows are configured: ScatterAndGather for training and CrossSiteModelEval for validation.
+
+## 1. Prepare the job and run the experiment using the simulator
+
+We use the Job API to generate the job and run it with the simulator:
+
+```bash
+python3 job_train_and_cse.py
 ```
 
-### 3. Access the logs and results
+## 2. Access the logs and results
 
-You can find the running logs and results inside the simulator's workspace/simulate_job
+You can find the running logs and results inside the simulator's workspace:
 
 ```bash
-$ ls /tmp/nvflare/simulate_job/
-app_server  app_site-1  app_site-2  log.txt
+$ ls /tmp/nvflare/jobs/workdir/
+server/  site-1/  site-2/  startup/
+```
+
+The cross-site validation results can be viewed with:
 
+```bash
+$ cat /tmp/nvflare/jobs/workdir/server/simulate_job/cross_site_val/cross_val_results.json
 ```
 
-# Run cross site validation using the previous trained results
+# Run cross-site evaluation using previously trained models
 
-## Introduction
+Cross-site evaluation can also run without the training workflow, for example to reuse the results of a previous run or to evaluate pretrained models.
 
-The "hello-numpy-cross-val-only" and "hello-numpy-cross-val-only-list-models" jobs show how to run the NVFlare cross-site validation without the training workflow, making use of the previous run results. The first one uses the default single server model. The second enables a list of server models. You can provide / use your own previous trained models for the cross-validation.
+You can also supply your own pretrained models for the cross-site evaluation.
 
-### Generate the previous run best global model and local best model
+## 1. Generate the pretrained models
 
-Run the following command to generate the pre-trained models:
+In practice, these pretrained models would come from any training workflow.
+
+To mimic that, run the following command to generate the pretrained models:
+
+```bash
+python3 generate_pretrain_models.py
 ```
-python pre_train_models.py 
-```
 
-### How to run the Job
+## 2. Prepare the job and run the experiment using the simulator
+
+Note that the pretrained models are generated under:
+
+```python
+SERVER_MODEL_DIR = "/tmp/nvflare/server_pretrain_models"
+CLIENT_MODEL_DIR = "/tmp/nvflare/client_pretrain_models"
+```
 
-Define two OS system variable "SERVER_MODEL_DIR" and "CLIENT_MODEL_DIR" to point to the absolute path of the server best model and local best model location respectively. Then use the NVFlare admin command "submit_job" to submit and run the cross-validation job.
+`job_cse.py` uses the same directories.
 
-For example, define the system variable "SERVER_MODEL_DIR" like this:
+Then use the Job API to generate the job and run it with the simulator:
 
+```bash
+python3 job_cse.py
 ```
-export SERVER_MODEL_DIR="/path/to/model/location/at/server-side"
+
+## 3. Access the logs and results
+
+You can find the running logs and results inside the simulator's workspace:
+
+```bash
+$ ls /tmp/nvflare/jobs/workdir/
+server/  site-1/  site-2/  startup/
 ```
 
+The cross-site validation results can be viewed with:
+
+```bash
+$ cat /tmp/nvflare/jobs/workdir/server/simulate_job/cross_site_val/cross_val_results.json
+```
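After the simulator finishes, the all-to-all matrix is stored in `cross_val_results.json`. A minimal sketch for printing it as a table follows, assuming the file nests metrics as `{data_site: {model_owner: {metric_name: value}}}`. That layout, the metric key, and the helper names `load_cse_results` and `format_cse_matrix` are assumptions for illustration, so check them against your own output file.

```python
import json
from typing import Dict


def load_cse_results(path: str) -> dict:
    # Read the JSON file written by ValidationJsonGenerator
    with open(path) as f:
        return json.load(f)


def format_cse_matrix(results: Dict[str, Dict[str, Dict[str, float]]], metric: str) -> str:
    """Render nested CSE results as a text table.

    Assumed layout: {data_site: {model_owner: {metric_name: value}}}.
    Rows are the sites whose data was used; columns are the evaluated models.
    """
    sites = sorted(results)
    models = sorted({model for per_site in results.values() for model in per_site})
    lines = ["data\\model".ljust(12) + "".join(m.ljust(14) for m in models)]
    for site in sites:
        row = site.ljust(12)
        for model in models:
            value = results[site].get(model, {}).get(metric)
            # "-" marks a (site, model) pair with no reported value
            row += ("-" if value is None else f"{value:.4f}").ljust(14)
        lines.append(row.rstrip())
    return "\n".join(lines)
```

For example, call `format_cse_matrix(load_cse_results(path), metric="accuracy")` with the path from the `cat` command, replacing `accuracy` with whichever metric key your file actually contains.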
diff --git a/...-numpy-cross-val-only/pre_train_models.py → ...mpy-cross-val/generate_pretrain_models.py b/...-numpy-cross-val-only/pre_train_models.py → ...mpy-cross-val/generate_pretrain_models.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -16,29 +16,24 @@
 
 import numpy as np
 
-from nvflare.app_common.abstract.model import ModelLearnableKey, make_model_learnable
-from nvflare.app_common.np.constants import NPConstants
+SERVER_MODEL_DIR = "/tmp/nvflare/server_pretrain_models"
+CLIENT_MODEL_DIR = "/tmp/nvflare/client_pretrain_models"
+
+
+def _save_model(model_data, model_dir: str, model_file: str):
+    if not os.path.exists(model_dir):
+        os.makedirs(model_dir)
+    model_path = os.path.join(model_dir, model_file)
+    np.save(model_path, model_data)
 
-SERVER_MODEL_DIR = "models/server"
-CLIENT_MODEL_DIR = "models/client"
 
 if __name__ == "__main__":
     """
     This is the tool to generate the pre-trained models for demonstrating the cross-validation without training.
     """
 
     model_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)
-    model_learnable = make_model_learnable(weights={NPConstants.NUMPY_KEY: model_data}, meta_props={})
 
-    working_dir = os.getcwd()
-    model_dir = os.path.join(working_dir, SERVER_MODEL_DIR)
-    if not os.path.exists(model_dir):
-        os.makedirs(model_dir)
-    model_path = os.path.join(model_dir, "server.npy")
-    np.save(model_path, model_learnable[ModelLearnableKey.WEIGHTS][NPConstants.NUMPY_KEY])
-
-    model_dir = os.path.join(working_dir, CLIENT_MODEL_DIR)
-    if not os.path.exists(model_dir):
-        os.makedirs(model_dir)
-    model_save_path = os.path.join(model_dir, "best_numpy.npy")
-    np.save(model_save_path, model_data)
+    _save_model(model_data=model_data, model_dir=SERVER_MODEL_DIR, model_file="server_1.npy")
+    _save_model(model_data=model_data, model_dir=SERVER_MODEL_DIR, model_file="server_2.npy")
+    _save_model(model_data=model_data, model_dir=CLIENT_MODEL_DIR, model_file="best_numpy.npy")
diff --git a/examples/hello-world/hello-numpy-cross-val/job_cse.py b/examples/hello-world/hello-numpy-cross-val/job_cse.py
@@ -0,0 +1,63 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from nvflare import FedJob
+from nvflare.app_common.app_constant import AppConstants
+from nvflare.app_common.np.np_formatter import NPFormatter
+from nvflare.app_common.np.np_model_locator import NPModelLocator
+from nvflare.app_common.np.np_trainer import NPTrainer
+from nvflare.app_common.np.np_validator import NPValidator
+from nvflare.app_common.widgets.validation_json_generator import ValidationJsonGenerator
+from nvflare.app_common.workflows.cross_site_model_eval import CrossSiteModelEval
+
+SERVER_MODEL_DIR = "/tmp/nvflare/server_pretrain_models"
+CLIENT_MODEL_DIR = "/tmp/nvflare/client_pretrain_models"
+
+
+if __name__ == "__main__":
+    n_clients = 2
+
+    job = FedJob(name="hello-numpy-cse", min_clients=n_clients)
+
+    model_locator_id = job.to_server(
+        NPModelLocator(
+            model_dir=SERVER_MODEL_DIR,
+            model_name={"server_model_1": "server_1.npy", "server_model_2": "server_2.npy"},
+        )
+    )
+    formatter_id = job.to_server(NPFormatter())
+    job.to_server(ValidationJsonGenerator())
+
+    # Define the controller workflow and send to server
+    controller = CrossSiteModelEval(
+        model_locator_id=model_locator_id,
+        formatter_id=formatter_id,
+    )
+    job.to_server(controller)
+
+    # Add clients
+    trainer = NPTrainer(
+        train_task_name=AppConstants.TASK_TRAIN,
+        submit_model_task_name=AppConstants.TASK_SUBMIT_MODEL,
+        model_dir=CLIENT_MODEL_DIR,
+    )
+    job.to_clients(trainer, tasks=[AppConstants.TASK_SUBMIT_MODEL])
+    validator = NPValidator(
+        validate_task_name=AppConstants.TASK_VALIDATION,
+    )
+    job.to_clients(validator, tasks=[AppConstants.TASK_VALIDATION])
+
+    job.export_job("/tmp/nvflare/jobs")
+    job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0", n_clients=n_clients)
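In `job_cse.py`, `model_name` is a dict that maps the name used for each server model to a file under `model_dir`. The sketch below mimics that lookup so you can check that every listed file exists before launching the simulator. `resolve_model_paths` and `missing_models` are hypothetical helpers, not the `NPModelLocator` implementation; treating a plain string as a single model is also an assumption.

```python
import os
from typing import Dict, List, Union


def resolve_model_paths(model_dir: str, model_name: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: map each server model name to its .npy path.

    A plain string is treated as a single model keyed by its file name.
    """
    if isinstance(model_name, str):
        model_name = {model_name: model_name}
    return {name: os.path.join(model_dir, file_name) for name, file_name in model_name.items()}


def missing_models(paths: Dict[str, str]) -> List[str]:
    # Names whose files do not exist yet (e.g. generate_pretrain_models.py has not run)
    return sorted(name for name, path in paths.items() if not os.path.isfile(path))
```

Once `generate_pretrain_models.py` has run, `missing_models(resolve_model_paths("/tmp/nvflare/server_pretrain_models", {"server_model_1": "server_1.npy", "server_model_2": "server_2.npy"}))` should return an empty list.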
diff --git a/examples/hello-world/hello-numpy-cross-val/job_train_and_cse.py b/examples/hello-world/hello-numpy-cross-val/job_train_and_cse.py
@@ -0,0 +1,69 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from nvflare import FedJob
+from nvflare.apis.dxo import DataKind
+from nvflare.app_common.aggregators.intime_accumulate_model_aggregator import InTimeAccumulateWeightedAggregator
+from nvflare.app_common.app_constant import AppConstants
+from nvflare.app_common.np.np_formatter import NPFormatter
+from nvflare.app_common.np.np_model_locator import NPModelLocator
+from nvflare.app_common.np.np_model_persistor import NPModelPersistor
+from nvflare.app_common.np.np_trainer import NPTrainer
+from nvflare.app_common.np.np_validator import NPValidator
+from nvflare.app_common.shareablegenerators.full_model_shareable_generator import FullModelShareableGenerator
+from nvflare.app_common.widgets.validation_json_generator import ValidationJsonGenerator
+from nvflare.app_common.workflows.cross_site_model_eval import CrossSiteModelEval
+from nvflare.app_common.workflows.scatter_and_gather import ScatterAndGather
+
+if __name__ == "__main__":
+    n_clients = 2
+    num_rounds = 1
+
+    job = FedJob(name="hello-numpy-cse", min_clients=n_clients)
+
+    persistor_id = job.to_server(NPModelPersistor())
+    aggregator_id = job.to_server(InTimeAccumulateWeightedAggregator(expected_data_kind=DataKind.WEIGHTS))
+    shareable_generator_id = job.to_server(FullModelShareableGenerator())
+    model_locator_id = job.to_server(NPModelLocator())
+    formatter_id = job.to_server(NPFormatter())
+    job.to_server(ValidationJsonGenerator())
+
+    # Define the training (ScatterAndGather) workflow and send to server
+    controller = ScatterAndGather(
+        min_clients=n_clients,
+        num_rounds=num_rounds,
+        persistor_id=persistor_id,
+        aggregator_id=aggregator_id,
+        shareable_generator_id=shareable_generator_id,
+    )
+    job.to_server(controller)
+
+    # Define the cross-site evaluation workflow and send to server
+    controller = CrossSiteModelEval(model_locator_id=model_locator_id, formatter_id=formatter_id)
+    job.to_server(controller)
+
+    # Add clients
+    trainer = NPTrainer(
+        train_task_name=AppConstants.TASK_TRAIN,
+        submit_model_task_name=AppConstants.TASK_SUBMIT_MODEL,
+    )
+    job.to_clients(trainer, tasks=[AppConstants.TASK_TRAIN, AppConstants.TASK_SUBMIT_MODEL])
+    validator = NPValidator(
+        validate_task_name=AppConstants.TASK_VALIDATION,
+    )
+    job.to_clients(validator, tasks=[AppConstants.TASK_VALIDATION])
+
+    job.export_job("/tmp/nvflare/jobs")
+    job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0", n_clients=n_clients)
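In `job_train_and_cse.py`, `InTimeAccumulateWeightedAggregator` combines the client updates into a weighted average during the FedAvg round. As a rough illustration of that arithmetic (not NVFlare's implementation), the sketch below averages NumPy weight arrays, weighting each client by a count such as its number of local training steps. Where that weight comes from in NVFlare is an assumption here.

```python
from typing import List, Tuple

import numpy as np


def weighted_average(updates: List[Tuple[np.ndarray, float]]) -> np.ndarray:
    """Average client weight arrays, each scaled by its (assumed) aggregation weight."""
    total = sum(weight for _, weight in updates)
    if total <= 0:
        raise ValueError("total aggregation weight must be positive")
    # Each client's contribution is proportional to weight / total
    return sum(weights * (weight / total) for weights, weight in updates)
```

For two clients sending `[[1, 2], [3, 4]]` with weight 1 and `[[3, 4], [5, 6]]` with weight 3, the result is `[[2.5, 3.5], [4.5, 5.5]]`; equal weights reduce to a plain mean.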
diff --git a/...y-cross-val/jobs/hello-numpy-cross-val-only-list-models/app/config/config_fed_client.json b/...y-cross-val/jobs/hello-numpy-cross-val-only-list-models/app/config/config_fed_client.json
diff --git a/...y-cross-val/jobs/hello-numpy-cross-val-only-list-models/app/config/config_fed_server.json b/...y-cross-val/jobs/hello-numpy-cross-val-only-list-models/app/config/config_fed_server.json