Re-factor hello-numpy-cse example (#2880)
YuanTingHsieh authored Aug 30, 2024
1 parent 4be97a5 commit fc088c1
Showing 27 changed files with 251 additions and 400 deletions.
25 changes: 0 additions & 25 deletions examples/hello-world/hello-cross-val/README.md

This file was deleted.

1 change: 0 additions & 1 deletion examples/hello-world/hello-cross-val/requirements.txt

This file was deleted.

80 changes: 57 additions & 23 deletions examples/hello-world/hello-numpy-cross-val/README.md
@@ -2,53 +2,87 @@

The cross-site model evaluation workflow uses the data from clients to run evaluation with the models of other clients. Data is not shared. Rather the collection of models is distributed to each client site to run local validation. The server collects the results of local validation to construct an all-to-all matrix of model performance vs. client dataset. It uses the [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow.
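The all-to-all matrix described above can be sketched in plain Python. This is only an illustration of the idea, not the `CrossSiteModelEval` implementation: the function names, the scalar "models", and the toy datasets are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]  # hypothetical (input, target) pair


def mean_squared_error(weight: float, data: List[Sample]) -> float:
    """Toy metric: a "model" is a single weight w that predicts y = w * x."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


def cross_site_matrix(
    models: Dict[str, float],
    datasets: Dict[str, List[Sample]],
    evaluate: Callable[[float, List[Sample]], float],
) -> Dict[str, Dict[str, float]]:
    """Evaluate every model on every site's local data.

    Only models travel between sites; each dataset is used solely
    by the site that owns it.
    """
    results: Dict[str, Dict[str, float]] = {}
    for site, data in datasets.items():
        results[site] = {name: evaluate(model, data) for name, model in models.items()}
    return results


# Hypothetical models (one per site plus a server model) and local datasets.
models = {"site-1": 2.0, "site-2": 3.0, "server": 2.5}
datasets = {
    "site-1": [(1.0, 2.0), (2.0, 4.0)],
    "site-2": [(1.0, 3.0), (2.0, 6.0)],
}
matrix = cross_site_matrix(models, datasets, mean_squared_error)
```

Each row of `matrix` is keyed by the site that owns the data, and each column by the owner of the model being evaluated.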

> **_NOTE:_** This example uses a Numpy-based trainer and will generate its data within the code.

You can follow the [hello_world notebook](../hello_world.ipynb) or the following:

### 1. Install NVIDIA FLARE
## Installation

Follow the [Installation](../../getting_started/README.md) instructions.

### 2. Run the experiment
# Run training and cross site validation right after training

Use nvflare simulator to run the hello-examples:
This example uses a Numpy-based trainer to simulate the training
steps.

```
nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 hello-numpy-cross-val/jobs/hello-numpy-cross-val
We first perform FedAvg training and then conduct cross-site validation.

So you will see two workflows (ScatterAndGather and CrossSiteModelEval) are configured.

## 1. Prepare the job and run the experiment using simulator

We use Job API to generate the job and run the job using simulator:

```bash
python3 job_train_and_cse.py
```

### 3. Access the logs and results
## 2. Access the logs and results

You can find the running logs and results inside the simulator's workspace/simulate_job
You can find the running logs and results inside the simulator's workspace:

```bash
$ ls /tmp/nvflare/simulate_job/
app_server app_site-1 app_site-2 log.txt
$ ls /tmp/nvflare/jobs/workdir/
server/ site-1/ site-2/ startup/
```

The cross site validation results:

```bash
$ cat /tmp/nvflare/jobs/workdir/server/simulate_job/cross_site_val/cross_val_results.json
```

# Run cross site validation using the previous trained results
# Run cross site evaluation using the previous trained results

## Introduction
We can also run cross-site evaluation without the training workflow, either reusing the results of a previous run or evaluating pretrained models.

The "hello-numpy-cross-val-only" and "hello-numpy-cross-val-only-list-models" jobs show how to run the NVFlare cross-site validation without the training workflow, making use of the previous run results. The first one uses the default single server model. The second enables a list of server models. You can provide / use your own previous trained models for the cross-validation.
You can provide / use your own pretrained models for the cross-site evaluation.

### Generate the previous run best global model and local best model
## 1. Generate the pretrained model

Run the following command to generate the pre-trained models:
In practice, users would obtain these pretrained models from any training workflow.

To mimic that, run the following command to generate the pre-trained models:

```bash
python3 generate_pretrain_models.py
```
python pre_train_models.py
```

### How to run the Job
## 2. Prepare the job and run the experiment using simulator

Note that our pretrained models are generated under:

```python
SERVER_MODEL_DIR = "/tmp/nvflare/server_pretrain_models"
CLIENT_MODEL_DIR = "/tmp/nvflare/client_pretrain_models"
```

Define two OS system variable "SERVER_MODEL_DIR" and "CLIENT_MODEL_DIR" to point to the absolute path of the server best model and local best model location respectively. Then use the NVFlare admin command "submit_job" to submit and run the cross-validation job.
The same paths are specified in our job_cse.py.

For example, define the system variable "SERVER_MODEL_DIR" like this:
Then we can use Job API to generate the job and run it using simulator:

```bash
python3 job_cse.py
```
export SERVER_MODEL_DIR="/path/to/model/location/at/server-side"

## 3. Access the logs and results

You can find the running logs and results inside the simulator's workspace:

```bash
$ ls /tmp/nvflare/jobs/workdir/
server/ site-1/ site-2/ startup/
```

The cross site validation results:

```bash
$ cat /tmp/nvflare/jobs/workdir/server/simulate_job/cross_site_val/cross_val_results.json
```
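Once the run finishes, the results file can be inspected from Python as well. The sketch below assumes the JSON is a two-level nested mapping (for example, evaluating site, then model owner, then metrics); that layout and the helper names are assumptions, so adjust them to what your file actually contains.

```python
import json
from pathlib import Path

# Path taken from the README above.
RESULTS_PATH = Path("/tmp/nvflare/jobs/workdir/server/simulate_job/cross_site_val/cross_val_results.json")


def load_results(path):
    """Load the cross-site validation results as a Python object."""
    with open(path) as f:
        return json.load(f)


def flatten_results(results):
    """Turn an assumed {outer: {inner: metrics}} mapping into sorted rows."""
    rows = []
    for outer, inner in sorted(results.items()):
        if isinstance(inner, dict):
            for name, metrics in sorted(inner.items()):
                rows.append((outer, name, metrics))
        else:
            # Layout differs from the assumption: keep the value as-is.
            rows.append((outer, None, inner))
    return rows


# Only read the file if a simulator run has already produced it.
if RESULTS_PATH.exists():
    for row in flatten_results(load_results(RESULTS_PATH)):
        print(row)
```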
@@ -1,4 +1,4 @@
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -16,29 +16,24 @@

import numpy as np

from nvflare.app_common.abstract.model import ModelLearnableKey, make_model_learnable
from nvflare.app_common.np.constants import NPConstants
SERVER_MODEL_DIR = "/tmp/nvflare/server_pretrain_models"
CLIENT_MODEL_DIR = "/tmp/nvflare/client_pretrain_models"


def _save_model(model_data, model_dir: str, model_file: str):
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
    model_path = os.path.join(model_dir, model_file)
    np.save(model_path, model_data)

SERVER_MODEL_DIR = "models/server"
CLIENT_MODEL_DIR = "models/client"

if __name__ == "__main__":
    """
    This is the tool to generate the pre-trained models for demonstrating the cross-validation without training.
    """

    model_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)
    model_learnable = make_model_learnable(weights={NPConstants.NUMPY_KEY: model_data}, meta_props={})

    working_dir = os.getcwd()
    model_dir = os.path.join(working_dir, SERVER_MODEL_DIR)
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
    model_path = os.path.join(model_dir, "server.npy")
    np.save(model_path, model_learnable[ModelLearnableKey.WEIGHTS][NPConstants.NUMPY_KEY])

    model_dir = os.path.join(working_dir, CLIENT_MODEL_DIR)
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
    model_save_path = os.path.join(model_dir, "best_numpy.npy")
    np.save(model_save_path, model_data)
    _save_model(model_data=model_data, model_dir=SERVER_MODEL_DIR, model_file="server_1.npy")
    _save_model(model_data=model_data, model_dir=SERVER_MODEL_DIR, model_file="server_2.npy")
    _save_model(model_data=model_data, model_dir=CLIENT_MODEL_DIR, model_file="best_numpy.npy")
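A quick way to sanity-check what the script writes is to save the same array and load it back. The sketch below mirrors the `_save_model` helper but writes into a temporary directory rather than `/tmp/nvflare`; the `save_model` name, the returned path, and the temporary location are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def save_model(model_data, model_dir: str, model_file: str) -> str:
    """Save an array as .npy under model_dir and return the full path."""
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, model_file)
    np.save(model_path, model_data)
    return model_path


# Same toy weights as the generation script.
model_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_model(model_data, os.path.join(tmp_dir, "server_pretrain_models"), "server_1.npy")
    restored = np.load(path)

# The array should survive the round trip with values and dtype intact.
round_trip_ok = bool(np.array_equal(restored, model_data)) and restored.dtype == np.float32
```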
63 changes: 63 additions & 0 deletions examples/hello-world/hello-numpy-cross-val/job_cse.py
@@ -0,0 +1,63 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from nvflare import FedJob
from nvflare.app_common.app_constant import AppConstants
from nvflare.app_common.np.np_formatter import NPFormatter
from nvflare.app_common.np.np_model_locator import NPModelLocator
from nvflare.app_common.np.np_trainer import NPTrainer
from nvflare.app_common.np.np_validator import NPValidator
from nvflare.app_common.widgets.validation_json_generator import ValidationJsonGenerator
from nvflare.app_common.workflows.cross_site_model_eval import CrossSiteModelEval

SERVER_MODEL_DIR = "/tmp/nvflare/server_pretrain_models"
CLIENT_MODEL_DIR = "/tmp/nvflare/client_pretrain_models"


if __name__ == "__main__":
    n_clients = 2

    job = FedJob(name="hello-numpy-cse", min_clients=n_clients)

    model_locator_id = job.to_server(
        NPModelLocator(
            model_dir="/tmp/nvflare/server_pretrain_models",
            model_name={"server_model_1": "server_1.npy", "server_model_2": "server_2.npy"},
        )
    )
    formatter_id = job.to_server(NPFormatter())
    job.to_server(ValidationJsonGenerator())

    # Define the controller workflow and send to server
    controller = CrossSiteModelEval(
        model_locator_id=model_locator_id,
        formatter_id=formatter_id,
    )
    job.to_server(controller)

    # Add clients
    trainer = NPTrainer(
        train_task_name=AppConstants.TASK_TRAIN,
        submit_model_task_name=AppConstants.TASK_SUBMIT_MODEL,
        model_dir="/tmp/nvflare/client_pretrain_models",
    )
    job.to_clients(trainer, tasks=[AppConstants.TASK_SUBMIT_MODEL])
    validator = NPValidator(
        validate_task_name=AppConstants.TASK_VALIDATION,
    )
    job.to_clients(validator, tasks=[AppConstants.TASK_VALIDATION])

    job.export_job("/tmp/nvflare/jobs")
    job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0", n_clients=n_clients)
69 changes: 69 additions & 0 deletions examples/hello-world/hello-numpy-cross-val/job_train_and_cse.py
@@ -0,0 +1,69 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from nvflare import FedJob
from nvflare.apis.dxo import DataKind
from nvflare.app_common.aggregators.intime_accumulate_model_aggregator import InTimeAccumulateWeightedAggregator
from nvflare.app_common.app_constant import AppConstants
from nvflare.app_common.np.np_formatter import NPFormatter
from nvflare.app_common.np.np_model_locator import NPModelLocator
from nvflare.app_common.np.np_model_persistor import NPModelPersistor
from nvflare.app_common.np.np_trainer import NPTrainer
from nvflare.app_common.np.np_validator import NPValidator
from nvflare.app_common.shareablegenerators.full_model_shareable_generator import FullModelShareableGenerator
from nvflare.app_common.widgets.validation_json_generator import ValidationJsonGenerator
from nvflare.app_common.workflows.cross_site_model_eval import CrossSiteModelEval
from nvflare.app_common.workflows.scatter_and_gather import ScatterAndGather

if __name__ == "__main__":
    n_clients = 2
    num_rounds = 1

    job = FedJob(name="hello-numpy-cse", min_clients=n_clients)

    persistor_id = job.to_server(NPModelPersistor())
    aggregator_id = job.to_server(InTimeAccumulateWeightedAggregator(expected_data_kind=DataKind.WEIGHTS))
    shareable_generator_id = job.to_server(FullModelShareableGenerator())
    model_locator_id = job.to_server(NPModelLocator())
    formatter_id = job.to_server(NPFormatter())
    job.to_server(ValidationJsonGenerator())

    # Define the controller workflow and send to server
    controller = ScatterAndGather(
        min_clients=n_clients,
        num_rounds=num_rounds,
        persistor_id=persistor_id,
        aggregator_id=aggregator_id,
        shareable_generator_id=shareable_generator_id,
    )
    job.to_server(controller)

    # Define the controller workflow and send to server
    controller = CrossSiteModelEval(model_locator_id=model_locator_id, formatter_id=formatter_id)
    job.to_server(controller)

    # Add clients
    trainer = NPTrainer(
        train_task_name=AppConstants.TASK_TRAIN,
        submit_model_task_name=AppConstants.TASK_SUBMIT_MODEL,
    )
    job.to_clients(trainer, tasks=[AppConstants.TASK_TRAIN, AppConstants.TASK_SUBMIT_MODEL])
    validator = NPValidator(
        validate_task_name=AppConstants.TASK_VALIDATION,
    )
    job.to_clients(validator, tasks=[AppConstants.TASK_VALIDATION])

    job.export_job("/tmp/nvflare/jobs")
    job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0", n_clients=n_clients)

This file was deleted.

This file was deleted.
