Add back hello-numpy-sag and update references #2816

Merged
merged 4 commits into from
Aug 21, 2024
6 changes: 3 additions & 3 deletions examples/README.md
@@ -76,11 +76,11 @@ When you open a notebook, select the kernel `nvflare_example` using the dropdown
| Example | Framework | Summary |
|----------------------------------------------------------------------------------------------------------------------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Notebook for Hello Examples](./hello-world/hello_world.ipynb) | - | Notebook for examples below. |
| [Hello Scatter and Gather](./hello-world/hello-numpy-sag/README.md) | Numpy | Example using [ScatterAndGather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html) controller workflow. |
| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md) | Numpy | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow, and example using previous results without training workflow. |
| [Hello FedAvg NumPy](./hello-world/hello-fedavg-numpy/README.md) | Numpy | Example using [FedAvg](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.fedavg.html) controller workflow. |
| [Hello Cross-Site Validation](./hello-world/hello-cross-val/README.md) | Numpy | Example using [CrossSiteEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_eval.html) controller workflow, and example using previous results without training workflow. |
| [Hello Cyclic Weight Transfer](./hello-world/hello-cyclic/README.md) | PyTorch | Example using [CyclicController](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cyclic_ctl.html) controller workflow to implement [Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/). |
| [Hello PyTorch](./hello-world/hello-pt/README.md) | PyTorch | Example using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [PyTorch](https://pytorch.org/) as the deep learning training framework. |
| [Hello TensorFlow](./hello-world/hello-tf2/README.md) | TensorFlow2 | Example of using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework. |
| [Hello TensorFlow](./hello-world/hello-tf/README.md) | TensorFlow | Example of using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework. |

## 2. Step-by-Step Examples
| Example | Dataset | Controller-Type | Execution API Type | Framework | Summary |
32 changes: 32 additions & 0 deletions examples/hello-world/hello-numpy-sag/README.md
@@ -0,0 +1,32 @@
# Hello Numpy Scatter and Gather

"[Scatter and Gather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html)" is the standard workflow to implement Federated Averaging ([FedAvg](https://arxiv.org/abs/1602.05629)).
This workflow follows the hub-and-spoke model: the server sends the global model to each client for local training (i.e., "scattering"), then aggregates the clients' results to update the global model (i.e., "gathering").

> **_NOTE:_** This example uses a Numpy-based trainer and will generate its data within the code.
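
The scatter-and-gather loop described above can be sketched in plain NumPy. This is a minimal illustration and not the NVIDIA FLARE implementation: `local_train`, its `delta` step, the starting array, and the per-site weights are all hypothetical stand-ins chosen for the example.

```python
import numpy as np


def local_train(global_weights: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Hypothetical stand-in for a client's local training: shift every weight by `delta`."""
    return global_weights + delta


def scatter_and_gather(initial_weights, site_weights, num_rounds):
    """Run `num_rounds` of scatter (send the model out) and gather (weighted average)."""
    global_weights = np.asarray(initial_weights, dtype=float)
    total = sum(site_weights.values())
    for _ in range(num_rounds):
        # Scatter: every site receives a copy of the current global model and trains locally.
        results = {site: local_train(global_weights.copy()) for site in site_weights}
        # Gather: the weighted average of the site results becomes the next global model.
        global_weights = sum(site_weights[s] * results[s] for s in results) / total
    return global_weights


# Illustrative inputs only; the values are assumptions for this sketch.
initial = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
final = scatter_and_gather(initial, {"site-1": 1.0, "site-2": 1.0}, num_rounds=3)
print(final)
```

With equal site weights and identical local updates, the average equals each site's result, so every round shifts the global model by `delta`.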

You can follow the [hello_world notebook](../hello_world.ipynb) or the steps below:

### 1. Install NVIDIA FLARE

Follow the [Installation](https://nvflare.readthedocs.io/en/main/quickstart.html) instructions.

### 2. Run the experiment

Use `nvflare simulator` to run the example:

```bash
nvflare simulator -w /tmp/nvflare/hello-numpy-sag -n 2 -t 2 hello-world/hello-numpy-sag/jobs/hello-numpy-sag
```
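
To make the flags in the command above easier to read, here is a small Python sketch that assembles the same command line from named values. The helper `build_simulator_command` is hypothetical and only builds the argument list; it does not start the simulator. In this command, `-w` is the workspace, `-n` the number of clients, and `-t` the number of threads.

```python
import shlex
from typing import List


def build_simulator_command(job_folder: str, workspace: str, n_clients: int, threads: int) -> List[str]:
    """Assemble the `nvflare simulator` argument list used in this README (hypothetical helper)."""
    return [
        "nvflare", "simulator",
        "-w", workspace,       # workspace where simulate_job/ is written
        "-n", str(n_clients),  # number of simulated clients
        "-t", str(threads),    # number of threads running those clients
        job_folder,            # path to the job definition
    ]


command = build_simulator_command(
    job_folder="hello-world/hello-numpy-sag/jobs/hello-numpy-sag",
    workspace="/tmp/nvflare/hello-numpy-sag",
    n_clients=2,
    threads=2,
)
# Print a shell-safe rendering of the command.
print(" ".join(shlex.quote(part) for part in command))
```

If NVIDIA FLARE is installed, passing `command` to `subprocess.run(command, check=True)` from the `examples` directory should be equivalent to typing the command in a shell.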

### 3. Access the logs and results

You can find the running logs and results inside the simulator's workspace, under `simulate_job`:

```bash
$ ls /tmp/nvflare/hello-numpy-sag/simulate_job/
app_server app_site-1 app_site-2 log.txt model models

```
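
As a rough way to inspect the results from Python, the sketch below collects any NumPy arrays saved under the job directory. It assumes the NumPy-based components store arrays as `.npy` files somewhere below `simulate_job`; the listing above shows only directory names, so the file layout is an assumption, and `load_saved_arrays` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict

import numpy as np


def load_saved_arrays(job_dir: str) -> Dict[str, np.ndarray]:
    """Return {relative path: array} for every .npy file found below `job_dir`."""
    root = Path(job_dir)
    if not root.is_dir():
        # Nothing to load if the simulator has not been run yet.
        return {}
    return {str(p.relative_to(root)): np.load(p) for p in sorted(root.rglob("*.npy"))}


arrays = load_saved_arrays("/tmp/nvflare/hello-numpy-sag/simulate_job")
for name, array in arrays.items():
    print(name, array.shape)
```

An empty result means either that the job has not been run or that the arrays are stored in a different format than this sketch assumes.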

For how to use the FLARE API to run this app, see [this notebook](hello_numpy_sag.ipynb).
207 changes: 207 additions & 0 deletions examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb
@@ -0,0 +1,207 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e129ede5",
"metadata": {},
"source": [
" # Hello Numpy SAG"
]
},
{
"cell_type": "markdown",
"id": "9bf7e391",
"metadata": {},
"source": [
"This notebook runs Hello Numpy SAG with the FLARE API, submitting the job and then following its progress."
]
},
{
"cell_type": "markdown",
"id": "bbca0050",
"metadata": {},
"source": [
"### 1. Install NVIDIA FLARE\n",
"\n",
"Follow the [Installation](https://nvflare.readthedocs.io/en/main/getting_started.html#installation) instructions to set up an environment that has NVIDIA FLARE installed if you do not have one already. You will need an environment to run a provisioned FL system."
]
},
{
"cell_type": "markdown",
"id": "e5d7e675",
"metadata": {},
"source": [
"### 2. Provision and Start FL System\n",
"\n",
"The rest of this example assumes that 'nvflare provision' has been run in a workspace (set to '/tmp/nvflare/poc' below; change this to the location you ran provision from) to set up a project named `example_project` with a server and two clients. You can use an existing provisioned NVFLARE project if you have one, or, to try things out, set up and start a system in [POC mode](https://nvflare.readthedocs.io/en/main/getting_started.html#setting-up-the-application-environment-in-poc-mode).\n",
"\n",
"Use the 'start.sh' scripts to start the server and clients in separate terminals."
]
},
{
"cell_type": "markdown",
"id": "6fe3165d",
"metadata": {},
"source": [
"\n",
"### 3. Connect to the FL System with the FLARE API\n",
"\n",
"Use `new_secure_session()` to initiate a session connecting to the FL Server with the FLARE API. The necessary arguments are the username of the admin user you are using and the corresponding startup kit location.\n",
"\n",
"In the code example below, we get the `admin_user_dir` by concatenating the workspace root with the default directories that are created if you provision a project with a given project name. You can change the values to what applies to your system if needed.\n",
"\n",
"Note that if debug mode is not enabled, there is no output after initiating a session successfully, so instead we print the output of `get_system_info()`. If you are unable to connect and initiate a session, make sure that your FL Server is running and that the configurations are correct with the right path to the admin startup kit directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3dbde69",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from nvflare.fuel.flare_api.flare_api import new_secure_session\n",
"\n",
"project_name = \"example_project\"\n",
"username = \"admin@nvidia.com\"\n",
"workspace_root = \"/tmp/nvflare/poc\"\n",
"admin_user_dir = os.path.join(workspace_root, project_name, \"prod_00\", username)\n",
"\n",
"sess = new_secure_session(\n",
" username=username,\n",
" startup_kit_location=admin_user_dir\n",
")\n",
"print(sess.get_system_info())"
]
},
{
"cell_type": "markdown",
"id": "405edb37",
"metadata": {},
"source": [
"### 4. Submit the Job with the FLARE API\n",
"\n",
"With a session successfully connected, you can use `submit_job()` to submit your job. You can change `path_to_example_job` to the location of the job you are submitting. If your session is not active, go back to the previous step and connect with a session.\n",
"\n",
"With the POC command, the examples are linked into the following directory: `/tmp/nvflare/poc/example_project/prod_00/admin@nvidia.com/transfer`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3589b60-434b-4b6d-97bc-74e95bbc7b52",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ls -l /tmp/nvflare/poc/example_project/prod_00/admin@nvidia.com/transfer\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8f08cef",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"path_to_example_job = \"hello-world/hello-numpy-sag/jobs/hello-numpy-sag\"\n",
"job_id = sess.submit_job(path_to_example_job)\n",
"print(job_id + \" was submitted\")"
]
},
{
"cell_type": "markdown",
"id": "42317cf3",
"metadata": {},
"source": [
"### 5. After Submitting the Job\n",
"\n",
"After you submit the job, you should see output in the terminals where your FL Server and Clients are running. You can also use `monitor_job()` to follow the job's progress until it is done.\n",
"\n",
"By default, `monitor_job()` has only one required argument, the `job_id` of the job you are waiting for, and it waits until the job is complete before returning a Return Code of `JOB_FINISHED`.\n",
"\n",
"To show a more meaningful result, the following cell uses the `basic_cb_with_print` callback. It keeps track of the number of times the callback is run and prints the `job_meta` the first three times and the final time before `monitor_job()` completes; every other call just prints a dot to save output space. This callback is just one example of what can be done with additional arguments and the `job_meta` information of the job being monitored."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03fd93d0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from nvflare.fuel.flare_api.flare_api import Session, basic_cb_with_print\n",
"\n",
"\n",
"sess.monitor_job(job_id, cb=basic_cb_with_print, cb_run_counter={\"count\":0})"
]
},
{
"cell_type": "markdown",
"id": "31ccb6a6",
"metadata": {},
"source": [
"### 6. Shutting Down the FL System\n",
"\n",
"As of now, there is no specific FLARE API command for shutting down the FL system. However, the FLARE API can use the `do_command()` function of the underlying AdminAPI to submit any command that the FLARE Console supports, including shutdown commands to the clients and server:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0d8aa9c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"print(sess.api.do_command(\"shutdown client\"))\n",
"print(sess.api.do_command(\"shutdown server\"))\n",
"\n",
"sess.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "331c0ba2-8abe-47a3-a864-18dcb7489a44",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "nvflare_example",
"language": "python",
"name": "nvflare_example"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.18"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
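
The callback behavior described in step 5 of the notebook above can be sketched without a running FL system. This is not the library's `basic_cb_with_print`; it is a simplified stand-in. The callback signature, the meaning of the return value, the status strings, and the `simulate_monitor` loop are all assumptions made for illustration.

```python
def counting_callback(session, job_id, job_meta, *cb_args, **cb_kwargs) -> bool:
    """Print job_meta on the first three calls and once the job stops running; otherwise print a dot."""
    counter = cb_kwargs["cb_run_counter"]
    if job_meta.get("status") != "RUNNING" or counter["count"] < 3:
        print(job_meta)
    else:
        print(".", end="")
    counter["count"] += 1
    return True  # assumption: a truthy return value means "keep monitoring"


def simulate_monitor(job_id, polled_metas, cb, **cb_kwargs):
    """Hypothetical stand-in for a monitoring loop: call `cb` once per polled job_meta."""
    for job_meta in polled_metas:
        if not cb(None, job_id, job_meta, **cb_kwargs):
            break


# Illustrative job states; the status strings are assumptions for this sketch.
counter = {"count": 0}
polled = [{"status": "RUNNING"}] * 5 + [{"status": "FINISHED:COMPLETED"}]
simulate_monitor("hypothetical-job-id", polled, counting_callback, cb_run_counter=counter)
print(f"\ncallback ran {counter['count']} times")
```

Passing the counter as a mutable dictionary lets the callback keep state between calls without global variables, which is the same pattern the notebook uses with `cb_run_counter={"count":0}`.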
@@ -0,0 +1,17 @@
{
"format_version": 2,
"executors": [
{
"tasks": [
"train"
],
"executor": {
"path": "nvflare.app_common.np.np_trainer.NPTrainer",
"args": {}
}
}
],
"task_result_filters": [],
"task_data_filters": [],
"components": []
}
@@ -0,0 +1,48 @@
{
"format_version": 2,
"server": {
"heart_beat_timeout": 600
},
"task_data_filters": [],
"task_result_filters": [],
"components": [
{
"id": "persistor",
"path": "nvflare.app_common.np.np_model_persistor.NPModelPersistor",
"args": {}
},
{
"id": "shareable_generator",
"path": "nvflare.app_common.shareablegenerators.full_model_shareable_generator.FullModelShareableGenerator",
"args": {}
},
{
"id": "aggregator",
"path": "nvflare.app_common.aggregators.intime_accumulate_model_aggregator.InTimeAccumulateWeightedAggregator",
"args": {
"expected_data_kind": "WEIGHTS",
"aggregation_weights": {
"site-1": 1.0,
"site-2": 1.0
}
}
}
],
"workflows": [
{
"id": "scatter_and_gather",
"path": "nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather",
"args": {
"min_clients": 2,
"num_rounds": 3,
"start_round": 0,
"wait_time_after_min_received": 10,
"aggregator_id": "aggregator",
"persistor_id": "persistor",
"shareable_generator_id": "shareable_generator",
"train_task_name": "train",
"train_timeout": 6000
}
}
]
}
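
In the server configuration above, the workflow refers to its components by id (`aggregator_id`, `persistor_id`, `shareable_generator_id`). A quick consistency check can be sketched as follows. It uses a trimmed copy of the configuration, and it treats every workflow argument whose name ends in `_id` as a component reference, which is an assumption of this sketch and not a documented rule; `unresolved_component_refs` is a hypothetical helper.

```python
import json

# Trimmed copy of the server config above: only the ids and id references are kept.
SERVER_CONFIG = """
{
  "components": [
    {"id": "persistor"},
    {"id": "shareable_generator"},
    {"id": "aggregator"}
  ],
  "workflows": [
    {
      "id": "scatter_and_gather",
      "args": {
        "aggregator_id": "aggregator",
        "persistor_id": "persistor",
        "shareable_generator_id": "shareable_generator"
      }
    }
  ]
}
"""


def unresolved_component_refs(config: dict) -> list:
    """Return (workflow id, argument name, value) for each *_id argument with no matching component."""
    declared = {component["id"] for component in config.get("components", [])}
    missing = []
    for workflow in config.get("workflows", []):
        for key, value in workflow.get("args", {}).items():
            if key.endswith("_id") and value not in declared:
                missing.append((workflow["id"], key, value))
    return missing


config = json.loads(SERVER_CONFIG)
missing = unresolved_component_refs(config)
print(missing)
```

An empty list means every id the workflow names is declared under `components`; a misspelled id would show up as a tuple in the result.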
@@ -0,0 +1,10 @@
{
"name": "hello-numpy-sag",
"resource_spec": {},
"min_clients" : 2,
"deploy_map": {
"app": [
"@ALL"
]
}
}
1 change: 1 addition & 0 deletions examples/hello-world/hello-numpy-sag/requirements.txt
@@ -0,0 +1 @@
nvflare~=2.4.0rc
nvkevlu marked this conversation as resolved.