Add CIFAR 10 examples for Tensorflow-based FedAvg & FedOpt (#2704)

* add alpha splitting
* run experiments
* add tensorboard writers; increase model size
* fedopt version
* add fedprox loss and callback
* Update ModerateTFNet to match CIFAR10 torch implementation.
* Fix multiprocessing GPU init error. Handle no alpha split case.
* Add preprocessing to match torch CIFAR10 result.
* Unify executor script for different algos.
* Remove unused codes.
* Add preprocessing steps to make TF results on par with torch examples.
* Fix script executor args.
* Add script to run all experiments.
* Add README.
* Fix graphs in README.
* Modify TF FedOpt controller.
* Update README and FedOpt result.
* Remove duplicated flare init.
* Fix result graph for centralized vs FedAvg.
* Fix README re. alpha value for centralized training.
* Improve README.
* Add workspace arg. Change min_clients to num_clients.
* Add warning on TF GPU vRAM allocation.
* Clean up TB summary logs.
* Remove FedProx which will be implemented in another PR.
* Update notebook & README, re-add missing file.
* Update license header.
* Re-include missing script.
* Remove change in torch example script.
* Fix flake8, black and isort format issues.

Co-authored-by: Holger Roth <hroth@nvidia.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
Commit d9ef041 (parent 236f4a0): 13 changed files with 932 additions and 11 deletions.
# Simulated Federated Learning with CIFAR10 Using Tensorflow

This example shows `Tensorflow`-based classic federated learning
algorithms, namely FedAvg and FedOpt, on the CIFAR10
dataset. It is analogous to [the example using the `Pytorch`
backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim)
on the same dataset, where the same experiments are conducted and
analyzed. You should expect the same experimental results when
comparing this example with the `Pytorch` one.

In this example, the latest Client APIs are used to implement
client-side training logic (details in
[`cifar10_tf_fl_alpha_split.py`](src/cifar10_tf_fl_alpha_split.py)),
and the new
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/fed_job.py#L106)
APIs are used to programmatically set up an
`nvflare` job to be exported or run by the simulator (details in
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)),
eliminating the need to write job config files and simplifying the
development process.
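
As a rough illustration of this programmatic flow, the sketch below sets up and runs a small simulated FedAvg job. It is a minimal, hedged example based on NVFlare's getting-started material, not the exact contents of `tf_fl_script_executor_cifar10.py`; the class import paths (`FedJob`, `FedAvg`, `ScriptRunner`) are assumed from recent NVFlare releases and may differ across versions:

```python
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

# Define the job and attach the server-side FedAvg workflow.
job = FedJob(name="cifar10_tf_fedavg")
job.to(FedAvg(num_clients=2, num_rounds=5), "server")

# Attach the client training script to each simulated site.
for i in range(2):
    job.to(ScriptRunner(script="src/cifar10_tf_fl_alpha_split.py"),
           f"site-{i + 1}")

# Either export a job config folder, or run directly in the simulator.
job.export_job("/tmp/nvflare/jobs")
job.simulator_run("/tmp/nvflare/workspace")
```

The point of the `FedJob` API is that the server workflow, client scripts, and their wiring are all declared in one Python file, from which NVFlare can generate the job configuration itself.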

Before continuing with the following sections, you may first want to
refer to the [getting started notebook](nvflare_tf_getting_started.ipynb)
included in this folder to learn more about the implementation details,
with an example walkthrough of FedAvg using a small Tensorflow model.

## 1. Install requirements

Install the required packages:
```
pip install --upgrade pip
pip install -r ./requirements.txt
```

> **_NOTE:_** We recommend either a containerized deployment or a virtual environment;
> please refer to [getting started](https://nvflare.readthedocs.io/en/latest/getting_started.html).

## 2. Run experiments

This example uses the simulator to run all experiments. The script
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)
is the main entry point for launching the different experiments with
different arguments (see the sections below for details). A script
[`run_jobs.sh`](run_jobs.sh) is also provided to run all experiments
described below at once:
```
bash ./run_jobs.sh
```
The CIFAR10 dataset is downloaded the first time any experiment is
run. `Tensorboard` summary logs are generated during every experiment,
so you can use `Tensorboard` to visualize the training and validation
process as the experiment runs. Data split files, summary logs and
results are saved in a workspace directory, which defaults to `/tmp`
and can be configured via the `--workspace` argument of the
`tf_fl_script_executor_cifar10.py` script.

> [!WARNING]
> If you are using a GPU, make sure to set the following
> environment variables before running a training job, to prevent
> `Tensorflow` from allocating the full GPU memory all at once:
> `export TF_FORCE_GPU_ALLOW_GROWTH=true && export
> TF_GPU_ALLOCATOR=cuda_malloc_async`

The set-up of all experiments in this example is kept the same as in
[the example using the `Pytorch`
backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim);
refer to the `Pytorch` example for more details. As in the Pytorch
example, we use Dirichlet sampling on the CIFAR10 data labels to
simulate data heterogeneity among the data splits for the different
client sites, controlled by an alpha value ranging over (0, 1]. A
higher alpha value indicates less data heterogeneity; an alpha value of
1.0 results in a homogeneous data distribution among the different
splits.
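
The Dirichlet-based label split can be illustrated with the following self-contained numpy sketch. This is a simplified stand-in for the splitting logic, written for illustration only; the function name and exact slicing scheme are not taken from this example's source:

```python
import numpy as np

def dirichlet_split(labels, n_clients, alpha, seed=0):
    """Partition sample indices among clients: for each class, draw client
    proportions from a symmetric Dirichlet(alpha) and slice accordingly."""
    rng = np.random.default_rng(seed)
    client_idxs = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx_c = np.flatnonzero(labels == c)
        rng.shuffle(idx_c)
        # Smaller alpha -> more skewed proportions -> higher heterogeneity.
        props = rng.dirichlet([alpha] * n_clients)
        cut_points = (np.cumsum(props)[:-1] * len(idx_c)).astype(int)
        for client_id, part in enumerate(np.split(idx_c, cut_points)):
            client_idxs[client_id].extend(part.tolist())
    return client_idxs

# Toy labels standing in for the CIFAR10 label array (10 classes).
labels = np.repeat(np.arange(10), 100)
splits = dirichlet_split(labels, n_clients=8, alpha=0.1)
assert sum(len(s) for s in splits) == len(labels)  # every sample assigned once
```

With `alpha=0.1` most clients end up seeing only a few classes, while `alpha=1.0` yields splits much closer to uniform.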

### 2.1 Centralized training

To simulate a centralized training baseline, we run the FedAvg algorithm
with 1 client for 25 rounds, where each round consists of a single epoch:

```
python ./tf_fl_script_executor_cifar10.py \
    --algo centralized \
    --n_clients 1 \
    --num_rounds 25 \
    --batch_size 64 \
    --epochs 1 \
    --alpha 0.0
```
Note that `--alpha 0.0` is a placeholder value used to disable data
splits for centralized training.

### 2.2 FedAvg with different data heterogeneity (alpha values)

Here we run FedAvg for 50 rounds, each round with 4 local epochs. This
corresponds to roughly the same number of training iterations as in the
centralized baseline above (50 rounds * 4 local epochs, divided by 8
clients, equals 25 full-data epochs):
```
for alpha in 1.0 0.5 0.3 0.1; do

    python ./tf_fl_script_executor_cifar10.py \
        --algo fedavg \
        --n_clients 8 \
        --num_rounds 50 \
        --batch_size 64 \
        --epochs 4 \
        --alpha $alpha

done
```

### 2.3 Advanced FL algorithms (FedOpt)

Next, let's try a different FL algorithm on a more heterogeneous split.

[FedOpt](https://arxiv.org/abs/2003.00295) uses a server-side
optimizer to update the global model from the aggregated client updates
(treated as pseudo-gradients). Here we use SGD with momentum and cosine
learning rate decay:
```
python ./tf_fl_script_executor_cifar10.py \
    --algo fedopt \
    --n_clients 8 \
    --num_rounds 50 \
    --batch_size 64 \
    --epochs 4 \
    --alpha 0.1
```
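
Conceptually, the FedOpt server step treats the gap between the current global model and the averaged client models as a pseudo-gradient and feeds it to an ordinary optimizer. The numpy sketch below illustrates the idea with SGD plus momentum and cosine learning-rate decay; the function name and hyperparameter values are illustrative, not NVFlare's implementation:

```python
import math
import numpy as np

def fedopt_sgd_step(global_w, client_ws, velocity, round_idx, num_rounds,
                    base_lr=1.0, momentum=0.6):
    """One FedOpt server step: average client weights, treat the difference
    from the current global model as a pseudo-gradient, then apply SGD with
    momentum and cosine learning-rate decay."""
    avg_w = np.mean(client_ws, axis=0)
    pseudo_grad = global_w - avg_w
    lr = base_lr * 0.5 * (1.0 + math.cos(math.pi * round_idx / num_rounds))
    velocity = momentum * velocity + pseudo_grad
    new_w = global_w - lr * velocity
    return new_w, velocity

# Toy round: two clients both moved their weights toward 1.0.
w, v = np.zeros(4), np.zeros(4)
clients = [np.full(4, 0.9), np.full(4, 1.1)]
w, v = fedopt_sgd_step(w, clients, v, round_idx=0, num_rounds=50)
# With base_lr=1 and zero initial momentum, the first step lands exactly
# on the client average (here 1.0).
```

Plain FedAvg is the special case `base_lr=1.0, momentum=0.0`, which simply replaces the global model with the client average each round; the momentum and decaying learning rate are what FedOpt adds.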

## 3. Results

Now let's compare the experimental results.

### 3.1 Centralized training vs. FedAvg for a homogeneous split

Let's first compare FedAvg with a homogeneous data split
(i.e. `alpha=1.0`) against centralized training. As can be seen from the
figure and table below, FedAvg achieves performance similar to that of
centralized training under a homogeneous data split, i.e., when there is
no difference in data distributions among the different clients.

| Config | Alpha | Val score |
|-----------------|-------|-----------|
| cifar10_central | n.a. | 0.8758 |
| cifar10_fedavg | 1.0 | 0.8839 |

![Central vs. FedAvg](./figs/fedavg-vs-centralized.png)

### 3.2 Impact of client data heterogeneity

Here we compare the impact of data heterogeneity by varying the
`alpha` value, where lower values yield higher heterogeneity. As can
be observed in the table below, the performance of FedAvg decreases
as data heterogeneity increases.

| Config | Alpha | Val score |
| ----------- | ----------- | ----------- |
| cifar10_fedavg | 1.0 | 0.8838 |
| cifar10_fedavg | 0.5 | 0.8685 |
| cifar10_fedavg | 0.3 | 0.8323 |
| cifar10_fedavg | 0.1 | 0.7903 |

![Impact of client data heterogeneity](./figs/fedavg-diff-alphas.png)

### 3.3 Impact of different FL algorithms

Lastly, we compare the performance of different FL algorithms with the
`alpha` value fixed to 0.1, i.e., high client data heterogeneity. We
can observe from the figure below that FedOpt achieves better
performance, with a better convergence rate, than FedAvg under the
same alpha setting.

| Config | Alpha | Val score |
| ----------- | ----------- | ----------- |
| cifar10_fedavg | 0.1 | 0.7903 |
| cifar10_fedopt | 0.1 | 0.8145 |

![Impact of different FL algorithms](./figs/fedavg-diff-algos.png)

[`run_jobs.sh`](run_jobs.sh):
```
#!/usr/bin/env bash
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Prevent Tensorflow from allocating full GPU memory all at once.
export TF_FORCE_GPU_ALLOW_GROWTH=true
export TF_GPU_ALLOCATOR=cuda_malloc_async

# You can change the GPU index if multiple GPUs are available.
GPU_INDX=0

# You can change the workspace - where results and artifacts will be saved.
WORKSPACE=/tmp

# Run centralized training job.
python ./tf_fl_script_executor_cifar10.py \
    --algo centralized \
    --n_clients 1 \
    --num_rounds 25 \
    --batch_size 64 \
    --epochs 1 \
    --alpha 0.0 \
    --gpu $GPU_INDX \
    --workspace $WORKSPACE

# Run FedAvg with different alpha values.
for alpha in 1.0 0.5 0.3 0.1; do

    python ./tf_fl_script_executor_cifar10.py \
        --algo fedavg \
        --n_clients 8 \
        --num_rounds 50 \
        --batch_size 64 \
        --epochs 4 \
        --alpha $alpha \
        --gpu $GPU_INDX \
        --workspace $WORKSPACE

done

# Run FedOpt job.
python ./tf_fl_script_executor_cifar10.py \
    --algo fedopt \
    --n_clients 8 \
    --num_rounds 50 \
    --batch_size 64 \
    --epochs 4 \
    --alpha 0.1 \
    --gpu $GPU_INDX \
    --workspace $WORKSPACE
```