Add CIFAR 10 examples for Tensorflow-based FedAvg & FedOpt (#2704)

* add alpha splitting
* run experiments
* add tensorboard writers; increase model size
* fedopt version
* add fedprox loss and callback
* Update ModerateTFNet to match CIFAR10 torch implementation.
* Fix multiprocessing GPU init error. Handle no alpha split case.
* Add preprocessing to match torch CIFAR10 result.
* Unify executor script for different algos.
* Remove unused codes.
* Add preprocessing steps to make TF results on par with torch examples.
* Fix script executor args.
* Add script to run all experiments.
* Add README.
* Fix graphs in README.
* Modify TF FedOpt controller.
* Update README and FedOpt result.
* Remove duplicated flare init.
* Fix result graph for centralized vs FedAvg.
* Fix README re. alpha value for centralized training.
* Improve README.
* Add workspace arg. Change min_clients to num_clients.
* Add warning on TF GPU vRAM allocation.
* Clean up TB summary logs.
* Remove FedProx which will be implemented in another PR.
* Update notebook & README, re-add missing file.
* Update license header.
* Re-include missing script.
* Remove change in torch example script.
* Fix flake8, black and isort format issues.

Co-authored-by: Holger Roth <hroth@nvidia.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
Commit d9ef041 (parent 236f4a0): 13 changed files with 932 additions and 11 deletions.
# Simulated Federated Learning with CIFAR10 Using Tensorflow

This example shows `Tensorflow`-based classic federated learning
algorithms, namely FedAvg and FedOpt, on the CIFAR10
dataset. It is analogous to [the example using the `Pytorch`
backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim)
on the same dataset, where the same experiments are conducted and
analyzed. You should expect the same experimental results when
comparing this example with the `Pytorch` one.

In this example, the latest Client APIs are used to implement
client-side training logic (details in
[`cifar10_tf_fl_alpha_split.py`](src/cifar10_tf_fl_alpha_split.py)),
and the new
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/fed_job.py#L106)
APIs are used to programmatically set up an
`nvflare` job to be exported or run by the simulator (details in
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)),
eliminating the need to write job config files and simplifying the
development process.
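
As a rough illustration of this programmatic flow, the sketch below sets up and runs a small simulated FedAvg job. It is a minimal, hedged example based on NVFlare's getting-started material, not the exact contents of `tf_fl_script_executor_cifar10.py`; the class import paths (`FedJob`, `FedAvg`, `ScriptRunner`) are assumed from recent NVFlare releases and may differ across versions:

```python
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

# Define the job and attach the server-side FedAvg workflow.
job = FedJob(name="cifar10_tf_fedavg")
job.to(FedAvg(num_clients=2, num_rounds=5), "server")

# Attach the client training script to each simulated site.
for i in range(2):
    job.to(ScriptRunner(script="src/cifar10_tf_fl_alpha_split.py"),
           f"site-{i + 1}")

# Either export a job config folder, or run directly in the simulator.
job.export_job("/tmp/nvflare/jobs")
job.simulator_run("/tmp/nvflare/workspace")
```

The point of the `FedJob` API is that the server workflow, client scripts, and their wiring are all declared in one Python file, from which NVFlare can generate the job configuration itself.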

Before continuing with the following sections, you may first want to
refer to the [getting started notebook](nvflare_tf_getting_started.ipynb)
included in this folder to learn more about the implementation details,
with an example walkthrough of FedAvg using a small Tensorflow model.

## 1. Install requirements

Install the required packages:
```
pip install --upgrade pip
pip install -r ./requirements.txt
```

> **_NOTE:_** We recommend either a containerized deployment or a virtual environment;
> please refer to [getting started](https://nvflare.readthedocs.io/en/latest/getting_started.html).

## 2. Run experiments

This example uses the simulator to run all experiments. The script
[`tf_fl_script_executor_cifar10.py`](tf_fl_script_executor_cifar10.py)
is the main entry point for launching the different experiments with
different arguments (see the sections below for details). A script
[`run_jobs.sh`](run_jobs.sh) is also provided to run all experiments
described below at once:
```
bash ./run_jobs.sh
```
The CIFAR10 dataset is downloaded the first time any experiment is
run. `Tensorboard` summary logs are generated during every experiment,
so you can use `Tensorboard` to visualize the training and validation
process as the experiment runs. Data split files, summary logs and
results are saved in a workspace directory, which defaults to `/tmp`
and can be configured via the `--workspace` argument of the
`tf_fl_script_executor_cifar10.py` script.

> [!WARNING]
> If you are using a GPU, make sure to set the following
> environment variables before running a training job, to prevent
> `Tensorflow` from allocating the full GPU memory all at once:
> `export TF_FORCE_GPU_ALLOW_GROWTH=true && export
> TF_GPU_ALLOCATOR=cuda_malloc_async`

The set-up of all experiments in this example is kept the same as in
[the example using the `Pytorch`
backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim);
refer to the `Pytorch` example for more details. As in the Pytorch
example, we use Dirichlet sampling on the CIFAR10 data labels to
simulate data heterogeneity among the data splits for the different
client sites, controlled by an alpha value ranging over (0, 1]. A
higher alpha value indicates less data heterogeneity; an alpha value of
1.0 results in a homogeneous data distribution among the different
splits.
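
The Dirichlet-based label split can be illustrated with the following self-contained numpy sketch. This is a simplified stand-in for the splitting logic, written for illustration only; the function name and exact slicing scheme are not taken from this example's source:

```python
import numpy as np

def dirichlet_split(labels, n_clients, alpha, seed=0):
    """Partition sample indices among clients: for each class, draw client
    proportions from a symmetric Dirichlet(alpha) and slice accordingly."""
    rng = np.random.default_rng(seed)
    client_idxs = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx_c = np.flatnonzero(labels == c)
        rng.shuffle(idx_c)
        # Smaller alpha -> more skewed proportions -> higher heterogeneity.
        props = rng.dirichlet([alpha] * n_clients)
        cut_points = (np.cumsum(props)[:-1] * len(idx_c)).astype(int)
        for client_id, part in enumerate(np.split(idx_c, cut_points)):
            client_idxs[client_id].extend(part.tolist())
    return client_idxs

# Toy labels standing in for the CIFAR10 label array (10 classes).
labels = np.repeat(np.arange(10), 100)
splits = dirichlet_split(labels, n_clients=8, alpha=0.1)
assert sum(len(s) for s in splits) == len(labels)  # every sample assigned once
```

With `alpha=0.1` most clients end up seeing only a few classes, while `alpha=1.0` yields splits much closer to uniform.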

### 2.1 Centralized training

To simulate a centralized training baseline, we run the FedAvg algorithm
with 1 client for 25 rounds, where each round consists of a single epoch:

```
python ./tf_fl_script_executor_cifar10.py \
    --algo centralized \
    --n_clients 1 \
    --num_rounds 25 \
    --batch_size 64 \
    --epochs 1 \
    --alpha 0.0
```
Note that `--alpha 0.0` is a placeholder value used to disable data
splits for centralized training.

### 2.2 FedAvg with different data heterogeneity (alpha values)

Here we run FedAvg for 50 rounds, each round with 4 local epochs. This
corresponds to roughly the same number of training iterations as in the
centralized baseline above (50 rounds * 4 local epochs, divided by 8
clients, equals 25 full-data epochs):
```
for alpha in 1.0 0.5 0.3 0.1; do

    python ./tf_fl_script_executor_cifar10.py \
        --algo fedavg \
        --n_clients 8 \
        --num_rounds 50 \
        --batch_size 64 \
        --epochs 4 \
        --alpha $alpha

done
```

### 2.3 Advanced FL algorithms (FedOpt)

Next, let's try a different FL algorithm on a more heterogeneous split.

[FedOpt](https://arxiv.org/abs/2003.00295) uses a server-side
optimizer to update the global model from the aggregated client updates
(treated as pseudo-gradients). Here we use SGD with momentum and cosine
learning rate decay:
```
python ./tf_fl_script_executor_cifar10.py \
    --algo fedopt \
    --n_clients 8 \
    --num_rounds 50 \
    --batch_size 64 \
    --epochs 4 \
    --alpha 0.1
```
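
Conceptually, the FedOpt server step treats the gap between the current global model and the averaged client models as a pseudo-gradient and feeds it to an ordinary optimizer. The numpy sketch below illustrates the idea with SGD plus momentum and cosine learning-rate decay; the function name and hyperparameter values are illustrative, not NVFlare's implementation:

```python
import math
import numpy as np

def fedopt_sgd_step(global_w, client_ws, velocity, round_idx, num_rounds,
                    base_lr=1.0, momentum=0.6):
    """One FedOpt server step: average client weights, treat the difference
    from the current global model as a pseudo-gradient, then apply SGD with
    momentum and cosine learning-rate decay."""
    avg_w = np.mean(client_ws, axis=0)
    pseudo_grad = global_w - avg_w
    lr = base_lr * 0.5 * (1.0 + math.cos(math.pi * round_idx / num_rounds))
    velocity = momentum * velocity + pseudo_grad
    new_w = global_w - lr * velocity
    return new_w, velocity

# Toy round: two clients both moved their weights toward 1.0.
w, v = np.zeros(4), np.zeros(4)
clients = [np.full(4, 0.9), np.full(4, 1.1)]
w, v = fedopt_sgd_step(w, clients, v, round_idx=0, num_rounds=50)
# With base_lr=1 and zero initial momentum, the first step lands exactly
# on the client average (here 1.0).
```

Plain FedAvg is the special case `base_lr=1.0, momentum=0.0`, which simply replaces the global model with the client average each round; the momentum and decaying learning rate are what FedOpt adds.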

## 3. Results

Now let's compare the experimental results.

### 3.1 Centralized training vs. FedAvg for a homogeneous split

Let's first compare FedAvg with a homogeneous data split
(i.e. `alpha=1.0`) against centralized training. As can be seen from the
figure and table below, FedAvg achieves performance similar to that of
centralized training under a homogeneous data split, i.e., when there is
no difference in data distributions among the different clients.

| Config | Alpha | Val score |
|-----------------|-------|-----------|
| cifar10_central | n.a. | 0.8758 |
| cifar10_fedavg | 1.0 | 0.8839 |

![Central vs. FedAvg](./figs/fedavg-vs-centralized.png)

### 3.2 Impact of client data heterogeneity

Here we compare the impact of data heterogeneity by varying the
`alpha` value, where lower values yield higher heterogeneity. As can
be observed in the table below, the performance of FedAvg decreases
as data heterogeneity increases.

| Config | Alpha | Val score |
| ----------- | ----------- | ----------- |
| cifar10_fedavg | 1.0 | 0.8838 |
| cifar10_fedavg | 0.5 | 0.8685 |
| cifar10_fedavg | 0.3 | 0.8323 |
| cifar10_fedavg | 0.1 | 0.7903 |

![Impact of client data heterogeneity](./figs/fedavg-diff-alphas.png)

### 3.3 Impact of different FL algorithms

Lastly, we compare the performance of different FL algorithms with the
`alpha` value fixed to 0.1, i.e., high client data heterogeneity. We
can observe from the figure below that FedOpt achieves better
performance, with a better convergence rate, than FedAvg under the
same alpha setting.

| Config | Alpha | Val score |
| ----------- | ----------- | ----------- |
| cifar10_fedavg | 0.1 | 0.7903 |
| cifar10_fedopt | 0.1 | 0.8145 |

![Impact of different FL algorithms](./figs/fedavg-diff-algos.png)

[`run_jobs.sh`](run_jobs.sh):
```
#!/usr/bin/env bash
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Prevent Tensorflow from allocating full GPU memory all at once.
export TF_FORCE_GPU_ALLOW_GROWTH=true
export TF_GPU_ALLOCATOR=cuda_malloc_async

# You can change the GPU index if multiple GPUs are available.
GPU_INDX=0

# You can change the workspace - where results and artifacts will be saved.
WORKSPACE=/tmp

# Run centralized training job.
python ./tf_fl_script_executor_cifar10.py \
    --algo centralized \
    --n_clients 1 \
    --num_rounds 25 \
    --batch_size 64 \
    --epochs 1 \
    --alpha 0.0 \
    --gpu $GPU_INDX \
    --workspace $WORKSPACE

# Run FedAvg with different alpha values.
for alpha in 1.0 0.5 0.3 0.1; do

    python ./tf_fl_script_executor_cifar10.py \
        --algo fedavg \
        --n_clients 8 \
        --num_rounds 50 \
        --batch_size 64 \
        --epochs 4 \
        --alpha $alpha \
        --gpu $GPU_INDX \
        --workspace $WORKSPACE

done

# Run FedOpt job.
python ./tf_fl_script_executor_cifar10.py \
    --algo fedopt \
    --n_clients 8 \
    --num_rounds 50 \
    --batch_size 64 \
    --epochs 4 \
    --alpha 0.1 \
    --gpu $GPU_INDX \
    --workspace $WORKSPACE
```