[2.4] address vdr feedback (#2299)
* address vdr feedback

* fix typos, rename

* rewording, update diagram
SYangster authored Jan 23, 2024
1 parent 5c58d75 commit 1d23ffc
Showing 14 changed files with 181 additions and 28 deletions.
5 changes: 4 additions & 1 deletion docs/_static/css/additions.css
@@ -1,3 +1,6 @@
.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a{display:block;background:#b1b1b1;padding:.4045em 7.3em}
.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a{display:block;background:#a9a9a9;padding:.4045em 8.8em}
.wy-menu-vertical li.toctree-l5{font-size: .9em;}
.wy-menu-vertical li.toctree-l5{font-size: .9em;}
.wy-menu > .caption > span.caption-text {
color: #76b900;
}
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -44,7 +44,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode):
# -- Project information -----------------------------------------------------

project = "NVIDIA FLARE"
copyright = "2023, NVIDIA"
copyright = "2024, NVIDIA"
author = "NVIDIA"

# The full version, including alpha/beta/rc tags
@@ -114,6 +114,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode):
html_scaled_image_link = False
html_show_sourcelink = True
html_favicon = "favicon.ico"
html_logo = "resources/nvidia_logo.png"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
2 changes: 1 addition & 1 deletion docs/example_applications_algorithms.rst
@@ -26,7 +26,7 @@ Can be run from the :github_nvflare_link:`hello_world notebook <examples/hello-w
--------------

* :ref:`Hello Scatter and Gather <hello_scatter_and_gather>` - Example using the Scatter And Gather (SAG) workflow with a Numpy trainer
* :ref:`Hello Cross-Site Validation <hello_cross_val>` - Example using the Cross Site Model Eval workflow with a Numpy trainer
* :ref:`Hello Cross-Site Validation <hello_cross_val>` - Example using the Cross Site Model Eval workflow with a Numpy trainer; also demonstrates running cross-site validation using previous training results.
* :github_nvflare_link:`Hello Cyclic Weight Transfer (GitHub) <examples/hello-world/hello-cyclic>` - Example using the CyclicController workflow to implement `Cyclic Weight Transfer <https://pubmed.ncbi.nlm.nih.gov/29617797/>`_ with TensorFlow as the deep learning training framework
* :github_nvflare_link:`Swarm Learning <examples/advanced/swarm_learning>` - Example using Swarm Learning and Client-Controlled Cross-site Evaluation workflows.
* :github_nvflare_link:`Client-Controlled Cyclic Weight Transfer <examples/hello-world/step-by-step/cifar10/cyclic_ccwf>` - Example using Client-Controlled Cyclic workflow using Client API.
64 changes: 64 additions & 0 deletions docs/fl_introduction.rst
@@ -0,0 +1,64 @@
.. _fl_introduction:

###########################
What is Federated Learning?
###########################

Federated Learning is a distributed learning paradigm where training occurs across multiple clients, each with their own local datasets.
This enables the creation of common robust models without sharing sensitive local data, helping solve issues of data privacy and security.

How does Federated Learning Work?
=================================
The federated learning (FL) server orchestrates the collaboration of multiple clients by first sending an initial model to the FL clients.
The clients perform training on their local datasets, then send the model updates back to the FL server for aggregation to form a global model.
This process forms a single round of federated learning, and after a number of rounds a robust global model can be developed.
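
The round described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not FLARE's implementation: the toy linear model, the names ``local_update``, ``fed_avg`` and ``run_round``, and the site names are all hypothetical, and the aggregation shown is a sample-count-weighted average in the style of FedAvg.

```python
from typing import Dict, List, Tuple

Weights = List[float]                 # [w, b] for the toy model y = w * x + b
Dataset = List[Tuple[float, float]]   # (x, y) pairs held privately by one client


def local_update(global_weights: Weights, data: Dataset, lr: float = 0.1) -> Weights:
    """Hypothetical client step: one SGD pass over the client's local data."""
    w, b = global_weights
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def fed_avg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Server step: average client weights, weighted by local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def run_round(global_weights: Weights, clients: Dict[str, Dataset]) -> Weights:
    """One round: send the model out, train locally, aggregate the results."""
    updates = [(local_update(global_weights, data), len(data)) for data in clients.values()]
    return fed_avg(updates)


# Two hypothetical sites holding different samples of the relation y = 2x + 1.
clients = {
    "site-1": [(0.0, 1.0), (1.0, 3.0)],
    "site-2": [(2.0, 5.0), (3.0, 7.0), (1.0, 3.0)],
}
model = [0.0, 0.0]  # initial global model
for _ in range(300):
    model = run_round(model, clients)
print(model)
```

Only the weights cross the client boundary in ``run_round``; the ``(x, y)`` samples stay with each site. After enough rounds the printed weights should approach ``w = 2`` and ``b = 1``.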

.. image:: resources/fl_diagram.png
:height: 500px
:align: center

FL Terms and Definitions
========================

- FL server: manages job lifecycle, orchestrates workflow, assigns tasks to clients, performs aggregation
- FL client: executes tasks, performs local computation/learning with local dataset, submits result back to FL server
- FL algorithms: FedAvg, FedOpt, FedProx etc. implemented as workflows

.. note::

Here we describe the centralized version of FL, where the FL server has the role of the aggregator node. However, in a decentralized version such as
swarm learning, FL clients can serve as the aggregator node instead.

- Types of FL

- horizontal FL: clients hold different data samples over the same features
- vertical FL: clients hold different features over an overlapping set of data samples
- swarm learning: a decentralized subset of FL where orchestration and aggregation are performed by the clients
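
The difference between horizontal and vertical FL is easiest to see on a small table. The following is an illustrative sketch only: the records, column names, and the helpers ``horizontal_split`` and ``vertical_split`` are hypothetical and are not part of FLARE.

```python
from typing import Dict, List

Row = Dict[str, float]

# Hypothetical toy table; the column names are illustrative only.
rows: List[Row] = [
    {"id": 1, "age": 34, "income": 52.0, "label": 0},
    {"id": 2, "age": 51, "income": 87.5, "label": 1},
    {"id": 3, "age": 29, "income": 43.0, "label": 0},
    {"id": 4, "age": 45, "income": 66.0, "label": 1},
]


def horizontal_split(table: List[Row], n_clients: int) -> List[List[Row]]:
    """Horizontal FL: each client holds different rows but every column."""
    return [table[i::n_clients] for i in range(n_clients)]


def vertical_split(table: List[Row], column_groups: List[List[str]], key: str = "id") -> List[List[Row]]:
    """Vertical FL: each client holds every row but only its own columns, plus the join key."""
    return [[{c: row[c] for c in [key] + cols} for row in table] for cols in column_groups]


h_parts = horizontal_split(rows, 2)
v_parts = vertical_split(rows, [["age"], ["income", "label"]])

for name, parts in (("horizontal", h_parts), ("vertical", v_parts)):
    for i, part in enumerate(parts):
        print(name, "client", i, "ids:", [r["id"] for r in part], "columns:", sorted(part[0]))
```

In the horizontal case the clients share a schema and differ in samples; in the vertical case they share sample identifiers and differ in features, which is why vertical workflows typically need a step to align the overlapping records first.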

Main Benefits
=============

Enhanced Data Privacy and Security
----------------------------------
Federated learning facilitates data privacy and data locality by ensuring that the data remains at each site.
Additionally, privacy-preserving techniques such as homomorphic encryption and differential privacy filters can be leveraged to further protect the transferred data.
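
As a rough illustration of what a differential-privacy-style filter does to a model update before it leaves a client, the sketch below clips the update's norm and adds Gaussian noise. It is a generic clip-and-noise step under stated assumptions, not FLARE's filter API: the function ``clip_and_add_noise`` and its parameters are hypothetical, and calibrating the noise to obtain a formal privacy guarantee is out of scope here.

```python
import math
import random
from typing import List, Optional


def clip_and_add_noise(
    update: List[float],
    clip_norm: float,
    noise_std: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Scale the update so its L2 norm is at most clip_norm, then add Gaussian noise."""
    rng = rng or random.Random()
    norm = math.sqrt(sum(v * v for v in update))
    # Updates already inside the clipping radius are left unscaled.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale + rng.gauss(0.0, noise_std) for v in update]


# Hypothetical client update with L2 norm 5, clipped to norm 1 and then perturbed.
raw_update = [3.0, 4.0]
protected = clip_and_add_noise(raw_update, clip_norm=1.0, noise_std=0.1, rng=random.Random(0))
print(protected)
```

Clipping bounds how much any single client can influence the aggregate, and the noise masks the exact values that are transferred; both trade some accuracy for privacy.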

Improved Accuracy and Diversity
-------------------------------
By training with a variety of data sources across different clients, a robust and generalizable global model can be developed to better represent heterogeneous datasets.

Scalability and Network Efficiency
----------------------------------
With the ability to perform training at the edge, federated learning can be highly scalable across the globe.
Additionally, transferring only the model weights rather than entire datasets makes efficient use of network resources.

Applications
============
An important application of federated learning is in the healthcare sector, where data privacy regulations and patient record confidentiality make training models challenging.
Federated learning can help break down these healthcare data silos to allow hospitals and medical institutions to collaborate and pool their medical knowledge without the need to share their data.
Some common use cases involve classification and detection tasks, drug discovery with federated protein LLMs, and federated analytics on medical devices.

Furthermore, there are many other areas and industries, such as financial fraud detection, autonomous vehicles, HPC, and mobile applications,
where the ability to use distributed data silos while maintaining data privacy is essential for the development of better models.

Read on to learn how FLARE is built as a flexible federated computing framework to enable federated learning from research to production.
18 changes: 16 additions & 2 deletions docs/index.rst
@@ -5,15 +5,29 @@ NVIDIA FLARE
.. toctree::
:maxdepth: -1
:hidden:
:caption: Introduction

fl_introduction
flare_overview
whats_new
getting_started

.. toctree::
:maxdepth: -1
:hidden:
:caption: Guides

example_applications_algorithms
real_world_fl
user_guide
programming_guide
best_practices

.. toctree::
:maxdepth: -1
:hidden:
:caption: Miscellaneous

faq
publications_and_talks
contributing
@@ -39,8 +53,8 @@ Learn more in the :ref:`FLARE Overview <flare_overview>`, :ref:`What's New <what

Getting Started
===============
For first-time users and FL researchers, FLARE provides the :ref:`fl_simulator` that allows you to build, test, and deploy applications locally.
The :ref:`Getting Started guide <getting_started>` covers installation and walks through an example application using the FL Simulator.
For first-time users and FL researchers, FLARE provides the :ref:`FL Simulator <fl_simulator>` that allows you to build, test, and deploy applications locally.
The :ref:`Getting Started <getting_started>` guide covers installation and walks through an example application using the FL Simulator.

When you are ready for a secure, distributed deployment, the :ref:`Real World Federated Learning <real_world_fl>` section covers the tools and process
required to deploy and operate a secure, real-world FLARE project.
@@ -23,7 +23,7 @@ example that implements the :class:`cross site model evaluation workflow<nvflare
model evaluation and ``config_fed_server.json`` is configured with :class:`ValidationJsonGenerator<nvflare.app_common.widgets.validation_json_generator.ValidationJsonGenerator>`
to write the results to a JSON file on the server.

Example with Cross Site Model Evaluation / Federated Evaluation Workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See the :github_nvflare_link:`Hello Numpy Cross-Site Validation <examples/hello-world/hello-numpy-cross-val>` for an example application with
the cross site model evaluation / federated evaluation workflow.
Examples with Cross Site Model Evaluation / Federated Evaluation Workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See :github_nvflare_link:`Hello Numpy Cross-Site Validation <examples/hello-world/hello-numpy-cross-val>` and
:github_nvflare_link:`Step-by-step Cross-site Evaluation <examples/hello-world/step-by-step/cifar10/cse/cse.ipynb>` for examples using server-controlled cross-site evaluation workflows.
3 changes: 2 additions & 1 deletion docs/programming_guide/execution_api_type/model_learner.rst
@@ -197,5 +197,6 @@ More Resources
==============

In addition to the :github_nvflare_link:`ModelLearner <nvflare/app_common/abstract/model_learner.py>` and :github_nvflare_link:`FLModel <nvflare/app_common/abstract/fl_model.py>` APIs, also take a look at some examples using the ModelLearner:

- :github_nvflare_link:`Step-by-step ModelLearner <examples/hello-world/step-by-step/cifar10/sag_model_learner/sag_model_learner.ipynb>`
- :github_nvflare_link:`CIFAR10 ModelLearner <hexamples/advanced/cifar10/pt/learners/cifar10_model_learner.py>`
- :github_nvflare_link:`CIFAR10 ModelLearner <examples/advanced/cifar10/pt/learners/cifar10_model_learner.py>`
Binary file modified docs/resources/3rd_party_integration_diagram.png
Binary file added docs/resources/fl_diagram.png
Binary file added docs/resources/nvidia_logo.png
37 changes: 37 additions & 0 deletions docs/user_guide/nvflare_cli/job_cli.rst
@@ -208,3 +208,40 @@ and change the app_1 batch_size to 4, app_2 batch_size to 6 for sag_pt_deploy_ma

The app names must be defined in the job template being used: in this case ``app_1``, ``app_2``, and ``app_server``,
are in ``sag_pt_deploy_map``.

***************************
FLARE Job Template Registry
***************************

Below is a table of all available :github_nvflare_link:`Job Templates <job_templates>`.

.. csv-table::
:header: Example,Controller Type,Execution API Type,Description
:widths: 18, 18, 10, 30

cyclic_cc_pt,client,client_api,client-controlled cyclic workflow with PyTorch ClientAPI trainer
cyclic_pt,server,client_api,server-controlled cyclic workflow with PyTorch ClientAPI trainer
psi_csv,server,Executor,private-set intersection for csv data
sag_cross_np,server,client_executor,scatter & gather and cross-site validation using numpy
sag_cse_pt,server,client_api,scatter & gather workflow and cross-site evaluation with PyTorch
sag_gnn,server,client_api,scatter & gather workflow for gnn learning
sag_nemo,server,client_api,Scatter and Gather Workflow for NeMo
sag_np,server,client_api,scatter & gather workflow using numpy
sag_np_cell_pipe,server,client_api,scatter & gather workflow using numpy
sag_np_metrics,server,client_api,scatter & gather workflow using numpy
sag_pt,server,client_api,scatter & gather workflow using pytorch
sag_pt_deploy_map,server,client_api,SAG workflow using pytorch with deploy_map & site-specific configs
sag_pt_executor,server,Executor,scatter & gather workflow and cross-site evaluation with PyTorch
sag_pt_he,server,client_api,scatter & gather workflow using pytorch and homomorphic encryption
sag_pt_mlflow,server,client_api,scatter & gather workflow using pytorch with MLflow tracking
sag_pt_model_learner,server,ModelLearner,scatter & gather workflow and cross-site evaluation with PyTorch
sag_tf,server,client_api,scatter & gather workflow using TensorFlow
sklearn_kmeans,server,client_api,scikit-learn KMeans model
sklearn_linear,server,client_api,scikit-learn linear model
sklearn_svm,server,client_api,scikit-learn SVM model
stats_df,server,stats_executor,FedStats: tabular data with pandas
stats_image,server,stats_executor,FedStats: image intensity histogram
swarm_cse_pt,client,client_api,Swarm Learning with Cross-Site Evaluation with PyTorch
swarm_cse_pt_model_learner,client,ModelLearner,Swarm Learning with Cross-Site Evaluation with PyTorch ModelLearner
vertical_xgb,server,Executor,vertical federated xgboost
xgboost_tree,server,client_api,xgboost horizontal tree-based collaboration model
2 changes: 1 addition & 1 deletion examples/README.md
@@ -77,7 +77,7 @@ When you open a notebook, select the kernel `nvflare_example` using the dropdown
|----------------------------------------------------------------------------------------------------------------------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Notebook for Hello Examples](./hello-world/hello_world.ipynb) | - | Notebook for examples below. |
| [Hello Scatter and Gather](./hello-world/hello-numpy-sag/README.md) | Numpy | Example using [ScatterAndGather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html) controller workflow. |
| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md) | Numpy | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow. |
| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md) | Numpy | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow, plus an example that uses previous results without a training workflow. |
| [Hello Cyclic Weight Transfer](./hello-world/hello-cyclic/README.md) | PyTorch | Example using [CyclicController](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cyclic_ctl.html) controller workflow to implement [Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/). |
| [Hello PyTorch](./hello-world/hello-pt/README.md) | PyTorch | Example using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [PyTorch](https://pytorch.org/) as the deep learning training framework. |
| [Hello TensorFlow](./hello-world/hello-tf2/README.md) | TensorFlow2 | Example of using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework. |
30 changes: 14 additions & 16 deletions examples/hello-world/step-by-step/README.md
@@ -1,24 +1,22 @@
# Step-by-Step Examples

To run the notebooks in each example, please make sure you first set up a virtual environment and install "./requirements.txt" and JupyterLab following the [example root readme](../README.md).

* [cifar10](cifar10) - Multi-class classification with image data using CIFAR10 dataset
* [higgs](higgs) - Binary classification with tabular data using HIGGS dataset

These step-by-step example series aim to help users quickly get started and learn about FLARE.
For consistency, each example in a series uses the same dataset: CIFAR10 for image data and the HIGGS dataset for tabular data.
The examples will build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities.
The examples will build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities. See the README in each directory for more details about each series.

## Common Questions

Given a machine learning problem, here are some common questions we aim to cover when formulating a federated learning problem:
Here are some common questions we aim to cover in these example series when formulating a federated learning problem:

* What does the data look like?
* How do we compare global statistics with the site's local data statistics?
* How to formulate the federated algorithms
* https://developer.download.nvidia.com/healthcare/clara/docs/federated_traditional_machine_learning_algorithms.pdf
* Given the formulation, how to convert the existing machine learning or deep learning code to Federated learning code.
* [ML to FL examples](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/README.md)
* For different types of federated learning workflows: Scatter and Gather, Cyclic Weight Transfer, Swarming learning,
Vertical learning, ... what do we need to change ?
* How can we capture the experiment log, so all sites' metrics and global metrics can be viewed in experiment tracking tools such as Weights & Biases, MLfLow, or Tensorboard

In these "step-by-step" examples, we will dive into these questions in two series of examples (See the README in each directory for more details about each series):

* [cifar10](cifar10) - Multi-class classification with image data using CIFAR10 dataset
* [higgs](higgs) - Binary classification with tabular data using HIGGS dataset


* How do we formulate the [federated algorithms](https://developer.download.nvidia.com/healthcare/clara/docs/federated_traditional_machine_learning_algorithms.pdf)?
* How do we convert the existing machine learning or deep learning code to federated learning code? [ML to FL examples](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/README.md)
* How do we use different types of federated learning workflows (e.g. Scatter and Gather, Cyclic Weight Transfer, Swarm Learning,
Vertical Learning) and what do we need to change?
* How can we capture the experiment log, so all sites' metrics and global metrics can be viewed in experiment tracking tools such as Weights & Biases, MLflow, or TensorBoard?