[2.4] address vdr feedback (#2299)

* address vdr feedback
* fix typos, rename
* rewording, update diagram
NVIDIA · Jan 23, 2024 · 1d23ffc
1 parent 5c58d75
Showing 14 changed files with 181 additions and 28 deletions.
diff --git a/docs/_static/css/additions.css b/docs/_static/css/additions.css
@@ -1,3 +1,6 @@
 .wy-menu-vertical li.toctree-l4.current li.toctree-l5>a{display:block;background:#b1b1b1;padding:.4045em 7.3em}
 .wy-menu-vertical li.toctree-l5.current li.toctree-l6>a{display:block;background:#a9a9a9;padding:.4045em 8.8em}
-.wy-menu-vertical li.toctree-l5{font-size: .9em;}
+.wy-menu-vertical li.toctree-l5{font-size: .9em;}
+.wy-menu > .caption > span.caption-text {
+    color: #76b900;
+  }
diff --git a/docs/conf.py b/docs/conf.py
@@ -44,7 +44,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode):
 # -- Project information -----------------------------------------------------
 
 project = "NVIDIA FLARE"
-copyright = "2023, NVIDIA"
+copyright = "2024, NVIDIA"
 author = "NVIDIA"
 
 # The full version, including alpha/beta/rc tags
@@ -114,6 +114,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode):
 html_scaled_image_link = False
 html_show_sourcelink = True
 html_favicon = "favicon.ico"
+html_logo = "resources/nvidia_logo.png"
 
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,

diff --git a/docs/example_applications_algorithms.rst b/docs/example_applications_algorithms.rst
@@ -26,7 +26,7 @@ Can be run from the :github_nvflare_link:`hello_world notebook <examples/hello-w
 --------------
 
   * :ref:`Hello Scatter and Gather <hello_scatter_and_gather>` - Example using the Scatter And Gather (SAG) workflow with a Numpy trainer
-  * :ref:`Hello Cross-Site Validation <hello_cross_val>` - Example using the Cross Site Model Eval workflow with a Numpy trainer
+  * :ref:`Hello Cross-Site Validation <hello_cross_val>` - Example using the Cross Site Model Eval workflow with a Numpy trainer; also demonstrates running cross-site validation using previous training results.
   * :github_nvflare_link:`Hello Cyclic Weight Transfer (GitHub) <examples/hello-world/hello-cyclic>` - Example using the CyclicController workflow to implement `Cyclic Weight Transfer <https://pubmed.ncbi.nlm.nih.gov/29617797/>`_ with TensorFlow as the deep learning training framework
   * :github_nvflare_link:`Swarm Learning <examples/advanced/swarm_learning>` - Example using Swarm Learning and Client-Controlled Cross-site Evaluation workflows.
   * :github_nvflare_link:`Client-Controlled Cyclic Weight Transfer <examples/hello-world/step-by-step/cifar10/cyclic_ccwf>` - Example using Client-Controlled Cyclic workflow using Client API.

diff --git a/docs/fl_introduction.rst b/docs/fl_introduction.rst
@@ -0,0 +1,64 @@
+.. _fl_introduction:
+
+###########################
+What is Federated Learning?
+###########################
+
+Federated Learning is a distributed learning paradigm where training occurs across multiple clients, each with their own local datasets.
+This enables the creation of common robust models without sharing sensitive local data, helping solve issues of data privacy and security.
+
+How does Federated Learning Work?
+=================================
+The federated learning (FL) server orchestrates the collaboration of multiple clients by first sending an initial model to the FL clients.
+The clients perform training on their local datasets, then send the model updates back to the FL server for aggregation to form a global model.
+This process forms a single round of federated learning; after a number of rounds, a robust global model can be developed.
+
+.. image:: resources/fl_diagram.png
+    :height: 500px
+    :align: center
+
+FL Terms and Definitions
+========================
+
+- FL server: manages job lifecycle, orchestrates workflow, assigns tasks to clients, performs aggregation
+- FL client: executes tasks, performs local computation/learning with local dataset, submits result back to FL server
+- FL algorithms: FedAvg, FedOpt, FedProx, etc., implemented as workflows
+
+.. note::
+
+    Here we describe the centralized version of FL, where the FL server has the role of the aggregator node. However, in a decentralized version such as
+    swarm learning, FL clients can serve as the aggregator node instead.
+
+- Types of FL
+
+  - horizontal FL: clients hold different data samples over the same features
+  - vertical FL: clients hold different features over an overlapping set of data samples
+  - swarm learning: a decentralized subset of FL where orchestration and aggregation are performed by the clients
+
+Main Benefits
+=============
+
+Enhanced Data Privacy and Security
+----------------------------------
+Federated learning facilitates data privacy and data locality by ensuring that the data remains at each site.
+Additionally, privacy-preserving techniques such as homomorphic encryption and differential privacy filters can be leveraged to further protect the transferred data.
+
+Improved Accuracy and Diversity
+-------------------------------
+By training with a variety of data sources across different clients, a robust and generalizable global model can be developed to better represent heterogeneous datasets.
+
+Scalability and Network Efficiency
+----------------------------------
+With the ability to perform training at the edge, federated learning can be highly scalable across the globe.
+Additionally, transferring only the model weights rather than entire datasets makes efficient use of network resources.
+
+Applications
+============
+An important application of federated learning is in the healthcare sector, where data privacy regulations and patient record confidentiality make training models challenging.
+Federated learning can help break down these healthcare data silos to allow hospitals and medical institutions to collaborate and pool their medical knowledge without the need to share their data.
+Some common use cases involve classification and detection tasks, drug discovery with federated protein LLMs, and federated analytics on medical devices.
+
+Furthermore, there are many other areas and industries, such as financial fraud detection, autonomous vehicles, HPC, and mobile applications,
+where the ability to use distributed data silos while maintaining data privacy is essential for the development of better models.
+
+Read on to learn how FLARE is built as a flexible federated computing framework to enable federated learning from research to production.
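
Comment on `docs/fl_introduction.rst`: the round described under "How does Federated Learning Work?" (server sends a model, clients train locally, server aggregates) could be illustrated with a minimal, framework-free sketch like the one below. It is not FLARE code and uses no NVFlare API. The names `client_train`, `aggregate`, and `run_rounds` are hypothetical, the "training" step is a toy stand-in that nudges each weight toward the local feature mean, and the aggregation is a sample-count-weighted average in the style of FedAvg.

```python
from typing import List, Tuple

Model = List[float]  # a model is just a flat list of weights in this sketch


def client_train(global_model: Model, local_data: List[List[float]],
                 lr: float = 0.5) -> Tuple[Model, int]:
    """Toy local training (assumption, not a real learner): move each weight
    toward the per-feature mean of the client's local data.

    Returns the updated weights and the number of local samples.
    The raw data never leaves this function; only the weights do.
    """
    n = len(local_data)
    means = [sum(row[i] for row in local_data) / n
             for i in range(len(global_model))]
    updated = [w + lr * (m - w) for w, m in zip(global_model, means)]
    return updated, n


def aggregate(results: List[Tuple[Model, int]]) -> Model:
    """FedAvg-style aggregation: average client weights, weighted by sample count."""
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    return [sum(model[i] * n for model, n in results) / total
            for i in range(dim)]


def run_rounds(initial: Model, client_datasets: List[List[List[float]]],
               num_rounds: int) -> Model:
    """Server loop: broadcast the model, collect client updates, aggregate."""
    model = list(initial)
    for _ in range(num_rounds):
        results = [client_train(model, data) for data in client_datasets]
        model = aggregate(results)
    return model


if __name__ == "__main__":
    # Two hypothetical clients holding different amounts of 1-feature data.
    datasets = [[[1.0], [3.0]], [[10.0]]]
    print(run_rounds([0.0], datasets, num_rounds=30))
```

In this toy setup the global weight moves toward the mean of all samples across clients, even though no client ever shares its samples, which is the property the introduction describes.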
diff --git a/docs/index.rst b/docs/index.rst
@@ -5,15 +5,29 @@ NVIDIA FLARE
 .. toctree::
    :maxdepth: -1
    :hidden:
+   :caption: Introduction
 
+   fl_introduction
    flare_overview
    whats_new
    getting_started
+
+.. toctree::
+   :maxdepth: -1
+   :hidden:
+   :caption: Guides
+
    example_applications_algorithms
    real_world_fl
    user_guide
    programming_guide
    best_practices
+
+.. toctree::
+   :maxdepth: -1
+   :hidden:
+   :caption: Miscellaneous
+
    faq
    publications_and_talks
    contributing
@@ -39,8 +53,8 @@ Learn more in the :ref:`FLARE Overview <flare_overview>`, :ref:`What's New <what
 
 Getting Started
 ===============
-For first-time users and FL researchers, FLARE provides the :ref:`fl_simulator` that allows you to build, test, and deploy applications locally.
-The :ref:`Getting Started guide <getting_started>` covers installation and walks through an example application using the FL Simulator.
+For first-time users and FL researchers, FLARE provides the :ref:`FL Simulator <fl_simulator>` that allows you to build, test, and deploy applications locally.
+The :ref:`Getting Started <getting_started>` guide covers installation and walks through an example application using the FL Simulator.
 
 When you are ready for a secure, distributed deployment, the :ref:`Real World Federated Learning <real_world_fl>` section covers the tools and process
 required to deploy and operate a secure, real-world FLARE project.

diff --git a/docs/programming_guide/controllers/cross_site_model_evaluation.rst b/docs/programming_guide/controllers/cross_site_model_evaluation.rst
@@ -23,7 +23,7 @@ example that implements the :class:`cross site model evaluation workflow<nvflare
    model evaluation and ``config_fed_server.json`` is configured with :class:`ValidationJsonGenerator<nvflare.app_common.widgets.validation_json_generator.ValidationJsonGenerator>`
    to write the results to a JSON file on the server.
 
-Example with Cross Site Model Evaluation / Federated Evaluation Workflow
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-See the :github_nvflare_link:`Hello Numpy Cross-Site Validation <examples/hello-world/hello-numpy-cross-val>` for an example application with
-the cross site model evaluation / federated evaluation workflow.
+Examples with Cross Site Model Evaluation / Federated Evaluation Workflow
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+See :github_nvflare_link:`Hello Numpy Cross-Site Validation <examples/hello-world/hello-numpy-cross-val>` and
+:github_nvflare_link:`Step-by-step Cross-site Evaluation <examples/hello-world/step-by-step/cifar10/cse/cse.ipynb>` for examples using server-controlled cross-site evaluation workflows.
diff --git a/docs/programming_guide/execution_api_type/model_learner.rst b/docs/programming_guide/execution_api_type/model_learner.rst
@@ -197,5 +197,6 @@ More Resources
 ==============
 
 In addition to the :github_nvflare_link:`ModelLearner <nvflare/app_common/abstract/model_learner.py>` and :github_nvflare_link:`FLModel <nvflare/app_common/abstract/fl_model.py>` APIs, also take a look at some examples using the ModelLearner:
+
 - :github_nvflare_link:`Step-by-step ModelLearner <examples/hello-world/step-by-step/cifar10/sag_model_learner/sag_model_learner.ipynb>`
-- :github_nvflare_link:`CIFAR10 ModelLearner <hexamples/advanced/cifar10/pt/learners/cifar10_model_learner.py>`
+- :github_nvflare_link:`CIFAR10 ModelLearner <examples/advanced/cifar10/pt/learners/cifar10_model_learner.py>`
diff --git a/docs/resources/3rd_party_integration_diagram.png b/docs/resources/3rd_party_integration_diagram.png
diff --git a/docs/resources/fl_diagram.png b/docs/resources/fl_diagram.png
diff --git a/docs/resources/nvidia_logo.png b/docs/resources/nvidia_logo.png
diff --git a/docs/user_guide/nvflare_cli/job_cli.rst b/docs/user_guide/nvflare_cli/job_cli.rst
@@ -208,3 +208,40 @@ and change the app_1 batch_size to 4, app_2 batch_size to 6 for sag_pt_deploy_ma
 
     The app names must be defined in the job template being used: in this case ``app_1``, ``app_2``, and ``app_server``,
     are in ``sag_pt_deploy_map``.
+
+***************************
+FLARE Job Template Registry
+***************************
+
+Below is a table of all available :github_nvflare_link:`Job Templates <job_templates>`.
+
+.. csv-table::
+    :header: Example,Controller Type,Execution API Type,Description
+    :widths: 18, 10, 18, 30
+
+    cyclic_cc_pt,client,client_api,client-controlled cyclic workflow with PyTorch ClientAPI trainer
+    cyclic_pt,server,client_api,server-controlled cyclic workflow with PyTorch ClientAPI trainer
+    psi_csv,server,Executor,private-set intersection for csv data
+    sag_cross_np,server,client_executor,scatter & gather and cross-site validation using numpy
+    sag_cse_pt,server,client_api,scatter & gather workflow and cross-site evaluation with PyTorch
+    sag_gnn,server,client_api,scatter & gather workflow for GNN learning
+    sag_nemo,server,client_api,scatter & gather workflow for NeMo
+    sag_np,server,client_api,scatter & gather workflow using numpy
+    sag_np_cell_pipe,server,client_api,scatter & gather workflow using numpy
+    sag_np_metrics,server,client_api,scatter & gather workflow using numpy
+    sag_pt,server,client_api,scatter & gather workflow using pytorch
+    sag_pt_deploy_map,server,client_api,SAG workflow using pytorch with deploy_map & site-specific configs
+    sag_pt_executor,server,Executor,scatter & gather workflow and cross-site evaluation with PyTorch
+    sag_pt_he,server,client_api,scatter & gather workflow using pytorch and homomorphic encryption
+    sag_pt_mlflow,server,client_api,scatter & gather workflow using pytorch with MLflow tracking
+    sag_pt_model_learner,server,ModelLearner,scatter & gather workflow and cross-site evaluation with PyTorch
+    sag_tf,server,client_api,scatter & gather workflow using TensorFlow
+    sklearn_kmeans,server,client_api,scikit-learn KMeans model
+    sklearn_linear,server,client_api,scikit-learn linear model
+    sklearn_svm,server,client_api,scikit-learn SVM model
+    stats_df,server,stats_executor,FedStats: tabular data with pandas
+    stats_image,server,stats_executor,FedStats: image intensity histogram
+    swarm_cse_pt,client,client_api,Swarm Learning with Cross-Site Evaluation with PyTorch
+    swarm_cse_pt_model_learner,client,ModelLearner,Swarm Learning with Cross-Site Evaluation with PyTorch ModelLearner
+    vertical_xgb,server,Executor,vertical federated xgboost
+    xgboost_tree,server,client_api,xgboost horizontal tree-based collaboration model
diff --git a/examples/README.md b/examples/README.md
@@ -77,7 +77,7 @@ When you open a notebook, select the kernel `nvflare_example` using the dropdown
 |----------------------------------------------------------------------------------------------------------------------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [Notebook for Hello Examples](./hello-world/hello_world.ipynb)                                                                         | -            | Notebook for examples below.                                                                                                                                    |
 | [Hello Scatter and Gather](./hello-world/hello-numpy-sag/README.md)                                                                    | Numpy        | Example using [ScatterAndGather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html) controller workflow.      |
-| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md)                                                           | Numpy        | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow. |
+| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md)                                                           | Numpy        | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow, plus an example that runs cross-site validation on previous results without a training workflow. |
 | [Hello Cyclic Weight Transfer](./hello-world/hello-cyclic/README.md)                                                                   | PyTorch      | Example using [CyclicController](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cyclic_ctl.html) controller workflow to implement [Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/). |
 | [Hello PyTorch](./hello-world/hello-pt/README.md)                                                                                      | PyTorch      | Example using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [PyTorch](https://pytorch.org/) as the deep learning training framework. |
 | [Hello TensorFlow](./hello-world/hello-tf2/README.md)                                                                                  | TensorFlow2  | Example of using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework. |

diff --git a/examples/hello-world/step-by-step/README.md b/examples/hello-world/step-by-step/README.md
@@ -1,24 +1,22 @@
 #  Step-by-Step Examples
 
+To run the notebooks in each example, please make sure you first set up a virtual environment and install "./requirements.txt" and JupyterLab following the [example root readme](../README.md).
+
+* [cifar10](cifar10) - Multi-class classification with image data using CIFAR10 dataset
+* [higgs](higgs) - Binary classification with tabular data using HIGGS dataset
+
 These step-by-step example series aim to help users quickly get started and learn about FLARE.
 For consistency, each example in the series uses the same dataset: CIFAR10 for image data and the HIGGS dataset for tabular data.
-The examples will build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities.
+The examples will build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities. See the README in each directory for more details about each series.
+
+## Common Questions
 
-Given a machine learning problem, here are some common questions we aim to cover when formulating a federated learning problem:
+Here are some common questions we aim to cover in these example series when formulating a federated learning problem:
 
 * What does the data look like?
 * How do we compare global statistics with the site's local data statistics? 
-* How to formulate the federated algorithms
-  * https://developer.download.nvidia.com/healthcare/clara/docs/federated_traditional_machine_learning_algorithms.pdf
-* Given the formulation, how to convert the existing machine learning or deep learning code to Federated learning code.
-  * [ML to FL examples](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/README.md)
-* For different types of federated learning workflows: Scatter and Gather, Cyclic Weight Transfer, Swarming learning, 
-Vertical learning, ... what do we need to change ?
-* How can we capture the experiment log, so all sites' metrics and global metrics can be viewed in experiment tracking tools such as Weights & Biases, MLfLow, or Tensorboard
-
-In these "step-by-step" examples, we will dive into these questions in two series of examples (See the README in each directory for more details about each series):
-
-* [cifar10](cifar10) - Multi-class classification with image data using CIFAR10 dataset
-* [higgs](higgs) - Binary classification with tabular data using HIGGS dataset
-
-
+* How do we formulate the [federated algorithms](https://developer.download.nvidia.com/healthcare/clara/docs/federated_traditional_machine_learning_algorithms.pdf)?
+* How do we convert the existing machine learning or deep learning code to federated learning code? [ML to FL examples](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/README.md)
+* How do we use different types of federated learning workflows (e.g. Scatter and Gather, Cyclic Weight Transfer, Swarm Learning,
+Vertical Learning), and what do we need to change?
+* How can we capture the experiment log, so all sites' metrics and global metrics can be viewed in experiment tracking tools such as Weights & Biases, MLflow, or TensorBoard?