diff --git a/docs/_static/css/additions.css b/docs/_static/css/additions.css index 999ff74614..a8490da9b4 100644 --- a/docs/_static/css/additions.css +++ b/docs/_static/css/additions.css @@ -1,3 +1,6 @@ .wy-menu-vertical li.toctree-l4.current li.toctree-l5>a{display:block;background:#b1b1b1;padding:.4045em 7.3em} .wy-menu-vertical li.toctree-l5.current li.toctree-l6>a{display:block;background:#a9a9a9;padding:.4045em 8.8em} -.wy-menu-vertical li.toctree-l5{font-size: .9em;} \ No newline at end of file +.wy-menu-vertical li.toctree-l5{font-size: .9em;} +.wy-menu > .caption > span.caption-text { + color: #76b900; + } \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index fa388e92eb..57a8f9e1c7 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -44,7 +44,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode): # -- Project information ----------------------------------------------------- project = "NVIDIA FLARE" -copyright = "2023, NVIDIA" +copyright = "2024, NVIDIA" author = "NVIDIA" # The full version, including alpha/beta/rc tags @@ -114,6 +114,7 @@ def resolve_xref(self, env, fromdocname, builder, typ, target, node, contnode): html_scaled_image_link = False html_show_sourcelink = True html_favicon = "favicon.ico" +html_logo = "resources/nvidia_logo.png" # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. 
They are copied after the builtin static files, diff --git a/docs/example_applications_algorithms.rst b/docs/example_applications_algorithms.rst index ca7d6a2c4f..a69e8b4f1c 100644 --- a/docs/example_applications_algorithms.rst +++ b/docs/example_applications_algorithms.rst @@ -26,7 +26,7 @@ Can be run from the :github_nvflare_link:`hello_world notebook ` - Example using the Scatter And Gather (SAG) workflow with a Numpy trainer - * :ref:`Hello Cross-Site Validation ` - Example using the Cross Site Model Eval workflow with a Numpy trainer + * :ref:`Hello Cross-Site Validation ` - Example using the Cross Site Model Eval workflow with a Numpy trainer; also demonstrates running cross-site validation using previous training results. * :github_nvflare_link:`Hello Cyclic Weight Transfer (GitHub) ` - Example using the CyclicController workflow to implement `Cyclic Weight Transfer `_ with TensorFlow as the deep learning training framework * :github_nvflare_link:`Swarm Learning ` - Example using Swarm Learning and Client-Controlled Cross-site Evaluation workflows. * :github_nvflare_link:`Client-Controlled Cyclic Weight Transfer ` - Example using Client-Controlled Cyclic workflow using Client API. diff --git a/docs/fl_introduction.rst b/docs/fl_introduction.rst new file mode 100644 index 0000000000..8f12c855ad --- /dev/null +++ b/docs/fl_introduction.rst @@ -0,0 +1,68 @@ +.. _fl_introduction: + +########################### +What is Federated Learning? +########################### + +Federated Learning is a distributed learning paradigm where training occurs across multiple clients, each with their own local datasets. +This enables the creation of common robust models without sharing sensitive local data, helping solve issues of data privacy and security. +
How does Federated Learning Work? +================================= +The federated learning (FL) aggregator orchestrates the collaboration of multiple clients by first sending an initial model to the FL clients.
+The clients perform training on their local datasets, then send the model updates back to the FL aggregator for aggregation to form a global model. +This process forms a single round of federated learning, and after a number of rounds, a robust global model can be developed. + +.. note:: + + In the diagrams below, the FL server has the role of the FL aggregator. In the case of client-controlled workflows such as swarm learning, + FL clients can serve as FL aggregators instead. + +.. image:: resources/fl_diagram.png + :height: 500px + :align: center + +FL Terms and Definitions +======================== + +- FL server: manages job lifecycle +- FL aggregator: orchestrates workflow, assigns tasks to clients, performs aggregation +- FL client: executes tasks, performs local computation/learning with local dataset, submits result back to FL aggregator + +.. image:: resources/controller_worker_flow.png + :height: 350px + +- FL algorithms: FedAvg, FedOpt, FedProx, etc., implemented as workflows +- Types of FL + + - horizontal FL: clients hold different data samples over the same features + - vertical FL: clients hold different features over an overlapping set of data samples + - swarm learning: a decentralized subset of FL where orchestration and aggregation are performed by the clients in cases where the server is not trusted + +Main Benefits +============= + +Enhanced Data Privacy and Security +---------------------------------- +Federated learning facilitates data privacy and data locality by ensuring that the data remains at each site. +Additionally, privacy-preserving techniques such as homomorphic encryption and differential privacy filters can also be leveraged to further protect the transferred data. + +Improved Accuracy and Diversity +------------------------------- +By training with a variety of data sources across different clients, a robust and generalizable global model can be developed to better represent heterogeneous datasets.
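The round structure described above (initial model sent to clients, local training, aggregation into a global model) can be illustrated with a minimal FedAvg-style sketch. This is a conceptual illustration in NumPy only, not FLARE API code; the helper names `fedavg_aggregate` and `local_train` are hypothetical:

```python
import numpy as np

def fedavg_aggregate(client_updates):
    # FedAvg-style aggregation: average client weights, weighted by
    # each client's local dataset size (hypothetical helper, not FLARE API).
    total_samples = sum(n for _, n in client_updates)
    return sum(weights * (n / total_samples) for weights, n in client_updates)

def local_train(global_weights, rng):
    # Stand-in for a client's local training step: perturb the global
    # model to simulate a locally updated copy.
    return global_weights + rng.normal(scale=0.1, size=global_weights.shape)

rng = np.random.default_rng(seed=0)
global_weights = np.zeros(4)      # initial model held by the aggregator
client_sizes = [100, 200, 50]     # each client's local dataset size

for _ in range(3):                # three federated rounds
    updates = [(local_train(global_weights, rng), n) for n in client_sizes]
    global_weights = fedavg_aggregate(updates)
```

Weighting by dataset size means clients with more local samples contribute proportionally more to the global model, which is the standard FedAvg design choice.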
+ +Scalability and Network Efficiency +---------------------------------- +With the ability to perform training at the edge, federated learning can be highly scalable across the globe. +Additionally, only needing to transfer the model weights rather than entire datasets enables efficient use of network resources. + +Applications +============ +An important application of federated learning is in the healthcare sector, where data privacy regulations and patient record confidentiality make training models challenging. +Federated learning can help break down these healthcare data silos to allow hospitals and medical institutions to collaborate and pool their medical knowledge without the need to share their data. +Some common use cases involve classification and detection tasks, drug discovery with federated protein LLMs, and federated analytics on medical devices. + +Furthermore, there are many other areas and industries such as financial fraud detection, autonomous vehicles, HPC, mobile applications, etc. +where the ability to use distributed data silos while maintaining data privacy is essential for the development of better models. + +Read on to learn how FLARE is built as a flexible federated computing framework to enable federated learning from research to production. \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index 78ea857358..e15ca6462a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -5,15 +5,29 @@ NVIDIA FLARE .. toctree:: :maxdepth: -1 :hidden: + :caption: Introduction + fl_introduction flare_overview whats_new getting_started + +.. toctree:: + :maxdepth: -1 + :hidden: + :caption: Guides + example_applications_algorithms real_world_fl user_guide programming_guide best_practices + +.. toctree:: + :maxdepth: -1 + :hidden: + :caption: Miscellaneous + faq publications_and_talks contributing @@ -34,13 +48,13 @@ and simulation to real-world production deployment.
Some of the key components - **Management tools** for secure provisioning and deployment, orchestration, and management - **Specification-based API** for extensibility -Learn more in the :ref:`FLARE Overview `, :ref:`Key Features `, :ref:`What's New `, and the +Learn more in the :ref:`FLARE Overview `, :ref:`What's New `, and the :ref:`User Guide ` and :ref:`Programming Guide `. Getting Started =============== -For first-time users and FL researchers, FLARE provides the :ref:`fl_simulator` that allows you to build, test, and deploy applications locally. -The :ref:`Getting Started guide ` covers installation and walks through an example application using the FL Simulator. +For first-time users and FL researchers, FLARE provides the :ref:`FL Simulator ` that allows you to build, test, and deploy applications locally. +The :ref:`Getting Started ` guide covers installation and walks through an example application using the FL Simulator. When you are ready for a secure, distributed deployment, the :ref:`Real World Federated Learning ` section covers the tools and process required to deploy and operate a secure, real-world FLARE project. diff --git a/docs/programming_guide/controllers/cross_site_model_evaluation.rst b/docs/programming_guide/controllers/cross_site_model_evaluation.rst index 456e8fc138..75936806d5 100644 --- a/docs/programming_guide/controllers/cross_site_model_evaluation.rst +++ b/docs/programming_guide/controllers/cross_site_model_evaluation.rst @@ -23,7 +23,7 @@ example that implements the :class:`cross site model evaluation workflow` to write the results to a JSON file on the server. -Example with Cross Site Model Evaluation / Federated Evaluation Workflow -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -See the :github_nvflare_link:`Hello Numpy Cross-Site Validation ` for an example application with -the cross site model evaluation / federated evaluation workflow.
+Examples with Cross Site Model Evaluation / Federated Evaluation Workflow +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +See :github_nvflare_link:`Hello Numpy Cross-Site Validation ` and +:github_nvflare_link:`Step-by-step Cross-site Evaluation ` for examples using server-controlled cross-site evaluation workflows. diff --git a/docs/programming_guide/fl_clients/3rd_party_integration.rst b/docs/programming_guide/fl_clients/3rd_party_integration.rst index 4c83823f3b..60efdcfe02 100644 --- a/docs/programming_guide/fl_clients/3rd_party_integration.rst +++ b/docs/programming_guide/fl_clients/3rd_party_integration.rst @@ -5,6 +5,8 @@ ############################ NVFLARE 2.4.0 supports 3rd-party external systems to integrate with FL clients. +In certain scenarios, users face challenges when attempting to move the training logic to the FLARE client side due to pre-existing ML/DL training system infrastructure. +We introduce the Third-Party Integration Pattern, which allows the FLARE system and a third-party external training system to seamlessly exchange model parameters without requiring a tightly integrated system. The FL Client installs the :mod:`TaskExchanger` executor and the 3rd-party system uses the :mod:`FlareAgent` to interact with the TaskExchanger to receive tasks and submit results to the FLARE server.
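The TaskExchanger/FlareAgent exchange pattern can be sketched with two hypothetical stand-in classes. This is not the real nvflare API, only an illustration of the handshake: the FLARE side hands out tasks and collects results, while the external training system pulls tasks, runs its pre-existing training code, and pushes results back:

```python
from queue import Queue

# Hypothetical stand-ins for nvflare's TaskExchanger and FlareAgent,
# illustrating only the exchange pattern, not the real API.
class TaskExchangerStub:
    """FLARE-client-side endpoint: hands out tasks, collects results."""
    def __init__(self):
        self.tasks = Queue()
        self.results = Queue()

    def send_task(self, name, data):
        self.tasks.put((name, data))

    def collect_result(self):
        return self.results.get()

class FlareAgentStub:
    """External-trainer-side endpoint: pulls tasks, pushes results."""
    def __init__(self, exchanger):
        self.exchanger = exchanger

    def get_task(self):
        return self.exchanger.tasks.get()

    def submit_result(self, result):
        self.exchanger.results.put(result)

exchanger = TaskExchangerStub()
agent = FlareAgentStub(exchanger)

# FLARE side issues a training task carrying the current global weights.
exchanger.send_task("train", {"weights": [0.0, 0.0]})

# The external system receives the task, trains, and replies.
name, data = agent.get_task()
updated = [w + 1.0 for w in data["weights"]]  # placeholder for real training
agent.submit_result({"weights": updated})

result = exchanger.collect_result()           # FLARE side receives the update
```

The point of the pattern is that the external system never imports FLARE training code; it only speaks this small task/result protocol, so existing ML/DL infrastructure stays untouched.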
diff --git a/docs/programming_guide/fl_clients/model_learner.rst b/docs/programming_guide/fl_clients/model_learner.rst index 6a80fec437..292d0e78c3 100644 --- a/docs/programming_guide/fl_clients/model_learner.rst +++ b/docs/programming_guide/fl_clients/model_learner.rst @@ -197,5 +197,6 @@ More Resources ============== In addition to the :github_nvflare_link:`ModelLearner ` and :github_nvflare_link:`FLModel ` APIs, also take a look at some examples using the ModelLearner: + - :github_nvflare_link:`Step-by-step ModelLearner ` -- :github_nvflare_link:`CIFAR10 ModelLearner ` +- :github_nvflare_link:`CIFAR10 ModelLearner ` diff --git a/docs/programming_guide/resources/3rd_party_integration_diagram.png b/docs/programming_guide/resources/3rd_party_integration_diagram.png index 43de0dbaa7..7e02787425 100644 Binary files a/docs/programming_guide/resources/3rd_party_integration_diagram.png and b/docs/programming_guide/resources/3rd_party_integration_diagram.png differ diff --git a/docs/resources/fl_diagram.png b/docs/resources/fl_diagram.png new file mode 100644 index 0000000000..cb5442732f Binary files /dev/null and b/docs/resources/fl_diagram.png differ diff --git a/docs/resources/nvidia_logo.png b/docs/resources/nvidia_logo.png new file mode 100644 index 0000000000..578592cfc3 Binary files /dev/null and b/docs/resources/nvidia_logo.png differ diff --git a/docs/user_guide/nvflare_cli/job_cli.rst b/docs/user_guide/nvflare_cli/job_cli.rst index 78852af370..d8a29f2804 100644 --- a/docs/user_guide/nvflare_cli/job_cli.rst +++ b/docs/user_guide/nvflare_cli/job_cli.rst @@ -208,3 +208,40 @@ and change the app_1 batch_size to 4, app_2 batch_size to 6 for sag_pt_deploy_ma The app names must be defined in the job template being used: in this case ``app_1``, ``app_2``, and ``app_server``, are in ``sag_pt_deploy_map``. + +*************************** +FLARE Job Template Registry +*************************** + +Below is a table of all available :github_nvflare_link:`Job Templates `. 
+ +.. csv-table:: + :header: Example,Client Category,Controller-Type,Description + :widths: 18, 10, 10, 30 + + cyclic_cc_pt,client,client_api,client-controlled cyclic workflow with PyTorch ClientAPI trainer + cyclic_pt,server,client_api,server-controlled cyclic workflow with PyTorch ClientAPI trainer + psi_csv,server,Executor,private-set intersection for csv data + sag_cross_np,server,client_executor,scatter & gather and cross-site validation using numpy + sag_cse_pt,server,client_api,scatter & gather workflow and cross-site evaluation with PyTorch + sag_gnn,server,client_api,scatter & gather workflow for gnn learning + sag_nemo,server,client_api,Scatter and Gather Workflow for NeMo + sag_np,server,client_api,scatter & gather workflow using numpy + sag_np_cell_pipe,server,client_api,scatter & gather workflow using numpy + sag_np_metrics,server,client_api,scatter & gather workflow using numpy + sag_pt,server,client_api,scatter & gather workflow using pytorch + sag_pt_deploy_map,server,client_api,SAG workflow using pytorch with deploy_map & site-specific configs + sag_pt_executor,server,Executor,scatter & gather workflow and cross-site evaluation with PyTorch + sag_pt_he,server,client_api,scatter & gather workflow using pytorch and homomorphic encryption + sag_pt_mlflow,server,client_api,scatter & gather workflow using pytorch with MLflow tracking + sag_pt_model_learner,server,ModelLearner,scatter & gather workflow and cross-site evaluation with PyTorch + sag_tf,server,client_api,scatter & gather workflow using TensorFlow + sklearn_kmeans,server,client_api,scikit-learn KMeans model + sklearn_linear,server,client_api,scikit-learn linear model + sklearn_svm,server,client_api,scikit-learn SVM model + stats_df,server,stats_executor,FedStats: tabular data with pandas + stats_image,server,stats_executor,FedStats: image intensity histogram + swarm_cse_pt,client,client_api,Swarm Learning with Cross-Site Evaluation with PyTorch + 
swarm_cse_pt_model_learner,client,ModelLearner,Swarm Learning with Cross-Site Evaluation with PyTorch ModelLearner + vertical_xgb,server,Executor,vertical federated xgboost + xgboost_tree,server,client_api,xgboost horizontal tree-based collaboration model diff --git a/examples/README.md b/examples/README.md index 38858563a5..04da856400 100644 --- a/examples/README.md +++ b/examples/README.md @@ -77,7 +77,7 @@ When you open a notebook, select the kernel `nvflare_example` using the dropdown |----------------------------------------------------------------------------------------------------------------------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| | [Notebook for Hello Examples](./hello-world/hello_world.ipynb) | - | Notebook for examples below. | | [Hello Scatter and Gather](./hello-world/hello-numpy-sag/README.md) | Numpy | Example using [ScatterAndGather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html) controller workflow. | -| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md) | Numpy | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow. | +| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md) | Numpy | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow, and an example of running cross-site validation using previous training results without a training workflow.
| | [Hello Cyclic Weight Transfer](./hello-world/hello-cyclic/README.md) | PyTorch | Example using [CyclicController](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cyclic_ctl.html) controller workflow to implement [Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/). | | [Hello PyTorch](./hello-world/hello-pt/README.md) | PyTorch | Example using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [PyTorch](https://pytorch.org/) as the deep learning training framework. | | [Hello TensorFlow](./hello-world/hello-tf2/README.md) | TensorFlow2 | Example using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework. | diff --git a/examples/hello-world/step-by-step/README.md b/examples/hello-world/step-by-step/README.md index 77ecc34f52..01dc102c7b 100644 --- a/examples/hello-world/step-by-step/README.md +++ b/examples/hello-world/step-by-step/README.md @@ -1,24 +1,22 @@ # Step-by-Step Examples +To run the notebooks in each example, please make sure you first set up a virtual environment and install "./requirements.txt" and JupyterLab following the [example root readme](../README.md). + +* [cifar10](cifar10) - Multi-class classification with image data using CIFAR10 dataset +* [higgs](higgs) - Binary classification with tabular data using HIGGS dataset + These step-by-step example series are aimed at helping users quickly get started and learn about FLARE. For consistency, each example in the series uses the same dataset: CIFAR10 for image data and the HIGGS dataset for tabular data. -The examples will build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities.
+The examples will build upon previous ones to showcase different features, workflows, or APIs, allowing users to gain a comprehensive understanding of FLARE functionalities. See the README in each directory for more details about each series. + +## Key Ideas -Given a machine learning problem, here are some common questions we aim to cover when formulating a federated learning problem: +Here are some common questions we aim to cover in these examples series when formulating a federated learning problem: * What does the data look like? * How do we compare global statistics with the site's local data statistics? -* How to formulate the federated algorithms - * https://developer.download.nvidia.com/healthcare/clara/docs/federated_traditional_machine_learning_algorithms.pdf -* Given the formulation, how to convert the existing machine learning or deep learning code to Federated learning code. - * [ML to FL examples](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/README.md) -* For different types of federated learning workflows: Scatter and Gather, Cyclic Weight Transfer, Swarming learning, -Vertical learning, ... what do we need to change ? -* How can we capture the experiment log, so all sites' metrics and global metrics can be viewed in experiment tracking tools such as Weights & Biases, MLfLow, or Tensorboard - -In these "step-by-step" examples, we will dive into these questions in two series of examples (See the README in each directory for more details about each series): - -* [cifar10](cifar10) - Multi-class classification with image data using CIFAR10 dataset -* [higgs](higgs) - Binary classification with tabular data using HIGGS dataset - - +* How to formulate the [federated algorithms](https://developer.download.nvidia.com/healthcare/clara/docs/federated_traditional_machine_learning_algorithms.pdf)? +* How do we convert the existing machine learning or deep learning code to federated learning code? 
[ML to FL examples](https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/ml-to-fl/README.md) +* How do we use different types of federated learning workflows (e.g. Scatter and Gather, Cyclic Weight Transfer, Swarm learning, +Vertical learning) and what do we need to change? +* How can we capture the experiment log, so all sites' metrics and global metrics can be viewed in experiment tracking tools such as Weights & Biases, MLflow, or TensorBoard? diff --git a/job_templates/readme.md b/job_templates/readme.md index ba4e6fc61f..0e240dd6b1 100644 --- a/job_templates/readme.md +++ b/job_templates/readme.md @@ -13,8 +13,43 @@ Each job template contains the following information * information card: info.md for display purpose * information config: used by program -# Configuration format +## Configuration format Configurations are written in HOCON (Human-Optimized Config Object Notation). As a variant of JSON, .conf can also use JSON format. The pyhocon format allows for comments, and you can remove many of the double quotes as well as replace ":" with "=" to make the configurations look cleaner. You can find details in [pyhocon: HOCON Parser for python](https://github.com/chimpler/pyhocon).
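To make the quoting and comment relaxations concrete, here is a small hypothetical fragment written in HOCON (the keys are illustrative, not taken from any particular job template); the equivalent JSON would require quoted keys, ":" separators, and no comments:

```
# comments are allowed in HOCON
format_version = 2

app_config {
  batch_size = 4    # unquoted keys; "=" works in place of ":"
}
```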
+ +## List of Job Templates + +View all the available job templates with the following command: + +```nvflare job list_templates``` + +| Example | Controller-Type | Client Category | Description | +|---------|-----------------|-----------------|-------------| +| [cyclic_cc_pt](./cyclic_cc_pt) | client | client_api | client-controlled cyclic workflow with PyTorch ClientAPI trainer | +| [cyclic_pt](./cyclic_pt) | server | client_api | server-controlled cyclic workflow with PyTorch ClientAPI trainer | +| [psi_csv](./psi_csv) | server | Executor | private-set intersection for csv data | +| [sag_cross_np](./sag_cross_np) | server | client executor | scatter & gather and cross-site validation using numpy | +| [sag_cse_pt](./sag_cse_pt) | server | client_api | scatter & gather workflow and cross-site evaluation with PyTorch | +| [sag_gnn](./sag_gnn) | server | client_api | scatter & gather workflow for gnn learning | +| [sag_nemo](./sag_nemo) | server | client_api | Scatter and Gather Workflow for NeMo | +| [sag_np](./sag_np) | server | client_api | scatter & gather workflow using numpy | +| [sag_np_cell_pipe](./sag_np_cell_pipe) | server | client_api | scatter & gather workflow using numpy | +| [sag_np_metrics](./sag_np_metrics) | server | client_api | scatter & gather workflow using numpy | +| [sag_pt](./sag_pt) | server | client_api | scatter & gather workflow using pytorch | +| [sag_pt_deploy_map](./sag_pt_deploy_map) | server | client_api | SAG workflow with pytorch, deploy_map, site-specific configs | +| [sag_pt_executor](./sag_pt_executor) | server | Executor | scatter & gather workflow and cross-site evaluation with PyTorch | +| [sag_pt_he](./sag_pt_he) | server | client_api | scatter & gather workflow using pytorch and homomorphic encryption | +| [sag_pt_mlflow](./sag_pt_mlflow) | server | client_api | scatter & gather workflow using pytorch with MLflow tracking | +| [sag_pt_model_learner](./sag_pt_model_learner) | server | ModelLearner | scatter & gather 
workflow and cross-site evaluation with PyTorch | +| [sag_tf](./sag_tf) | server | client_api | scatter & gather workflow using TensorFlow | +| [sklearn_kmeans](./sklearn_kmeans) | server | client_api | scikit-learn KMeans model | +| [sklearn_linear](./sklearn_linear) | server | client_api | scikit-learn linear model | +| [sklearn_svm](./sklearn_svm) | server | client_api | scikit-learn SVM model | +| [stats_df](./stats_df) | server | stats executor | FedStats: tabular data with pandas | +| [stats_image](./stats_image) | server | stats executor | FedStats: image intensity histogram | +| [swarm_cse_pt](./swarm_cse_pt) | client | client_api | Swarm Learning with Cross-Site Evaluation with PyTorch | +| [swarm_cse_pt_model_learner](./swarm_cse_pt_model_learner) | client | ModelLearner | Swarm Learning with Cross-Site Evaluation with PyTorch ModelLearner | +| [vertical_xgb](./vertical_xgb) | server | Executor | vertical federated xgboost | +| [xgboost_tree](./xgboost_tree) | server | client_api | xgboost horizontal tree-based collaboration model |