Doc: Compute Section (#230)
Demonstrate how to use zero-copy data access to compute
(read & write) on fields and particles. This uses the typical
level - block - compute loop pattern in AMReX.
ax3l committed Dec 6, 2023
1 parent 33b272c commit ad4ddbe
Showing 11 changed files with 202 additions and 12 deletions.
1 change: 1 addition & 0 deletions docs/requirements.txt
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@ recommonmark
# Sphinx<7.2 because we are waiting for
# https://github.com/breathe-doc/breathe/issues/943
sphinx>=5.3,<7.2
sphinx-copybutton
sphinx-design
sphinx_rtd_theme>=1.1.1
# reference system
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -44,6 +44,7 @@
"sphinx.ext.autodoc",
"sphinx.ext.mathjax",
"sphinx.ext.viewcode",
"sphinx_copybutton",
"sphinx_design",
"sphinx_rtd_theme",
"breathe",
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -66,6 +66,7 @@ Usage
usage/how_to_run
usage/api
usage/zerocopy
usage/compute
usage/workflows
.. usage/tests
6 changes: 6 additions & 0 deletions docs/source/usage/api.rst
@@ -22,6 +22,8 @@ For brevity, below the 3D APIs are shown.
pyAMReX classes and functions follow the same structure as the `C++ AMReX APIs <https://amrex-codes.github.io/amrex/doxygen/>`__.


.. _usage-api-base:

Base
----

@@ -186,6 +188,8 @@ Utility
.. autofunction:: amrex.space3d.write_single_level_plotfile


.. _usage-api-amrcore:

AmrCore
-------

@@ -198,6 +202,8 @@ AmrCore
:undoc-members:


.. _usage-api-particles:

Particles
---------

102 changes: 102 additions & 0 deletions docs/source/usage/compute.rst
@@ -0,0 +1,102 @@
.. _usage-compute:

Compute
=======

Building on zero-copy read and write access to data structures, this section presents how to compute in pyAMReX.

Since the pyAMReX data containers are wrappers around the C++ AMReX objects, it is worth reading:

* `AMReX Basics <https://amrex-codes.github.io/amrex/docs_html/Basics_Chapter.html>`__ and
* `AMReX parallelization strategy for MPI+X (e.g., GPUs, CPUs) <https://amrex-codes.github.io/amrex/docs_html/GPU.html>`__.

As a very short, simplified overview, this boils down to:

* AMReX decomposes **index space** into **rectangular, block-structured, regular grids**,
* blocks are often intentionally slightly **over-decomposed**: there is at least one block per compute unit (CPU core or GPU),
* **particles** are chunked/tiled and usually decomposed like the field blocks,
* **refinement levels** are represented as (potentially sparse) levels of blocks.

Computations are thus performed (mostly) on whole blocks, which enables the use of advanced acceleration techniques on CPUs or GPUs.
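
The zero-copy idea behind this pattern can be illustrated without AMReX at all: a non-copying numpy view lets one whole-block operation write straight into the underlying buffer. A minimal sketch, assuming only numpy (the block shape and fill value are arbitrary):

```python
import numpy as np

# stand-in for one block's field data, Fortran-ordered like AMReX
block = np.zeros((8, 8, 8), order="F")

# a non-copying view, analogous to what pyAMReX's .to_numpy() returns
view = block[...]

# whole-block array operation: writes through the view, no copy made
view[()] = 42.0

# the underlying buffer was updated in place
assert block[0, 0, 0] == 42.0 and view.base is block
```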


.. _usage-compute-fields:

Fields
------

The most common data structure to interact with is a `MultiFab <https://amrex-codes.github.io/amrex/docs_html/Basics.html#fabarray-multifab-and-imultifab>`__, which is a collection of boxes with associated field data.
The field data can have more than one component (in the slowest varying index), but all components have the same `staggering/centering <https://amrex-codes.github.io/amrex/docs_html/Basics.html#box>`__.

This is how to iterate over, and potentially compute on, all blocks assigned to the local process in pyAMReX:

.. literalinclude:: ../../../tests/test_multifab.py
:language: python3
:dedent: 4
:start-after: # Manual: Compute Mfab START
:end-before: # Manual: Compute Mfab END

For a complete physics example that uses CPU/GPU-agnostic Python code for computation on fields, see:

* `Heat Equation example <https://github.com/AMReX-Codes/amrex-tutorials/blob/main/GuidedTutorials/HeatEquation/Source/main.py>`__

For many small CPU and GPU examples on how to compute on fields, see the following test cases:

* `MultiFab example <https://github.com/AMReX-Codes/amrex-tutorials/blob/main/GuidedTutorials/MultiFab/main.py>`__

* .. dropdown:: Examples in ``test_multifab.py``

.. literalinclude:: ../../../tests/test_multifab.py
:language: python3
:caption: This file is in ``tests/test_multifab.py``.


.. _usage-compute-particles:

Particles
---------

AMReX `Particles <https://amrex-codes.github.io/amrex/docs_html/Particle_Chapter.html>`__ are stored in the `ParticleContainer <https://amrex-codes.github.io/amrex/docs_html/Particle.html#the-particlecontainer>`__ class.

There are a few small differences in the `iteration over a ParticleContainer <https://amrex-codes.github.io/amrex/docs_html/Particle.html#iterating-over-particles>`__ compared to a ``MultiFab``:

* ``ParticleContainer`` is aware of mesh-refinement levels,
* AMReX supports a variety of particle data layouts (AoS and SoA, plus runtime SoA attributes), which require a few extra calls to access.
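
The AoS/SoA distinction can be sketched with plain numpy, independent of the pyAMReX API (the attribute names and counts below are made up for illustration):

```python
import numpy as np

n = 4  # particles in one chunk

# AoS: one record per particle, components interleaved in memory
aos = np.zeros(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8"), ("cpuid", "u8")])
aos["x"] = 0.30  # write one component for every particle in the chunk

# SoA: one contiguous array per attribute
soa_real = [np.zeros(n), np.zeros(n)]  # e.g., two real attributes
for arr in soa_real:
    arr[()] = 42.0
```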

Here is the general structure for computing on particles:

.. literalinclude:: ../../../tests/test_particleContainer.py
:language: python3
:dedent: 4
:start-after: # Manual: Compute PC START
:end-before: # Manual: Compute PC END

For many small CPU and GPU examples on how to compute on particles, see the following test cases:

* .. dropdown:: Examples in ``test_particleContainer.py``

.. literalinclude:: ../../../tests/test_particleContainer.py
:language: python3
:caption: This file is in ``tests/test_particleContainer.py``.

* .. dropdown:: Examples in ``test_aos.py``

.. literalinclude:: ../../../tests/test_aos.py
:language: python3
:caption: This file is in ``tests/test_aos.py``.

* .. dropdown:: Examples in ``test_soa.py``

.. literalinclude:: ../../../tests/test_soa.py
:language: python3
:caption: This file is in ``tests/test_soa.py``.


Other C++ Calls
---------------

pyAMReX exposes many more C++-implemented and GPU-accelerated AMReX functions for :py:class:`~amrex.space3d.MultiFab` and :ref:`particles <usage-api-particles>` to Python, which can be used to set, reduce, rescale, redistribute, etc., the contained data.
Check out the :ref:`detailed API docs <usage-api>` and use further third-party libraries at will on the exposed data structures, replacing the hot loops described above.

You can also leave the Python world in pyAMReX and go back to C++ whenever needed.
For :ref:`some applications <usage_run>`, pyAMReX might act as *scriptable glue* that transports fields and particles from one (C++) function to another without recompilation, by exposing the functions and methods to Python.
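
The performance argument for such precompiled calls is the same one numpy makes: replace a Python-level element loop with one whole-container operation. A numpy stand-in for the pattern (the actual pyAMReX reductions are listed in the API docs referenced above):

```python
import numpy as np

field = np.arange(1_000_000, dtype="f8")

# Python-level hot loop (slow):
#   total = 0.0
#   for v in field:
#       total += v

# one precompiled whole-array operation (fast), analogous to the
# C++-implemented reductions pyAMReX exposes on its containers
total = field.sum()
assert total == 499_999_500_000.0
```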
2 changes: 1 addition & 1 deletion docs/source/usage/zerocopy.rst
@@ -4,7 +4,7 @@ Zero-Copy
=========

pyAMReX is designed to bridge the worlds of block-structured codes and data science.
As such, it includes zero-copy GPU data access for AI/ML, in situ analysis, application coupling by implementing :ref:`standardized data interfaces <developers-implementation`.
As such, it includes zero-copy GPU data access for AI/ML, in situ analysis, application coupling by implementing :ref:`standardized data interfaces <developers-implementation>`.


CPU: numpy
1 change: 1 addition & 0 deletions docs/spack.yaml
@@ -23,5 +23,6 @@ spack:
- py-breathe
- py-recommonmark
- py-pygments
- py-sphinx-copybutton
- py-sphinx-design
- py-sphinx-rtd-theme
6 changes: 3 additions & 3 deletions src/AmrCore/AmrMesh.cpp
@@ -68,9 +68,9 @@ void init_AmrMesh(py::module &m)
>(),
py::arg("rb"), py::arg("max_level_in"), py::arg("n_cell_in"), py::arg("coord"), py::arg("ref_ratios"), py::arg("is_per"))

.def("Verbose", &AmrMesh::Verbose)
.def("max_level", &AmrMesh::maxLevel)
.def("finest_level", &AmrMesh::finestLevel)
.def_property_readonly("verbose", &AmrMesh::Verbose)
.def_property_readonly("max_level", &AmrMesh::maxLevel)
.def_property_readonly("finest_level", &AmrMesh::finestLevel)
.def("ref_ratio", py::overload_cast< >(&AmrMesh::refRatio, py::const_))
.def("ref_ratio", py::overload_cast< int >(&AmrMesh::refRatio, py::const_))
;
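
The switch from ``.def`` to ``.def_property_readonly`` above changes the Python-side calling convention from methods to read-only attributes. In pure-Python terms (a sketch, not the pyAMReX classes themselves):

```python
# before: bound as a method, called with parentheses
class AmrMeshBefore:
    def max_level(self):
        return 0

# after: bound as a read-only property, accessed without parentheses
class AmrMeshAfter:
    @property
    def max_level(self):
        return 0

assert AmrMeshBefore().max_level() == 0
assert AmrMeshAfter().max_level == 0  # no call; assignment raises AttributeError
```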
6 changes: 3 additions & 3 deletions src/Base/MultiFab.cpp
@@ -113,7 +113,7 @@ void init_MultiFab(py::module &m)
.def_property_readonly("num_comp", &FabArrayBase::nComp)
.def_property_readonly("size", &FabArrayBase::size)

.def_property_readonly("nGrowVect", &FabArrayBase::nGrowVect)
.def_property_readonly("n_grow_vect", &FabArrayBase::nGrowVect)

/* data access in Box index space */
.def("__iter__",
@@ -410,8 +410,8 @@

.def("box_array", &MultiFab::boxArray)
.def("dm", &MultiFab::DistributionMap)
.def("n_comp", &MultiFab::nComp)
.def("n_grow_vect", &MultiFab::nGrowVect)
.def_property_readonly("n_comp", &MultiFab::nComp)
.def_property_readonly("n_grow_vect", &MultiFab::nGrowVect)

/* masks & ownership */
// TODO:
48 changes: 43 additions & 5 deletions tests/test_multifab.py
@@ -8,9 +8,47 @@
import amrex.space3d as amr


def test_mfab_numpy(mfab):
"""Used in docs/source/usage/compute.rst"""

# Manual: Compute Mfab START
# finest active MR level, get from a simulation's AmrMesh object, e.g.:
# finest_level = sim.finest_level
finest_level = 0 # no MR

# iterate over all mesh-refinement levels
for lev in range(finest_level + 1):
# get an existing MultiFab, e.g., from a simulation:
# mfab = sim.get_field(lev=lev) # code-specific getter function

# grow (aka guard/ghost/halo) regions
ngv = mfab.n_grow_vect

# get every local block of the field
for mfi in mfab:
# global index space box, including guards
bx = mfi.tilebox().grow(ngv)
print(bx) # note: global index space of this block

# numpy representation: non-copying view, including the
# guard/ghost region
field_np = mfab.array(mfi).to_numpy()

# notes on indexing in field_np:
# - numpy uses locally zero-based indexing
# - layout is F_CONTIGUOUS by default, just like AMReX

# notes:
# Only the next lines are the "HOT LOOP" of the computation.
# For efficiency, use numpy array operations for speed on CPUs.
# For GPUs use .to_cupy() above and compute with cupy or numba.
field_np[()] = 42.0
# Manual: Compute Mfab END


def test_mfab_loop(mfab):
ngv = mfab.nGrowVect
print(f"\n mfab={mfab}, mfab.nGrowVect={ngv}")
ngv = mfab.n_grow_vect
print(f"\n mfab={mfab}, mfab.n_grow_vect={ngv}")

for mfi in mfab:
bx = mfi.tilebox().grow(ngv)
@@ -160,7 +198,7 @@ def test_mfab_ops_cuda_numba(mfab_device):
# https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html
from numba import cuda

ngv = mfab_device.nGrowVect
ngv = mfab_device.n_grow_vect

# assign 3: define kernel
@cuda.jit
@@ -197,8 +235,8 @@ def test_mfab_ops_cuda_cupy(mfab_device):
import cupyx.profiler

# AMReX -> cupy
ngv = mfab_device.nGrowVect
print(f"\n mfab_device={mfab_device}, mfab_device.nGrowVect={ngv}")
ngv = mfab_device.n_grow_vect
print(f"\n mfab_device={mfab_device}, mfab_device.n_grow_vect={ngv}")

# assign 3
with cupyx.profiler.time_range("assign 3 [()]", color_id=0):
40 changes: 40 additions & 0 deletions tests/test_particleContainer.py
@@ -276,6 +276,46 @@ def test_per_cell(empty_particle_container, std_geometry, std_particle):
assert ncells * std_particle.real_array_data[1] == sum_1


def test_pc_numpy(particle_container, Npart):
"""Used in docs/source/usage/compute.rst"""
pc = particle_container

# Manual: Compute PC START
# code-specific getter function, e.g.:
# pc = sim.get_particles()

# iterate over all mesh-refinement levels (no MR: lev=0)
for lvl in range(pc.finest_level + 1):
# get every local chunk of particles
for pti in pc.iterator(pc, level=lvl):
# default layout: AoS with positions and cpuid
# note: not part of the new PureSoA particle container layout
aos = pti.aos().to_numpy()

# additional compile-time and runtime attributes in SoA format
soa = pti.soa().to_numpy()

# notes:
# Only the next lines are the "HOT LOOP" of the computation.
# For efficiency, use numpy array operations for speed on CPUs.
# For GPUs use .to_cupy() above and compute with cupy or numba.

# print all particle ids in the chunk
print(aos[()]["cpuid"])

# write to all particles in the chunk
aos[()]["x"] = 0.30
aos[()]["y"] = 0.35
aos[()]["z"] = 0.40

for soa_real in soa.real:
soa_real[()] = 42.0

for soa_int in soa.int:
soa_int[()] = 12
# Manual: Compute PC END


@pytest.mark.skipif(
importlib.util.find_spec("pandas") is None, reason="pandas is not available"
)
