
Update Beam and Plasma to Pure SoA #928


@AlexanderSinn (Member) commented May 2, 2023

It finally works now.

Performance on A100:

amr.n_cell = 2047 2047 2000
beam.num_particles = 100000000
elec.ppc = 4 4
ions.ppc = 4 4

New:

TinyProfiler total time across processes [min...avg...max]: 200.5 ... 200.5 ... 200.5

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
--------------------------------------------------------------------------------------------------
ExplicitDeposition()                                 4000      58.13      58.13      58.13  28.99% <---
AdvancePlasmaParticles()                             4000      57.11      57.11      57.11  28.48% <---
DepositCurrent_PlasmaParticleContainer()             4000      54.09      54.09      54.09  26.98% <---
hpmg::MultiGrid::solve1()                            2000      19.15      19.15      19.15   9.55%
AnyDST::Execute()                                   12000      5.006      5.006      5.006   2.50%
sortBeamParticlesByBox()                                2      2.179      2.179      2.179   1.09%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    6000     0.8317     0.8317     0.8317   0.41%
Hipace::SolveOneSlice()                              2000     0.6687     0.6687     0.6687   0.33%
Fields::ShiftSlices()                                2000     0.5452     0.5452     0.5452   0.27%
AdvanceBeamParticlesSlice()                          2000     0.5168     0.5168     0.5168   0.26% <---
Hipace::InitializeSxSyWithBeam()                     2000     0.3997     0.3997     0.3997   0.20%
Fields::LinCombination()                             4000     0.3667     0.3667     0.3667   0.18%
PlasmaParticleContainer::InitParticles                  2       0.27       0.27       0.27   0.13%
DepositCurrentSlice_BeamParticleContainer()          4000     0.2205     0.2205     0.2205   0.11% <---
Fields::SolveExmByAndEypBx()                         2000     0.2008     0.2008     0.2008   0.10%
BeamParticleContainer::InitParticles()                  1     0.1952     0.1952     0.1952   0.10%
Fields::Multiply()                                   2000     0.1198     0.1198     0.1198   0.06%

Old:

TinyProfiler total time across processes [min...avg...max]: 202.9 ... 202.9 ... 202.9

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
--------------------------------------------------------------------------------------------------
AdvancePlasmaParticles()                             4000      59.46      59.46      59.46  29.31% <---
ExplicitDeposition()                                 4000      58.33      58.33      58.33  28.75% <---
DepositCurrent_PlasmaParticleContainer()             4000      54.66      54.66      54.66  26.94% <---
hpmg::MultiGrid::solve1()                            2000      19.19      19.19      19.19   9.46%
AnyDST::Execute()                                   12000      5.032      5.032      5.032   2.48%
sortBeamParticlesByBox()                                2       1.65       1.65       1.65   0.81%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    6000      0.832      0.832      0.832   0.41%
Hipace::SolveOneSlice()                              2000     0.6697     0.6697     0.6697   0.33%
Fields::ShiftSlices()                                2000     0.5455     0.5455     0.5455   0.27%
Hipace::InitializeSxSyWithBeam()                     2000     0.3998     0.3998     0.3998   0.20%
Fields::LinCombination()                             4000     0.3665     0.3665     0.3665   0.18%
AdvanceBeamParticlesSlice()                          2000      0.297      0.297      0.297   0.15% <---
PlasmaParticleContainer::InitParticles                  2     0.2808     0.2808     0.2808   0.14%
Fields::SolveExmByAndEypBx()                         2000      0.203      0.203      0.203   0.10%
BeamParticleContainer::InitParticles()                  1     0.1796     0.1796     0.1796   0.09%
DepositCurrentSlice_BeamParticleContainer()          4000     0.1766     0.1766     0.1766   0.09% <---

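Note: The beam uses a permutation array, which is why the beam routines (AdvanceBeamParticlesSlice() and DepositCurrentSlice_BeamParticleContainer()) are slower in the new version.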
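The note attributes the beam slowdown to the permutation array. Here is a minimal sketch of why an indirection can cost time. It assumes a simplified pure-SoA layout, and BeamSoA, PushAllDirect and PushSlicePermuted are hypothetical names, not the HiPACE++ or AMReX API. With direct access, particle i sits at index i in every component array. With a permutation array, each particle costs an extra load, and the component accesses become gathers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pure-SoA beam storage: one contiguous array per component
// (only x and ux are shown to keep the sketch short).
struct BeamSoA {
    std::vector<double> x;
    std::vector<double> ux;
};

// Direct access: particle i is at index i in every component array,
// so the loop streams through memory contiguously.
void PushAllDirect(BeamSoA& beam, double dt) {
    for (std::size_t i = 0; i < beam.x.size(); ++i) {
        beam.x[i] += beam.ux[i] * dt;
    }
}

// Indirect access (hypothetical): perm lists particle indices sorted by slice,
// and [begin, end) is the range of perm that belongs to one slice. Each
// particle needs an extra load (perm[k]), and x[i] / ux[i] become gathers
// that are generally not contiguous in memory.
void PushSlicePermuted(BeamSoA& beam, const std::vector<std::size_t>& perm,
                       std::size_t begin, std::size_t end, double dt) {
    for (std::size_t k = begin; k < end; ++k) {
        const std::size_t i = perm[k];
        beam.x[i] += beam.ux[i] * dt;
    }
}
```

The trade-off is that sorting by slice only rewrites the small index array rather than moving every component of every particle, at the price of indirect loads in each per-slice loop.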

  • Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
  • Tested (describe the tests in the PR description)
  • Runs on GPU (basic: the code compiles and runs well with the new module)
  • Contains an automated test (checksum and/or comparison with theory)
  • Documented: all elements (classes and their members, functions, namespaces, etc.) are documented
  • Constified (All that can be const is const)
  • Code is clean (no unwanted comments)
  • Style and code conventions (listed at the bottom of https://github.com/Hi-PACE/hipace) are respected
  • Proper label and GitHub project, if applicable

@AlexanderSinn AlexanderSinn added component: plasma About the plasma species component: beam About the beam species labels May 2, 2023
@ax3l ax3l requested review from ax3l and atmyers May 2, 2023 16:51
@MaxThevenet MaxThevenet changed the title [WIP] Upadate Beam and Plasma to Pure SoA [WIP] Update Beam and Plasma to Pure SoA May 3, 2023
@AlexanderSinn AlexanderSinn changed the title [WIP] Update Beam and Plasma to Pure SoA Update Beam and Plasma to Pure SoA May 25, 2023
@ax3l (Member) left a comment


Looks awesome, thank you Alex!

src/particles/beam/BeamParticleContainer.H (outdated, resolved)
src/particles/beam/BeamParticleContainerInit.cpp (outdated, resolved)
@MaxThevenet (Member) left a comment


Nice, thanks for this PR!! See small comments and questions below. Let's merge tomorrow.

src/particles/plasma/PlasmaParticleContainer.cpp (outdated, resolved)
src/particles/plasma/PlasmaParticleContainer.cpp (outdated, resolved)
src/particles/plasma/PlasmaParticleContainerInit.cpp (outdated, resolved)
src/particles/pusher/GetAndSetPosition.H (outdated, resolved)
src/particles/pusher/GetAndSetPosition.H (outdated, resolved)
@ax3l (Member) commented May 31, 2023

@AlexanderSinn:
@atmyers and I were wondering whether the new split particle ID causes issues for you when generating more than 2 billion particles on a single rank?

@AlexanderSinn (Member, Author)

For the plasma, all particles are in the same ParticleTile, so we are more limited by the index type (int/unsigned int). Before PureSoA, I already ran a simulation on CPU with almost 2^31 plasma particles in two containers (500 GB total). Much more would not work and would also be impractically slow. The plasma id is only used to distinguish valid from invalid particles, so it could be made simpler. Related: #963
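The index-type limit on a single tile can be computed directly. This is a minimal sketch that assumes 32-bit int and unsigned int, and MaxParticlesIndexable and FitsInOneTile are hypothetical helpers, not AMReX functions. Under that assumption, int indices address at most 2^31 particles per tile and unsigned int at most 2^32.

```cpp
#include <cstdint>
#include <limits>

// Number of distinct non-negative indices 0..max representable in IndexT.
// Assumes a 32-bit int / unsigned int, as on common platforms.
template <typename IndexT>
constexpr std::uint64_t MaxParticlesIndexable() {
    return static_cast<std::uint64_t>(std::numeric_limits<IndexT>::max()) + 1;
}

// Hypothetical check: can one tile holding n particles be indexed with IndexT?
template <typename IndexT>
constexpr bool FitsInOneTile(std::uint64_t n) {
    return n <= MaxParticlesIndexable<IndexT>();
}
```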

For the beam, however, the id is used for particle-tracking diagnostics, and many beam particles are cheap performance-wise. Currently we initialize everything on one rank and in one ParticleTile, so we are still limited by the index type and by memory when first sorting/reordering per box. This might get fixed, and then we will be limited by id(). We already sometimes use 2^30 beam particles.

@MaxThevenet (Member)

If this becomes a problem, would it be possible to change the AMReX behavior to use more bits for id and fewer for cpu, maybe as a compile-time option? For HiPACE++, we don't do much with cpu, and just a few bits would be sufficient. (We don't need it yet; this is just for information.)
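The proposed trade-off can be illustrated with a minimal sketch. It assumes that id and cpu share one 64-bit word and that the split is fixed at compile time. IdCpuPacking and kIdBits are hypothetical, and AMReX's actual layout and API may differ. With kIdBits = 56, for example, ids up to 2^56 - 1 fit, but only 8 bits (256 distinct values) remain for cpu.

```cpp
#include <cstdint>

// Hypothetical packing of a particle id and an owning-rank (cpu) number into
// one 64-bit word, with the split chosen at compile time: more id bits means
// fewer cpu bits. Illustration only, not AMReX's actual layout.
template <int kIdBits>
struct IdCpuPacking {
    static_assert(kIdBits > 0 && kIdBits < 64, "id and cpu both need bits");
    static constexpr int kCpuBits = 64 - kIdBits;
    static constexpr std::uint64_t kIdMask = (std::uint64_t{1} << kIdBits) - 1;
    static constexpr std::uint64_t kCpuMask = (std::uint64_t{1} << kCpuBits) - 1;

    // id goes in the high bits, cpu in the low bits; out-of-range inputs are masked
    static constexpr std::uint64_t Pack(std::uint64_t id, std::uint64_t cpu) {
        return ((id & kIdMask) << kCpuBits) | (cpu & kCpuMask);
    }
    static constexpr std::uint64_t Id(std::uint64_t idcpu) { return idcpu >> kCpuBits; }
    static constexpr std::uint64_t Cpu(std::uint64_t idcpu) { return idcpu & kCpuMask; }

    // Largest id representable with this split
    static constexpr std::uint64_t MaxId() { return kIdMask; }
};
```

Making the split a template parameter keeps the unpacking to a shift and a mask, so a compile-time option would cost nothing at run time.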

Another question: IIRC there's a safeguard in AMReX that would abort if we exceed the range of possible IDs, right? I think I saw this in NextID. Otherwise we should put one in HiPACE++.
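If such a safeguard had to live in HiPACE++, it could look like the following minimal sketch. IdAllocator is a hypothetical class, not AMReX's NextID. Ids are reserved in blocks, and the call fails loudly instead of silently wrapping around once the representable range is used up. The sketch throws a standard exception, whereas code inside an AMReX application would more likely abort.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical id allocator with a range check. Ids are handed out as
// 1, 2, ..., max_id (id 0 is left unused in this sketch).
class IdAllocator {
public:
    explicit IdAllocator(std::uint64_t max_id) : m_next(1), m_remaining(max_id) {}

    // Reserve n consecutive ids and return the first one. Throws instead of
    // wrapping around when fewer than n ids are left.
    std::uint64_t Reserve(std::uint64_t n) {
        if (n > m_remaining) {
            throw std::overflow_error("particle id range exhausted");
        }
        const std::uint64_t first = m_next;
        m_next += n;
        m_remaining -= n;
        return first;
    }

    std::uint64_t Remaining() const { return m_remaining; }

private:
    std::uint64_t m_next;
    std::uint64_t m_remaining;
};
```

Tracking the remaining count instead of comparing `m_next + n` against the maximum avoids an unsigned overflow in the check itself.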

@MaxThevenet MaxThevenet self-requested a review June 1, 2023 05:04
@MaxThevenet (Member) left a comment


Great, thanks for this PR!

@MaxThevenet MaxThevenet merged commit 5dd5d23 into Hi-PACE:development Jun 1, 2023