Update Particle Container to Pure SoA #348

Merged Feb 9, 2024 (1 commit)

@ax3l (Member) commented Apr 19, 2023

Transition particle containers to pure SoA layouts.
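
For readers unfamiliar with the terminology, the difference between an array-of-structs (AoS) and a pure struct-of-arrays (SoA) layout can be sketched in plain C++. The types and functions below (`ParticleAoS`, `ParticlesSoA`, `to_soa`, `shift_x`) are hypothetical illustrations, not the AMReX or ImpactX classes. Unit-stride access in the SoA case is a common reason such layouts perform better, though this PR does not itself attribute the measured speedups to any single cause.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration types, NOT the AMReX or ImpactX classes.

// Array-of-Structs (AoS): one struct per particle, attributes interleaved.
struct ParticleAoS {
    double x, y, t;     // positions
    double px, py, pt;  // momenta
};

// Pure Struct-of-Arrays (SoA): one contiguous array per attribute.
struct ParticlesSoA {
    std::vector<double> x, y, t;
    std::vector<double> px, py, pt;

    std::size_t size() const { return x.size(); }
};

// Convert an AoS container into the SoA layout.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos)
{
    ParticlesSoA soa;
    for (const ParticleAoS& p : aos) {
        soa.x.push_back(p.x);   soa.y.push_back(p.y);   soa.t.push_back(p.t);
        soa.px.push_back(p.px); soa.py.push_back(p.py); soa.pt.push_back(p.pt);
    }
    assert(soa.size() == aos.size());
    return soa;
}

// AoS: consecutive x values are sizeof(ParticleAoS) bytes apart.
void shift_x(std::vector<ParticleAoS>& aos, double dx)
{
    for (ParticleAoS& p : aos) { p.x += dx; }
}

// SoA: consecutive x values are adjacent in memory (unit stride).
void shift_x(ParticlesSoA& soa, double dx)
{
    for (double& x : soa.x) { x += dx; }
}
```

Both `shift_x` overloads compute the same result; only the memory access pattern differs.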

Benchmarks on NVIDIA A100-SXM 80GB GPU, SP

./impactx ../../examples/fodo/input_fodo.in beam.npart=600000000 diag.enable=false diag.slice_step_diagnostics=false lattice.elements=drift1 quad1 drift2 quad2 drift3

soa_gpu_sp.txt, dev_gpu_sp.txt

  • FODO runtime: 2.52x faster 🚀 ✨
  • impactx::Push::Drift: 2.33x faster
  • impactx::Push::Quad: 2.85x faster
  • impactX::collect_lost_particles: 2.77x faster
  • ImpactX::add_particles: 5.4% faster

Benchmarks on NVIDIA A100-SXM 80GB GPU, DP

./impactx ../../examples/fodo/input_fodo.in beam.npart=600000000 diag.enable=false diag.slice_step_diagnostics=false lattice.elements=drift1 quad1 drift2 quad2 drift3

soa_gpu_dp.txt, dev_gpu_dp.txt

  • FODO runtime: 1.73x faster 🚀 ✨
  • impactx::Push::Drift: 1.85x faster
  • impactx::Push::Quad: 1.42x faster
  • impactX::collect_lost_particles: 3.68x faster
  • ImpactX::add_particles: 6.6% faster

Benchmarks on CPU, SP

Laptop: 12th Gen Intel(R) Core(TM) i9-12900H, performance core w/ performance power mode on

export OMP_NUM_THREADS=1
./impactx ../../examples/fodo/input_fodo.in beam.npart=1000000 diag.enable=false diag.slice_step_diagnostics=false lattice.elements=drift1 quad1 drift2 quad2 drift3 &
taskset -cp 6 $!

soa_cpu_sp.txt, dev_cpu_sp.txt

  • FODO runtime: 1.28x faster 🚀 ✨
  • impactx::Push::Drift: 2.17x faster
  • impactx::Push::Quad: 4.8% faster
  • impactX::collect_lost_particles: 2.48x faster
  • ImpactX::add_particles: 1.7% faster

Benchmarks on CPU, DP

Laptop: 12th Gen Intel(R) Core(TM) i9-12900H, performance core w/ performance power mode on

export OMP_NUM_THREADS=1
./impactx ../../examples/fodo/input_fodo.in beam.npart=1000000 diag.enable=false diag.slice_step_diagnostics=false lattice.elements=drift1 quad1 drift2 quad2 drift3 &
taskset -cp 6 $!

soa_cpu_dp.txt, dev_cpu_dp.txt

  • FODO runtime: 1.21x faster 🚀 ✨
  • impactx::Push::Drift: 1.76x faster
  • impactx::Push::Quad: same perf.
  • impactX::collect_lost_particles: 3.24x faster
  • ImpactX::add_particles: 2.2% faster

@ax3l ax3l force-pushed the topic-particle-soa branch 2 times, most recently from 38c7f86 to c4268f4 Compare April 19, 2023 08:20
@ax3l ax3l force-pushed the topic-particle-soa branch 8 times, most recently from cf12d2c to 83f96ff Compare April 28, 2023 01:43
src/particles/elements/Multipole.H (fixed)
pxout = px + dpx;

p.pos(RealAoS::y) = y;
// yout = y;

Check notice (Code scanning / CodeQL): Commented-out code

This comment appears to contain commented-out code.
src/particles/elements/NonlinearLens.H (fixed)
pxout = px + dpx;

p.pos(RealAoS::y) = y;
// yout = y;

Check notice (Code scanning / CodeQL): Commented-out code

This comment appears to contain commented-out code.
src/particles/elements/ShortRF.H (fixed)
src/particles/elements/ShortRF.H (fixed)
@ax3l ax3l force-pushed the topic-particle-soa branch 4 times, most recently from a469076 to ba1d781 Compare April 28, 2023 06:46
ax3l pushed a commit to AMReX-Codes/amrex that referenced this pull request May 4, 2023
…oA (#3296)

The implementation of `ParticleTile::empty()` was wrong for pure SoA,
which led to problems in ImpactX:
ECP-WarpX/impactx#348

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate
@ax3l ax3l force-pushed the topic-particle-soa branch 2 times, most recently from 99fb419 to 03d6b6c Compare May 4, 2023 22:59
cmake/dependencies/ABLASTR.cmake (outdated, resolved)
cmake/dependencies/ABLASTR.cmake (outdated, resolved)
@ax3l (Member, Author) commented Jan 28, 2024

@cemitch99 when you have time next week, can you please review the changes I made to individual elements?

@ax3l

This comment was marked as resolved.

@ax3l ax3l force-pushed the topic-particle-soa branch 3 times, most recently from fe812bd to 7b85a26 Compare January 28, 2024 20:27
@ax3l ax3l mentioned this pull request Jan 28, 2024
@cemitch99 (Member) commented

This pattern of changes to individual elements looks correct, and I like the improved symmetry between the treatment of position and momentum variables. The main risk introduced is possible mixing of the initial values and final values, so I will need to check every element line-by-line before review.
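
The risk of mixing initial and final values can be illustrated with a minimal sketch. It uses a toy linear map, not any actual ImpactX element; the function names `push_toy` and `push_toy_mixed` and the parameters `L` and `k` are assumptions made up for this example. The safe variant reads all inputs into locals, computes the outputs from those, and writes back last, similar in spirit to the `pxout = px + dpx` style visible in the diff snippets above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy linear map, NOT an ImpactX element:
//   x_out  = x  + L * px
//   px_out = px - k * x
// Both outputs must be computed from the INITIAL x and px.
void push_toy(std::vector<double>& x, std::vector<double>& px, double L, double k)
{
    assert(x.size() == px.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // read the initial values once
        const double x_in  = x[i];
        const double px_in = px[i];
        // compute the final values from the initial values only
        const double x_out  = x_in  + L * px_in;
        const double px_out = px_in - k * x_in;
        // write back last
        x[i]  = x_out;
        px[i] = px_out;
    }
}

// Deliberately WRONG variant for contrast: px is updated with the
// already overwritten x, mixing final and initial values.
void push_toy_mixed(std::vector<double>& x, std::vector<double>& px, double L, double k)
{
    assert(x.size() == px.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i]  += L * px[i];
        px[i] -= k * x[i];  // bug: x[i] already holds the final value here
    }
}
```

With `x = 1`, `px = 2`, `L = 0.5`, `k = 0.25`, the first variant yields `px = 1.75` while the mixed variant yields `px = 1.5`, which is the kind of silent discrepancy a line-by-line check is meant to catch.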

@ax3l ax3l changed the title from "[WIP] Update Particle Container to Pure SoA" to "Update Particle Container to Pure SoA" Jan 29, 2024
@ax3l

This comment was marked as resolved.

@ax3l ax3l force-pushed the topic-particle-soa branch 3 times, most recently from d158624 to 0e8b676 Compare January 31, 2024 20:09
@ax3l

This comment was marked as resolved.

@ax3l ax3l force-pushed the topic-particle-soa branch 6 times, most recently from 6968ba3 to e954247 Compare February 8, 2024 02:07
Transition particle containers to pure SoA layouts.
@ax3l ax3l merged commit 9876a9e into ECP-WarpX:development Feb 9, 2024
15 checks passed
@ax3l ax3l deleted the topic-particle-soa branch February 9, 2024 20:16
@ax3l (Member, Author) commented Feb 9, 2024

Congrats to all who helped @Thierry992, @atmyers, @AlexanderSinn et al. 🎉 👏

@ax3l (Member, Author) commented Feb 16, 2024

I repeated the GPU benchmarks for scrutiny on Perlmutter (NVIDIA A100-SXM 80GB GPUs), using the new AMReX option tiny_profiler.device_synchronize_around_region = 1 (AMReX-Codes/amrex#3763) to synchronize the device around profiled regions.

The extra synchronization adds some overhead, but the timings of individual elements should now be more precise.

Benchmarks on NVIDIA A100-SXM 80GB GPU, SP

./impactx ../../examples/fodo/input_fodo.in beam.npart=600000000 diag.enable=false diag.slice_step_diagnostics=false tiny_profiler.device_synchronize_around_region=1 lattice.elements=drift1 quad1 drift2 quad2 drift3

soa_gpu_sp.txt, old_gpu_sp.txt

  • FODO runtime: 2.34x faster 🚀 ✨
  • impactx::Push::Drift: 2.32x faster
  • impactx::Push::Quad: 2.32x faster
  • impactX::collect_lost_particles: 2.78x faster
  • ImpactX::add_particles: 5.2% faster

Benchmarks on NVIDIA A100-SXM 80GB GPU, DP

./impactx ../../examples/fodo/input_fodo.in beam.npart=600000000 diag.enable=false diag.slice_step_diagnostics=false tiny_profiler.device_synchronize_around_region=1 lattice.elements=drift1 quad1 drift2 quad2 drift3

soa_gpu_dp.txt, old_gpu_dp.txt

  • FODO runtime: 1.65x faster 🚀 ✨
  • impactx::Push::Drift: 1.80x faster
  • impactx::Push::Quad: 1.32x faster
  • impactX::collect_lost_particles: 3.65x faster
  • ImpactX::add_particles: 1.6% faster
