
Drastically increased startup lag when increasing polydeg #516

Closed
gregorgassner opened this issue Apr 7, 2021 · 6 comments

I am observing that the startup time (the time for setting up the ODE and the time for setting up solve) scales very strongly with polydeg. Instead of about a second for the KHI when using N=3, it takes a couple of seconds for N=7 and a couple of minutes for N=15 (while keeping the total number of DOF equal, i.e. reducing the grid refinement level).

Any idea what is going on?
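
(A minimal sketch of the constant-DOF bookkeeping behind "reducing the grid level"; the refinement levels below are assumed values for illustration, for a uniform 2D mesh.)

    # DOF per variable in 2D: (polydeg + 1)^2 nodes per element, 4^level elements
    dof(polydeg, level) = (polydeg + 1)^2 * 4^level
    dof(3, 6)   # 65536
    dof(7, 5)   # 65536
    dof(15, 4)  # 65536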

gregorgassner added the enhancement, discussion, and performance labels on Apr 7, 2021
gregorgassner commented Apr 7, 2021

OK, it also seems to be connected to the ODE solver used. The time for setting up the ODE goes down again when using SSPRK33() with a timestep callback.

But the major part is still the call to solve(); here it seems not to scale well with polydeg, and it takes a couple of minutes before the first output is shown on screen.
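
For context, the fixed-step setup referred to above (SSPRK33() driven by a CFL-based StepsizeCallback) looks roughly as follows; the advection test case and all parameters are assumed for illustration and are not the KHI elixir itself:

    using OrdinaryDiffEq, Trixi

    equations = LinearScalarAdvectionEquation2D((1.0, 1.0))
    solver = DGSEM(polydeg=7, surface_flux=flux_lax_friedrichs)
    mesh = TreeMesh((-1.0, -1.0), (1.0, 1.0),
                    initial_refinement_level=4, n_cells_max=10^5)
    semi = SemidiscretizationHyperbolic(mesh, equations,
                                        initial_condition_convergence_test, solver)
    ode = semidiscretize(semi, (0.0, 1.0))

    # Fixed-step SSP method; the CFL-based callback sets the actual step size
    stepsize_callback = StepsizeCallback(cfl=1.0)
    sol = solve(ode, SSPRK33(), dt=1.0,  # dt is overwritten by the StepsizeCallback
                save_everystep=false, callback=CallbackSet(stepsize_callback))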

gregorgassner commented:

I think the same goes for the analysis/IO: it seems to take substantially more time with N=15 before the first error norm is plotted on screen.

ranocha commented Apr 7, 2021

I expect this latency overhead to be caused by our extensive use of StaticArrays, which are backed by NTuples, which in turn are often difficult for the compiler to optimize (this should eventually be fixed, cf. JuliaLang/julia#31681). Citing the README.md of StaticArrays.jl:

Note that in the current implementation, working with large StaticArrays puts a lot of stress on the compiler, and becomes slower than Base.Array as the size increases. A very rough rule of thumb is that you should consider using a normal Array for arrays larger than 100 elements.
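
As a rough illustration of that compile-time stress (not Trixi code; the 16x16 size corresponds to the nodes of a polydeg = 15 element), the first call to an operation on a large static matrix pays a much larger compilation cost than the Base.Array equivalent:

    using StaticArrays

    A_static = @SMatrix rand(16, 16)
    A_plain  = rand(16, 16)
    @time A_static * A_static  # first call: dominated by compiling the size-specialized kernel
    @time A_plain * A_plain    # first call: compiles quickly and dispatches to BLAS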

We use StaticArrays to pass more information to the compiler at the type level (the number of nodes per coordinate direction) to allow more aggressive optimizations and increased runtime performance. There are basically two ways to attack this latency problem:

  • Try switching to Array for larger polynomial degrees and balance latency & runtime performance
  • Explore other design choices for passing static size information at the type level, e.g. via HybridArrays.jl or StrideArrays.jl (see the sketch below)
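
A minimal sketch of the second option using HybridArrays.jl (the array layout and sizes are assumed for illustration, not Trixi's actual data structures):

    using StaticArrays, HybridArrays

    # Static per-element extents, dynamic number of elements: the node counts are
    # encoded in the type, but the data is backed by a plain Array.
    u = HybridArray{Tuple{4, 4, 4, StaticArrays.Dynamic()}}(zeros(4, 4, 4, 1024))
    size(u)  # (4, 4, 4, 1024)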

The latter is already on my TODO list (and I'm working on debugging some performance issues). However, I would like to get some canonical performance benchmarks first before really going into this kind of optimization, to reduce the amount of manual testing, cf. #503.

ranocha added the latency label and removed the discussion and enhancement labels on Apr 8, 2021
ranocha self-assigned this on Apr 8, 2021

ranocha commented Apr 8, 2021

I will have a look at this, but I strongly suggest setting up some canonical performance benchmarks (runtime and latency) first to make this task easier, cf. #503.

ranocha commented Apr 13, 2021

Having merged #535, we get the following results for

julia --check-bounds=no --threads=1 -e 'using Trixi; @time trixi_include("examples/2d/elixir_advection_extended.jl", tspan=(0.0, 1.0e-10), polydeg=POLYDEG)' | egrep "seconds"
  • Trixi v0.3.25, polydeg=3: 27.044866 seconds
  • Trixi v0.3.25, polydeg=7: 43.895013 seconds
  • Trixi v0.3.25, polydeg=15: 509.532290 seconds
  • Trixi v0.3.26, polydeg=3: 26.864300 seconds
  • Trixi v0.3.26, polydeg=7: 26.944387 seconds
  • Trixi v0.3.26, polydeg=15: 27.720527 seconds

Could you please check whether things work better for you now, @gregorgassner? If not, please re-open this issue and post some details.

ranocha closed this as completed on Apr 13, 2021

sloede commented Apr 13, 2021

Fantastic work, @ranocha, given the surgical precision of the changes to the general code (virtually none)!
