EVP 1d threading bit-for-bit #681

Closed
apcraig opened this issue Jan 19, 2022 · 4 comments

@apcraig
Contributor

apcraig commented Jan 19, 2022

See #680

Testing OpenMP and adding omp_suite highlighted an issue in the OpenMP implementation of the evp1d code. The OpenMP in ice_dyn_evp_1d_kernel is not bit-for-bit across different thread counts. The following test groups should be bit-for-bit with each other, but they are not,

smoke gx3 4x4x5x29x40 alt04,reprosum,run10day
smoke gx3 8x2x5x29x40 alt04,reprosum,run10day
smoke gx3 24x1x5x29x40 alt04,reprosum,run10day,thread

or more clearly,

smoke gx3 4x4x5x29x40 evp1d,reprosum,run10day
smoke gx3 8x2x5x29x40 evp1d,reprosum,run10day
smoke gx3 24x1x5x29x40 evp1d,reprosum,run10day,thread

#680 comments out the OMP directives in ice_dyn_evp_1d_kernel. This changes answers for alt04/evp1d but makes the implementation bit-for-bit validated.

Finally, I can confirm that different block sizes with MPI only are bit-for-bit with evp1d. The following are all bit-for-bit,

smoke gx3 16x1x5x29x40 alt04,reprosum,run10day,droundrobin
smoke gx3 24x1x5x4x400 alt04,reprosum,run10day,droundrobin
smoke gx3 24x1x5x15x80 alt04,reprosum,run10day,droundrobin

so it's not an issue with the blocks and decompositions, really just the OpenMP in the ice_dyn_evp_1d_kernel.
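For readers comparing the test lines above, here is a small sketch that splits the layout field into named parts. It assumes the field reads as tasks x threads x block size in x x block size in y x maximum blocks, which is how the CICE test suites are usually described but is not stated in this issue. The `Layout` type and `parse_layout` function are hypothetical helpers, not part of CICE.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    """Hypothetical container for one layout field such as '4x4x5x29x40'."""
    tasks: int       # assumed: number of MPI tasks
    threads: int     # assumed: OpenMP threads per task
    block_x: int     # assumed: block size in x
    block_y: int     # assumed: block size in y
    max_blocks: int  # assumed: maximum blocks per task


def parse_layout(text):
    """Split an 'AxBxCxDxE' layout string into its five integer parts."""
    parts = [int(p) for p in text.split("x")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {text!r}")
    return Layout(*parts)


if __name__ == "__main__":
    # The three threaded cases from the issue differ in tasks and threads only.
    for field in ("4x4x5x29x40", "8x2x5x29x40", "24x1x5x29x40"):
        layout = parse_layout(field)
        print(field, "tasks:", layout.tasks, "threads:", layout.threads)
```

Under that reading, the first group varies the thread count (4, 2, 1) with a fixed 5x29 block size, while the second group keeps one thread per task and varies the block size.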

@TillRasmussen
Contributor

Blocks are not used within the 1d solver.
I have been able to reproduce the bug. It already appears in the first iteration.

@TillRasmussen
Contributor

With the Intel compiler, the OMP differences go away when -no-vec (no vectorization) is added. This has to do with how the arrays fit into memory, and it may indicate that the other OMP loops do not vectorize. @srethmeier, please elaborate a bit more.
The -no-vec flag could be dropped if the arrays were written so that they "fit" memory. For double-precision arrays this would require padding to a multiple of 4.
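The padding arithmetic mentioned above can be sketched as follows. This is only an illustration, assuming a vector width of 4 doubles (taken from the "4" in the comment) and contiguous per-thread chunks. The function names are hypothetical, and nothing here reflects what CICE or the compiler actually does. Note that padding the array length alone does not align thread boundaries, so the sketch also places each chunk boundary on a multiple of the width.

```python
def padded_length(n, width=4):
    """Smallest multiple of `width` that is greater than or equal to n."""
    return ((n + width - 1) // width) * width


def aligned_chunks(n, nthreads, width=4):
    """Split [0, padded_length(n)) into contiguous per-thread chunks.

    Every chunk boundary falls on a multiple of `width`, so no thread
    starts or ends in the middle of a vector, whatever the thread count.
    """
    nvec = padded_length(n, width) // width  # number of whole vectors
    base, extra = divmod(nvec, nthreads)     # spread vectors over threads
    chunks = []
    start = 0
    for t in range(nthreads):
        count = base + (1 if t < extra else 0)
        end = start + count * width
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    n = 1161  # arbitrary length that is not a multiple of 4
    for nthreads in (1, 2, 4):
        print(nthreads, aligned_chunks(n, nthreads))
```

With this kind of layout, the same elements are always processed as part of a full vector regardless of how many threads share the loop, which is one plausible reading of why padding could make -no-vec unnecessary.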

The result of the test with -no-vec turned on is:
d9ea2f412e977d8fa0c1c0b3c871ff7f freya_intel_smoke_gx3_24x1x5x29x40_alt04_reprosum_run10day_thread.novecreal/restart/iced.2005-01-11-00000.nc
d9ea2f412e977d8fa0c1c0b3c871ff7f freya_intel_smoke_gx3_8x2x5x29x40_alt04_reprosum_run10day_thread.novecreal/restart/iced.2005-01-11-00000.nc
d9ea2f412e977d8fa0c1c0b3c871ff7f freya_intel_smoke_gx3_4x4x5x29x40_alt04_reprosum_run10day_thread.novecreal/restart/iced.2005-01-11-00000.nc
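A checksum comparison like the one above can be scripted. The following is a minimal sketch using Python's standard `hashlib`; the helper names are hypothetical and the restart file paths are left for the caller to supply. Identical MD5 sums mean the files are byte-for-byte identical. Differing sums do not by themselves prove the model fields differ, since file metadata could also change the hash.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def all_bit_for_bit(paths):
    """Return (True/False, {path: digest}) for a list of files.

    The flag is True only when every file has the same digest.
    """
    sums = {str(p): md5_of_file(p) for p in paths}
    return len(set(sums.values())) == 1, sums


if __name__ == "__main__":
    import sys

    # Usage: python compare_restarts.py file1.nc file2.nc ...
    same, sums = all_bit_for_bit(sys.argv[1:])
    for path, digest in sums.items():
        print(digest, path)
    print("bit-for-bit" if same else "DIFFERENT")
```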

@TillRasmussen TillRasmussen mentioned this issue Feb 3, 2022
@TillRasmussen
Contributor

@apcraig, @eclare108213, I don't recall whether this was enough to close this issue?

@apcraig
Contributor Author

apcraig commented Nov 16, 2023

I think this is fixed in #895. I tested the decomp suite with -s evp1d on cheyenne and it seemed to be OK.

@apcraig apcraig closed this as completed Nov 16, 2023