extend CICE technical testing #247

Open
apcraig opened this issue Nov 20, 2018 · 6 comments

Comments

@apcraig
Contributor

apcraig commented Nov 20, 2018

There are a few things I'd like to do:

  • qc test different compilers/machines against each other
  • add signalling NaNs to the debug compiler options
    • cray, "-ei"
    • intel, "-init=snan,arrays" (may need intel 18+)
  • set unused tracer indices to -1 and code around their use if needed
@apcraig
Contributor Author

apcraig commented Nov 20, 2018

Tests on conrad_cray and conrad_intel with the above signalling NaN flags on Nov 20, 2018 revealed potential problems with the following tests:

FAIL conrad_intel_restart_gx3_6x2_alt01_debug_short run
FAIL conrad_intel_restart_gx3_8x2_alt02_debug_short run
FAIL conrad_intel_restart_gbox128_4x2_boxdyn_debug run
FAIL conrad_intel_smoke_gbox128_2x2_boxadv_debug_short run -1 -1 -1
FAIL conrad_intel_smoke_gbox128_4x4_boxrestore_debug run -1 -1 -1
FAIL conrad_intel_smoke_gx3_8x2_bgcz_debug run -1 -1 -1
FAIL conrad_intel_smoke_gx3_8x1_bgcskl_debug run -1 -1 -1
FAIL conrad_cray_restart_gx3_8x2_alt02_debug_short run

For now, I just note it and will turn off the signaling nans again.
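
The failing test names encode their configuration, so tallying their fields makes any shared setting visible at a glance. This is a hypothetical Python helper, assuming the layout machine_compiler_testtype_grid_PEs_options that the names above appear to follow; that layout is inferred from this list, not taken from the CICE test scripts.

```python
from collections import Counter


def parse_test_name(name):
    """Split a test name into fields.

    Assumed (hypothetical) layout: machine_compiler_testtype_grid_PEs_options...
    Everything after the PE layout is treated as a list of options.
    """
    parts = name.split("_")
    machine, compiler, test, grid, pes = parts[:5]
    return {
        "machine": machine,
        "compiler": compiler,
        "test": test,
        "grid": grid,
        "pes": pes,
        "options": parts[5:],
    }


# The failing tests reported above (names only, status columns dropped).
failures = [
    "conrad_intel_restart_gx3_6x2_alt01_debug_short",
    "conrad_intel_restart_gx3_8x2_alt02_debug_short",
    "conrad_intel_restart_gbox128_4x2_boxdyn_debug",
    "conrad_intel_smoke_gbox128_2x2_boxadv_debug_short",
    "conrad_intel_smoke_gbox128_4x4_boxrestore_debug",
    "conrad_intel_smoke_gx3_8x2_bgcz_debug",
    "conrad_intel_smoke_gx3_8x1_bgcskl_debug",
    "conrad_cray_restart_gx3_8x2_alt02_debug_short",
]

parsed = [parse_test_name(n) for n in failures]

# Count how often each option and each compiler appears among the failures.
option_counts = Counter(opt for p in parsed for opt in p["options"])
compiler_counts = Counter(p["compiler"] for p in parsed)

print(option_counts.most_common(3))  # [('debug', 8), ('short', 4), ('alt02', 2)]
print(compiler_counts)               # Counter({'intel': 7, 'cray': 1})
```

Every failure carries the debug option, which is expected because the signalling-NaN flags were only added to the debug compiler options. The other options (alt01, alt02, boxdyn, boxadv, boxrestore, bgcz, bgcskl) all differ, so the tally alone does not single out one configuration.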

@eclare108213
Contributor

What do the signalling-NaN errors mean?

We need to make a list of other tests that we know are missing, such as revp (maybe not in this issue).

@phil-blain
Member

The flags mean that all arrays and scalars are initialized with signaling NaNs. A signaling NaN raises a floating-point invalid exception when a computation uses it, which makes the code abort when floating-point traps are enabled. It is a way to check that the code does not use uninitialized variables.
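
What makes a NaN "signaling" is a single bit. In IEEE 754 binary64, a NaN has all 11 exponent bits set and a nonzero fraction; under the encoding recommended by IEEE 754-2008 (and used on x86 and ARM), the most significant fraction bit marks a quiet NaN, and a NaN with that bit clear is signaling. This Python sketch classifies raw 64-bit patterns that way. The pattern 0x7FF4000000000000 is only an illustrative signaling NaN; it is not necessarily the value that Intel's "-init=snan" or Cray's "-ei" actually writes.

```python
import math
import struct

EXP_MASK = 0x7FF0000000000000   # the 11 exponent bits
FRAC_MASK = 0x000FFFFFFFFFFFFF  # the 52 fraction bits
QUIET_BIT = 0x0008000000000000  # most significant fraction bit (set => quiet NaN)


def classify_bits(bits):
    """Classify a 64-bit IEEE 754 binary64 bit pattern (sign bit ignored)."""
    if (bits & EXP_MASK) != EXP_MASK:
        return "finite"  # includes zeros and subnormals
    frac = bits & FRAC_MASK
    if frac == 0:
        return "infinity"
    return "quiet NaN" if (frac & QUIET_BIT) else "signaling NaN"


def float_bits(x):
    """Return the raw 64-bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(classify_bits(float_bits(1.0)))       # finite
print(classify_bits(float_bits(math.inf)))  # infinity
print(classify_bits(float_bits(math.nan)))  # quiet NaN
print(classify_bits(0x7FF4000000000000))    # signaling NaN
```

Ordinary arithmetic never produces a signaling NaN; operations on one return a quiet NaN after raising the invalid exception. That is why filling memory with signaling NaNs distinguishes "never initialized" from "computed as NaN".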

@phil-blain
Member

revp is covered in base_suite through the tests that use the options "alt02" and "boxrestore".

@phil-blain
Member

Two thoughts:

  1. I think that we should consider running CICE under Valgrind to check memory usage (using the memcheck tool).

  2. On Intel, compiling with -check arg_temp_created produces a lot of output. Has this been discussed before?

@apcraig
Contributor Author

apcraig commented Mar 27, 2020

I have run a number of qc tests against each other with CICE #5f97e45e2362518d, which is CICE 6.1.1. All qc comparisons pass. These include:

conrad_intel
conrad_pgi
conrad_gnu
conrad_cray
onyx_intel
cheyenne_intel

All were run with 64x1 PE counts. I also tried a 16x4 case, but it ran too slowly, so I abandoned that test. For the time being, I think we can consider this task checked and OK.

3 participants