Refactor exception mechanism in P3M/DP3M tuning functions #3869

jngrad · 2020-08-19T20:02:54Z

Description of changes:

core: correctly propagate errors codes in the P3M/DP3M tuning functions to avoid infinite loops
core: rely on runtimeErrorMsg() instead of printing error messages to stderr or char arrays
python: catch errors from the P3M/DP3M tuning functions in the tune() method of the relevant python Actors
python: fix the broken 'metallic' case of the P3M epsilon parameter
testsuite: add tests for the 'metallic' case and for the new exception mechanism
python: fix the non-metallic epsilon case (epsilon was always set to 0 for P3M and P3MGPU in the core since 4.0.0)
documentation: fix broken code examples in the user guide
documentation: mention P3M doesn't support non-cubic boxes for non-metallic epsilon

Regression introduced in a6794cc. The error code is now handled.

The old exception mechanism simply printed the error message to a log string, without halting the flow of the program. This either left the system in an undefined state, or caused other errors to be thrown at the end of the tuning functions, making it difficult to trace the true origin of the issue.

Provide working code samples, add missing list of P3M papers, cleanup outdated documentation.

jngrad · 2020-08-19T20:48:31Z

@RudolfWeeber could you please have a look at the DP3M bisection code? I don't understand why it fails for specific particle configurations. Here is a MWE:

import espressomd
import espressomd.magnetostatics
system = espressomd.System(box_l=3*[1])
system.time_step = 0.01
system.part.add(pos=[[0, 0, 0], [0.5, 0.5, 0.5]], rotation=2 * [(1, 1, 1)], dip=2 * [(1, 0, 0)])
solver = espressomd.magnetostatics.DipolarP3M(prefactor=1, accuracy=1e-2)
system.actors.add(solver)  # infinite loop

The infinite loop was solved in this PR by correctly propagating the error code. But where is the mistake in my MWE? If I set the box dimensions to 3*[10] it doesn't fail. If this is not due to precision error, then we could write a test for it, because the propagation logic for the return value of the bisection function is quite fragile.

More generally though, the error code propagation logic in P3M/DP3M badly needs refactoring. We're forwarding integer error codes as floating-point return values. The pointer output arguments are just asking for trouble, I originally ended up with uninitialized values due to early exits from the new exception mechanism and was saved by Clang-Tidy.

jngrad · 2020-08-19T21:22:57Z

Furthermore, there is this suspicious piece of code in the tuning function that has no coverage:

espresso/src/core/electrostatics_magnetostatics/p3m.cpp

Lines 965 to 972 in 5ba0735

    
           if (p3m.params.epsilon != P3M_EPSILON_METALLIC) { 
        
             if (!((box_geo.length()[0] == box_geo.length()[1]) && 
        
                   (box_geo.length()[1] == box_geo.length()[2]))) { 
        
               *log = strcat_alloc( 
        
                   *log, "{049 P3M_init: Nonmetallic epsilon requires cubic box} "); 
        
               return ES_ERROR; 
        
             } 
        
           }

The body of the first conditional is never visited. Several issue here:

There is only a single python integration test where P3M is used for the non-metallic case (p3m_gpu.py), but it doesn't actually run the integrator, so the only way to know if P3M works for any non-zero epsilon is to rely on the samples and tutorials, which do not contribute to coverage.
This is probably never visited because the tuning function is called before the actor parameters are sent to the core, in which case the P3M struct is default constructed with epsilon = P3M_EPSILON_METALLIC, but I could be wrong. Removing the P3M actor and adding a new one seems to reset the core epsilon value back to P3M_EPSILON_METALLIC for the second tuning.
P3M can be used for non-cubic boxes, regardless of the value of epsilon, so the error message is misleading.

fweik · 2020-08-20T07:28:35Z

P3M can be used for non-cubic boxes, regardless of the value of epsilon, so the error message is misleading.

Are you sure? I think the dipole correction is not implemented for non-cubic boxes.

jngrad · 2020-08-20T10:22:01Z

P3M can be used for non-cubic boxes, regardless of the value of epsilon, so the error message is misleading.

Are you sure? I think the dipole correction is not implemented for non-cubic boxes.

Is it only a restriction on the tuning function? The electrostatics docs don't mention such a restriction. Also, p3m.params.epsilon seems to be zero in p3m_sanity_checks_system even after tuning, when epsilon should be set in the core.

fweik · 2020-08-20T10:31:26Z

If I remember correctly, for the non-cubic case the calculation of the dipole correction is way more complicated, potentially not even available in closed form. If think this is discussed in https://link.springer.com/article/10.1023/A:1018775311536.

fweik · 2020-08-20T10:33:21Z

So, one should not be allowed to set epsilon to non-metallic if the box is not cubic. Metallic boundaries are used in 99% of the cases anyway, I'm not even sure that epsilon should be settable, this is more of an expert feature in my opinion.

jngrad · 2020-08-20T10:40:52Z

Then why don't we throw an error? The rectangular case runs just fine:

import espressomd.electrostatics
system = espressomd.System(box_l=[10, 10, 20])
system.time_step = 0.01
system.cell_system.skin = 0.4
system.part.add(pos=[[0, 0, 0], [.5, .5, 1]], q=[-1, 1])
system.non_bonded_inter[0, 0].wca.set_params(epsilon=1.0, sigma=0.1)
solver = espressomd.electrostatics.P3M(prefactor=2, accuracy=1e-2, epsilon=78.)
system.actors.add(solver)
system.integrator.run(100)

jngrad · 2020-08-20T11:17:18Z

Now that's just brilliant: we overwrite self._params in the P3M actor with default-initialized values from the core before the call to self._set_params_in_es_core() (electrostatics.pyx#L318-L335). All simulations with non-zero epsilon since 4.0.0 used epsilon=0 internally. We never caught it in CI because the only test for the non-metallic case bypasses the tuning function:

espresso/testsuite/python/p3m_gpu.py

Line 41 in 5ba0735

test_params["tune"] = False

The epsilon value was set to zero during tuning before the call to `self._set_params_in_es_core()`. All simulations with a non-zero epsilon were internally using epsilon = 0.0 since ESPResSo 4.0.0. This only affects the P3M and P3MGPU actors, the DipolarP3M and MMM1D actors are not affected.

Range checking was incorrectly rejecting integer values for epsilon.

P3M doesn't support the non-metallic case with non-cubic boxes.

fweik · 2020-08-20T13:20:01Z

Yeah, the whole cython coulomb complex is quite ... special

fweik · 2020-08-21T07:07:35Z

The tests look good. For the rest, I understand that you don't want to put work into this, but in my opinion a minimal solution would be to switch to an error reporting mechanism were non-results can not be ignored, e.g. optional return values, instead of meddling with the signaling return values. This way it can be made sure that errors are not ignored in the future. (E.g. use boost.outcome, if this is available with our version)

jngrad · 2020-08-21T08:19:33Z

The issue is that we're working for 4.1, where we still have an Ubuntu 16.04 image with boost 1.58. boost::outcome needs 1.70.

Could we use exceptions to store the error code and catch it in the core tuning function that is called from python?

RudolfWeeber · 2020-08-21T09:44:50Z

I don’t think, tuning fixes should go into 4.1.4, maybe with the exception of the epsilon overwriting thing. Particularly, if including it in 4.1 makes the fix difficult.

fweik · 2020-08-21T10:23:18Z

Sure, you can also just use exceptions. It is probably better to not needlessly introduce new things (although I think outcome or similar is actually a interesting contribution to the error reporting design space). Please also note that I did not say you should change something, I just pointed out that your fix does not exclude the same type of problem in the future, and that there are solutions that do that.

jngrad · 2020-08-21T10:36:07Z

@fweik Yes, my fix is brittle and I'd prefer to see a proper exception mechanism. boost::outcome looks promising, although we'll have to wait for a boost bump before we can include it. Using exceptions shouldn't be too difficult, but I cannot invest more time into P3M at this moment.

@RudolfWeeber if we don't include the tuning fixes in 4.1, users will still have infinite loops (upon DP3M bisection failure and any P3M failure) and useless error messages (upon any failure, e.g. system with no particles).

…#3869) Fixes espressomd#3868 Description of changes: - core: correctly propagate errors codes in the P3M/DP3M tuning functions to avoid infinite loops - core: rely on `runtimeErrorMsg()` instead of printing error messages to stderr or char arrays - python: catch errors from the P3M/DP3M tuning functions in the `tune()` method of the relevant python Actors - python: fix the broken `'metallic'` case of the P3M `epsilon` parameter - testsuite: add tests for the `'metallic'` case and for the new exception mechanism - python: fix the non-metallic epsilon case (epsilon was always set to 0 for `P3M ` and `P3MGPU` in the core since 4.0.0) - documentation: fix broken code examples in the user guide - documentation: mention P3M doesn't support non-cubic boxes for non-metallic epsilon

…#3869) Fixes espressomd#3868 Description of changes: - core: correctly propagate an error code in the P3M tuning function to avoid an infinite loop - python: fix the broken `'metallic'` case of the P3M `epsilon` parameter - python: fix the non-metallic epsilon case (epsilon was always set to 0 for `P3M ` and `P3MGPU` in the core since 4.0.0) - documentation: fix broken code examples in the user guide - documentation: mention P3M doesn't support non-cubic boxes for non-metallic epsilon

jngrad added 2 commits August 19, 2020 17:28

docs: Fix incorrect code samples

0f6b24f

core: Fix infinite loop in electrostatics tuning

a5cfdcf

Regression introduced in a6794cc. The error code is now handled.

jngrad added Testcase Documentation Python Core BugFix labels Aug 19, 2020

jngrad added this to the Espresso 4.1.4 milestone Aug 19, 2020

jngrad requested a review from RudolfWeeber August 19, 2020 20:02

jngrad marked this pull request as draft August 19, 2020 20:21

jngrad added 4 commits August 19, 2020 22:36

tests: Check exception mechanism in (D)P3M tuning

9f059f9

python: Fix broken 'metallic' case for P3M methods

001c3c7

docs: Update electrostatic/magnetostatics docs

4d1ad9a

Provide working code samples, add missing list of P3M papers, cleanup outdated documentation.

jngrad force-pushed the fix-3868 branch from 64052da to 4d1ad9a Compare August 19, 2020 20:42

jngrad marked this pull request as ready for review August 19, 2020 20:53

jngrad added 4 commits August 20, 2020 14:19

python: Fix epsilon range check

3a742ba

Range checking was incorrectly rejecting integer values for epsilon.

tests: Simplify (D)P3M tuning test class

698754f

docs: Clarify limitation of the P3M algorithm

28d6b99

P3M doesn't support the non-metallic case with non-cubic boxes.

tests: Check non-cubic boxes in (D)P3M tuning

c01e988

tests: Check P3M energies for non-metallic epsilon

44add41

Merge branch 'python' into fix-3868

22e3bdf

RudolfWeeber approved these changes Aug 23, 2020

View reviewed changes

jngrad added the automerge Merge with kodiak label Aug 23, 2020

Merge branch 'python' into fix-3868

2cd9cbb

jngrad added automerge Merge with kodiak and removed automerge Merge with kodiak labels Aug 24, 2020

kodiakhq bot merged commit 586ccae into espressomd:python Aug 24, 2020

jngrad mentioned this pull request Sep 27, 2020

Bisection bugs in dipolar P3M #3915

Open

jngrad deleted the fix-3868 branch January 18, 2022 12:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor exception mechanism in P3M/DP3M tuning functions #3869

Refactor exception mechanism in P3M/DP3M tuning functions #3869

jngrad commented Aug 19, 2020 •

edited

Loading

jngrad commented Aug 19, 2020

jngrad commented Aug 19, 2020

fweik commented Aug 20, 2020

jngrad commented Aug 20, 2020

fweik commented Aug 20, 2020

fweik commented Aug 20, 2020

jngrad commented Aug 20, 2020

jngrad commented Aug 20, 2020

fweik commented Aug 20, 2020

fweik commented Aug 21, 2020 •

edited

Loading

jngrad commented Aug 21, 2020

RudolfWeeber commented Aug 21, 2020 via email

fweik commented Aug 21, 2020 •

edited

Loading

jngrad commented Aug 21, 2020

Refactor exception mechanism in P3M/DP3M tuning functions #3869

Refactor exception mechanism in P3M/DP3M tuning functions #3869

Conversation

jngrad commented Aug 19, 2020 • edited Loading

jngrad commented Aug 19, 2020

jngrad commented Aug 19, 2020

fweik commented Aug 20, 2020

jngrad commented Aug 20, 2020

fweik commented Aug 20, 2020

fweik commented Aug 20, 2020

jngrad commented Aug 20, 2020

jngrad commented Aug 20, 2020

fweik commented Aug 20, 2020

fweik commented Aug 21, 2020 • edited Loading

jngrad commented Aug 21, 2020

RudolfWeeber commented Aug 21, 2020 via email

fweik commented Aug 21, 2020 • edited Loading

jngrad commented Aug 21, 2020

jngrad commented Aug 19, 2020 •

edited

Loading

fweik commented Aug 21, 2020 •

edited

Loading

fweik commented Aug 21, 2020 •

edited

Loading