Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor exception mechanism in P3M/DP3M tuning functions #3869

Merged
merged 14 commits into from
Aug 24, 2020

Conversation

jngrad
Copy link
Member

@jngrad jngrad commented Aug 19, 2020

Fixes #3868

Description of changes:

  • core: correctly propagate errors codes in the P3M/DP3M tuning functions to avoid infinite loops
  • core: rely on runtimeErrorMsg() instead of printing error messages to stderr or char arrays
  • python: catch errors from the P3M/DP3M tuning functions in the tune() method of the relevant python Actors
  • python: fix the broken 'metallic' case of the P3M epsilon parameter
  • testsuite: add tests for the 'metallic' case and for the new exception mechanism
  • python: fix the non-metallic epsilon case (epsilon was always set to 0 for P3M and P3MGPU in the core since 4.0.0)
  • documentation: fix broken code examples in the user guide
  • documentation: mention P3M doesn't support non-cubic boxes for non-metallic epsilon

The old exception mechanism simply printed the error message to a
log string, without halting the flow of the program. This either
left the system in an undefined state, or caused other errors to
be thrown at the end of the tuning functions, making it difficult
to trace the true origin of the issue.
Provide working code samples, add missing list of P3M papers,
cleanup outdated documentation.
@jngrad
Copy link
Member Author

jngrad commented Aug 19, 2020

@RudolfWeeber could you please have a look at the DP3M bisection code? I don't understand why it fails for specific particle configurations. Here is a MWE:

import espressomd
import espressomd.magnetostatics
system = espressomd.System(box_l=3*[1])
system.time_step = 0.01
system.part.add(pos=[[0, 0, 0], [0.5, 0.5, 0.5]], rotation=2 * [(1, 1, 1)], dip=2 * [(1, 0, 0)])
solver = espressomd.magnetostatics.DipolarP3M(prefactor=1, accuracy=1e-2)
system.actors.add(solver)  # infinite loop

The infinite loop was solved in this PR by correctly propagating the error code. But where is the mistake in my MWE? If I set the box dimensions to 3*[10] it doesn't fail. If this is not due to precision error, then we could write a test for it, because the propagation logic for the return value of the bisection function is quite fragile.

More generally though, the error code propagation logic in P3M/DP3M badly needs refactoring. We're forwarding integer error codes as floating-point return values. The pointer output arguments are just asking for trouble, I originally ended up with uninitialized values due to early exits from the new exception mechanism and was saved by Clang-Tidy.

@jngrad jngrad marked this pull request as ready for review August 19, 2020 20:53
@jngrad
Copy link
Member Author

jngrad commented Aug 19, 2020

Furthermore, there is this suspicious piece of code in the tuning function that has no coverage:

if (p3m.params.epsilon != P3M_EPSILON_METALLIC) {
if (!((box_geo.length()[0] == box_geo.length()[1]) &&
(box_geo.length()[1] == box_geo.length()[2]))) {
*log = strcat_alloc(
*log, "{049 P3M_init: Nonmetallic epsilon requires cubic box} ");
return ES_ERROR;
}
}

The body of the first conditional is never visited. Several issue here:

  • There is only a single python integration test where P3M is used for the non-metallic case (p3m_gpu.py), but it doesn't actually run the integrator, so the only way to know if P3M works for any non-zero epsilon is to rely on the samples and tutorials, which do not contribute to coverage.
  • This is probably never visited because the tuning function is called before the actor parameters are sent to the core, in which case the P3M struct is default constructed with epsilon = P3M_EPSILON_METALLIC, but I could be wrong. Removing the P3M actor and adding a new one seems to reset the core epsilon value back to P3M_EPSILON_METALLIC for the second tuning.
  • P3M can be used for non-cubic boxes, regardless of the value of epsilon, so the error message is misleading.

@fweik
Copy link
Contributor

fweik commented Aug 20, 2020

P3M can be used for non-cubic boxes, regardless of the value of epsilon, so the error message is misleading.

Are you sure? I think the dipole correction is not implemented for non-cubic boxes.

@jngrad
Copy link
Member Author

jngrad commented Aug 20, 2020

P3M can be used for non-cubic boxes, regardless of the value of epsilon, so the error message is misleading.

Are you sure? I think the dipole correction is not implemented for non-cubic boxes.

Is it only a restriction on the tuning function? The electrostatics docs don't mention such a restriction. Also, p3m.params.epsilon seems to be zero in p3m_sanity_checks_system even after tuning, when epsilon should be set in the core.

@fweik
Copy link
Contributor

fweik commented Aug 20, 2020

If I remember correctly, for the non-cubic case the calculation of the dipole correction is way more complicated, potentially not even available in closed form. If think this is discussed in https://link.springer.com/article/10.1023/A:1018775311536.

@fweik
Copy link
Contributor

fweik commented Aug 20, 2020

So, one should not be allowed to set epsilon to non-metallic if the box is not cubic. Metallic boundaries are used in 99% of the cases anyway, I'm not even sure that epsilon should be settable, this is more of an expert feature in my opinion.

@jngrad
Copy link
Member Author

jngrad commented Aug 20, 2020

Then why don't we throw an error? The rectangular case runs just fine:

import espressomd.electrostatics
system = espressomd.System(box_l=[10, 10, 20])
system.time_step = 0.01
system.cell_system.skin = 0.4
system.part.add(pos=[[0, 0, 0], [.5, .5, 1]], q=[-1, 1])
system.non_bonded_inter[0, 0].wca.set_params(epsilon=1.0, sigma=0.1)
solver = espressomd.electrostatics.P3M(prefactor=2, accuracy=1e-2, epsilon=78.)
system.actors.add(solver)
system.integrator.run(100)

@jngrad
Copy link
Member Author

jngrad commented Aug 20, 2020

Now that's just brilliant: we overwrite self._params in the P3M actor with default-initialized values from the core before the call to self._set_params_in_es_core() (electrostatics.pyx#L318-L335). All simulations with non-zero epsilon since 4.0.0 used epsilon=0 internally. We never caught it in CI because the only test for the non-metallic case bypasses the tuning function:

test_params["tune"] = False

The epsilon value was set to zero during tuning before the call to
`self._set_params_in_es_core()`. All simulations with a non-zero
epsilon were internally using epsilon = 0.0 since ESPResSo 4.0.0.
This only affects the P3M and P3MGPU actors, the DipolarP3M and
MMM1D actors are not affected.
Range checking was incorrectly rejecting integer values for epsilon.
P3M doesn't support the non-metallic case with non-cubic boxes.
@fweik
Copy link
Contributor

fweik commented Aug 20, 2020

Yeah, the whole cython coulomb complex is quite ... special

@fweik
Copy link
Contributor

fweik commented Aug 21, 2020

The tests look good. For the rest, I understand that you don't want to put work into this, but in my opinion a minimal solution would be to switch to an error reporting mechanism were non-results can not be ignored, e.g. optional return values, instead of meddling with the signaling return values. This way it can be made sure that errors are not ignored in the future. (E.g. use boost.outcome, if this is available with our version)

@jngrad
Copy link
Member Author

jngrad commented Aug 21, 2020

The issue is that we're working for 4.1, where we still have an Ubuntu 16.04 image with boost 1.58. boost::outcome needs 1.70.

Could we use exceptions to store the error code and catch it in the core tuning function that is called from python?

@RudolfWeeber
Copy link
Contributor

RudolfWeeber commented Aug 21, 2020 via email

@fweik
Copy link
Contributor

fweik commented Aug 21, 2020

Sure, you can also just use exceptions. It is probably better to not needlessly introduce new things (although I think outcome or similar is actually a interesting contribution to the error reporting design space). Please also note that I did not say you should change something, I just pointed out that your fix does not exclude the same type of problem in the future, and that there are solutions that do that.

@jngrad
Copy link
Member Author

jngrad commented Aug 21, 2020

@fweik Yes, my fix is brittle and I'd prefer to see a proper exception mechanism. boost::outcome looks promising, although we'll have to wait for a boost bump before we can include it. Using exceptions shouldn't be too difficult, but I cannot invest more time into P3M at this moment.

@RudolfWeeber if we don't include the tuning fixes in 4.1, users will still have infinite loops (upon DP3M bisection failure and any P3M failure) and useless error messages (upon any failure, e.g. system with no particles).

@jngrad jngrad added the automerge Merge with kodiak label Aug 23, 2020
@jngrad jngrad added automerge Merge with kodiak and removed automerge Merge with kodiak labels Aug 24, 2020
@kodiakhq kodiakhq bot merged commit 586ccae into espressomd:python Aug 24, 2020
jngrad pushed a commit to jngrad/espresso that referenced this pull request Aug 25, 2020
…#3869)

Fixes espressomd#3868

Description of changes:
- core: correctly propagate errors codes in the P3M/DP3M tuning functions to avoid infinite loops
- core: rely on `runtimeErrorMsg()` instead of printing error messages to stderr or char arrays
- python: catch errors from the P3M/DP3M tuning functions in the `tune()` method of the relevant python Actors
- python: fix the broken `'metallic'` case of the P3M `epsilon` parameter
- testsuite: add tests for the `'metallic'` case and for the new exception mechanism
- python: fix the non-metallic epsilon case (epsilon was always set to 0 for `P3M ` and `P3MGPU` in the core since 4.0.0)
- documentation: fix broken code examples in the user guide
- documentation: mention P3M doesn't support non-cubic boxes for non-metallic epsilon
jngrad pushed a commit to jngrad/espresso that referenced this pull request Aug 27, 2020
…#3869)

Fixes espressomd#3868

Description of changes:
- core: correctly propagate an error code in the P3M tuning function to avoid an infinite loop
- python: fix the broken `'metallic'` case of the P3M `epsilon` parameter
- python: fix the non-metallic epsilon case (epsilon was always set to 0 for `P3M ` and `P3MGPU` in the core since 4.0.0)
- documentation: fix broken code examples in the user guide
- documentation: mention P3M doesn't support non-cubic boxes for non-metallic epsilon
@jngrad jngrad deleted the fix-3868 branch January 18, 2022 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

P3M-based electrostatics and magnetostatics tuning functions ignore errors
3 participants