Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subcycling leads to time drift leading to desync #1866

Closed
fsimonis opened this issue Nov 13, 2023 · 5 comments · Fixed by #1867
Closed

Subcycling leads to time drift leading to desync #1866

fsimonis opened this issue Nov 13, 2023 · 5 comments · Fixed by #1867
Assignees
Labels
bug preCICE does not behave the way we want and we should look into it (and fix it if possible)
Milestone

Comments

@fsimonis
Copy link
Member

fsimonis commented Nov 13, 2023

Describe your setup

Operating system (e.g. Linux distribution and version): Ubuntu 00.00
preCICE Version: develop at 05541eb

Describe the problem

Using subcycling leads to a time drift which can result in a desync of participants and a crash once only one participant decides that isCouplingOngoing() == false.

Problem by @BenjaminRodenberg:

max_time   = 1.0
precice_dt = 0.2
solver_dt  = precice_dt / 640 # = 0.0003125

t = 0
NUMERICAL_ZERO_DIFFERENCE = 1e-14

while abs(t - max_time) > NUMERICAL_ZERO_DIFFERENCE:
    i = 0
    t_loc = 0
    while abs(t_loc - precice_dt) > 10e-14:
        i+=1
        t_loc += solver_dt
    t += t_loc
    print(f"Reached {t} after {i} time steps")
    if(t > max_time):
        break

We are using the computed time window to advance timeWindowStart in the BaseCouplingScheme. This should always be moved by _timeWindowSize if available.
This leads to a drift which builds up over time.

Step To Reproduce

  1. max-time = 1.0, time-window-size=0.2, P1 does 1 timestep, P2 does 640 timesteps of size 0.2/640.
  2. Run the simulation
  3. P1 finalizes while P2 crashes as it is waiting for data, but the communication was closed.

Expected behaviour

No drift, no crash.

Additional context

This is related to but not caused by #1788, which advances the time window size correctly.
This used to simply hang with the participant wait in finalize, which is now disabled by default #1600.

There are 3 relevant cases at the end of a time window:

  1. the next time step is too small, so we move to the real end Change valid-digits to min-timestep #1788
  2. the next time step size is numerically 0 (this doesn't move to the next time window correctly)
  3. there is no time window size (nothing to correct)

Possible solution

Always "snap" the provided time of the last timestep to the end of the time window. This also requires correcting the time stamps of the data samples.

void BaseCouplingScheme::firstExchange()
{
PRECICE_TRACE(_timeWindows, getTime());
checkCompletenessRequiredActions();
PRECICE_ASSERT(_isInitialized, "Before calling advance() coupling scheme has to be initialized via initialize().");
_hasDataBeenReceived = false;
_isTimeWindowComplete = false;
PRECICE_ASSERT(_couplingMode != Undefined);
if (reachedEndOfTimeWindow()) {
_timeWindows += 1; // increment window counter. If not converged, will be decremented again later.
//If preCICE has stopped before the end of the time window we have to duplicate the last available sample and put it at the end of the time window.
// We have to exclude the case where coupling scheme does not have a time window size, since this will cause problem with the interpolation later on
if (getNextTimeStepMaxSize() > math::NUMERICAL_ZERO_DIFFERENCE && hasTimeWindowSize()) {
addTimeStepAtWindowEnd();
// Update the _computedTimeWindowPart in order to keep the time within preCICE synchronised
// Has to be done before the second exchange, since the serial coupling scheme moves to the new time window before updating _timeWindowStartTime
_computedTimeWindowPart = _timeWindowSize;
}
exchangeFirstData();
}
}

if (reachedEndOfTimeWindow()) {

    _timeWindows += 1; // increment window counter. If not converged, will be decremented again later.

    if (hasTimeWindowSize()) {

      //If preCICE has stopped before the end of the time window we have to duplicate the last available sample and put it at the end of the time window.
      // We have to exclude the case where coupling scheme does not have a time window size, since this will cause problem with the interpolation later on
      if (getNextTimeStepMaxSize() > math::NUMERICAL_ZERO_DIFFERENCE) {

        addTimeStepAtWindowEnd();

        // Update the _computedTimeWindowPart in order to keep the time within preCICE synchronised
        // Has to be done before the second exchange, since the serial coupling scheme moves to the new time window before updating _timeWindowStartTime
        _computedTimeWindowPart = _timeWindowSize;
      } else {
        // snap final data samples to the end
        snapDataToTimeWindowEnd();
        _computedTimeWindowPart = _timeWindowSize;
      }
    }

    exchangeFirstData();
  }
@fsimonis fsimonis added the bug preCICE does not behave the way we want and we should look into it (and fix it if possible) label Nov 13, 2023
@fsimonis fsimonis added this to the Version 3.0.0 milestone Nov 13, 2023
@BenjaminRodenberg
Copy link
Member

The python script above basically summarizes what's happening with the CPP solverdummy, if you choose to use a maximum time of 1.0, a time window size of 0.2 and one participant subcycles with 640 steps, while the other uses a single step. Note that you also need to comment out reading and writing of data, since otherwise an assertion is triggered before reaching the end of the simulation.

@fsimonis
Copy link
Member Author

fsimonis commented Nov 14, 2023

Easier than updating the timestamps of write-data Stamples is to update the time before generating the Stamples.

This is something I already do in the upcoming fix for the compositional coupling scheme #1705

@BenjaminRodenberg
Copy link
Member

With #1897 it seems like the problem here has shifted a bit. As we are tracking our time differently now, we don't run into any observable trouble for the case described above anymore. The python script below represents the updated situation:

max_time   = 1.0
window_dt = 0.2
solver_dt  = window_dt / 640

t_start = t = 0
NUMERICAL_ZERO_DIFFERENCE = 1e-14

while abs(t - max_time) > NUMERICAL_ZERO_DIFFERENCE:
    i = 0
    precice_dt = window_dt
    while abs(t - t_start - window_dt) > NUMERICAL_ZERO_DIFFERENCE:  # BaseCouplingScheme::reachedEndOfTimeWindow()
        i += 1
        dt = min(solver_dt, precice_dt)
        t += dt
        precice_dt = t_start + window_dt - t
    t_start += window_dt
    t = t_start
    print(f"Reached {t} after {i} time steps")
    if(t > max_time):
        break

This works and we get 640 steps for every window. So the new way for keeping track of time seems to be more robust.

However, if we use 6400 steps for every window, we can also see a window, where 6401 steps are performed:

Reached 0.2 after 6400 time steps
Reached 0.4 after 6401 time steps
Reached 0.6000000000000001 after 6400 time steps
Reached 0.8 after 6400 time steps
Reached 1.0 after 6400 time steps

Additionally, I'm currently looking for a good strategy that might help us to return an error to the user in such a situation.

Note that a user can still quite easily implement a fix in the adapter by following this idea: If there is a danger for precice_dt getting really small in the next step, allow to use a dt that is slightly larger than the solver_dt we would like to use:

max_time   = 1.0
window_dt = 0.2
solver_dt  = window_dt / 6400

t_start = t = 0
NUMERICAL_ZERO_DIFFERENCE = 1e-14

while abs(t - max_time) > NUMERICAL_ZERO_DIFFERENCE:
    i = 0
    precice_dt = window_dt
    while abs(t - t_start - window_dt) > NUMERICAL_ZERO_DIFFERENCE:  # BaseCouplingScheme::reachedEndOfTimeWindow()
        i += 1
        if abs(solver_dt - precice_dt) < 10 * NUMERICAL_ZERO_DIFFERENCE:
            dt = precice_dt
        else:
            dt = min(solver_dt, precice_dt)
        t += dt
        precice_dt = t_start + window_dt - t
    t_start += window_dt
    t = t_start
    print(f"Reached {t} after {i} time steps")
    if(t > max_time):
        break

Important question w.r.t #1867 do we need to be able to detect this situation? If not, we can close this issue, because the actual problem is the adapter code and not preCICE.

@uekerman
Copy link
Member

Is the remaining problem really worth investing more time? It is extremely niche and even if someone hits it, it does not give real troubles (crash in finalize).

@uekerman uekerman closed this as not planned Won't fix, can't repro, duplicate, stale Nov 27, 2023
@BenjaminRodenberg
Copy link
Member

BenjaminRodenberg commented Nov 27, 2023

@uekerman I agree. Let's close this for now. The crash in finalize has as far as I've tested it so far even disappeared with #1897. There is only the (very niche) possibility of performing +-few substeps, if you are doing a large number of substeps due to round-off errors. But this is less critical as a crash and we also still have to learn a lot about how users will actually use subcycling in the end (I assume doing exactly 6400 substeps is not the main use-case).

BenjaminRodenberg added a commit to BenjaminRodenberg/precice that referenced this issue Nov 27, 2023
uekerman added a commit to precice/precice.github.io that referenced this issue Feb 8, 2024
* v3: Make dt mandatory in documentation

* Update read data functions to use relativeReadTime.

* change precice_dt to preciceDt and solver_dt to solverDt to follow
conventions.

* Add missing.

* Shorten a bit.

* Minor follow-up for #258.

* Add figure.

* Remove outdated note.

* Use dt properly.

* Remove unnecessary pdf.

* Partial update of documentation w.r.t breaking changes.

* Redice diff.

* Redice diff.

* Reduce diff.

* Fix some more breaking changes.

* Add how dt is computed. See #257.

* Apply suggestions from code review

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>

* Remove unneeded.

* Add pointer to interpolation section.

* Fix format.

* Add advice on precice/precice#1866.

* Apply suggestions from code review

* Update some minor, obviously outdated points.

* Update png.

* Update pages/docs/couple-your-code/couple-your-code-mesh-and-data-access.md

Co-authored-by: Frédéric Simonis <simonisfrederic@gmail.com>

* Update heading.

* Moved changes to #326.

* Update pages/docs/couple-your-code/couple-your-code-mesh-and-data-access.md

* Update pages/docs/couple-your-code/couple-your-code-waveform.md

---------

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>
Co-authored-by: Frédéric Simonis <simonisfrederic@gmail.com>
Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>
uekerman added a commit to precice/precice.github.io that referenced this issue Feb 12, 2024
* Fix whitespace.

* Update action config docs to v3 (#254)

* Update Roadmap

* Update fundamentals-roadmap.md

* Use time step, not timestep. (#259)

* v3: Update documentation w.r.t getMaxTimeStep(). (#258)

* Update documentation w.r.t getMaxTimeStep().
* Implements changes from precice/precice#1623

---------

Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

* Minor follow-up for #258.

* Update documentation for initialize (#186)

* Documentation for data initialization changes introduced in v3.0.0.
* Cleanup initializeData.

---------

Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>
Co-authored-by: Benjamin Uekermann <benjamin.uekermann@ipvs.uni-stuttgart.de>

* Update profiling section (#266)

* Update profiling section

* Update examples in profiling section

* Apply suggestions from code review

Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

* Add motivation for custom tool

* Wording tweaks

* More rewording

* Update event files in output files

---------

Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

* Fix typo in profiling

* Update profiling info in guide

* Update CMake variables to new names (#289)

* Clarify some instructions in porting guide (#287)

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>

* Fix minor issues in tooling/performance-analysis (#272)

* Rename events to profiling

according to precice/precice #1787

* Mention where to find the precice-profiling

* Add hint regarding the event flag in precice-profiling

* Add recommendation for `imvj-restart-mode` in configuration of acceleration (#296)

* Update configuration-acceleration.md

Add recommendation to use restart mode in `IQN-IMVJ` method for large-scale simulations.

* Update pages/docs/configuration/configuration-acceleration.md

Improve descriptive accuracy

Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

---------

Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

* Apply trivial renaming changes.

* Fix naming.

* Add note.

* Update breaking API changes (#299)

---------

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>

* Fix token

* Fix arg

* Bump preCICE version

* Update hint on formatting configuration files

* Refer to precice.hpp instead of participant.hpp

* Update version in docs sidebar

* Rewrite mapping docs (#308)

* Add overview figure

* Work in progress

* Fix links

* Transform pdfs to svgs

* Update kernel method section

* Also write about GPU executors

* Language edits by Benjamin and Makis

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>
Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

* Review updates

* Makis language edits

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>

* Fix typos via aspell

---------

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>
Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

* Fix some outdated solver-interface and dimensions (#314)

* Migrate use-mesh to provide/receive-mesh

* Migrate binprecice to precice-tools

* Update config-visualizer page

* Include instructions to remove .data() in readData function (#313)

* Include instructions to remove .data() in readData function
* Mention span replacing raw pointers
* Remove raw pointers from general porting example

* Cleanup porting guide example

* Update couple your code

* Remove obsolete section regarding IDs

* Apply suggestions from code review

Co-authored-by: Ishaan Desai <ishaandesai@gmail.com>

* This PR adds C++ STL linking from Doxygen to cppreference

* Fix WD

* Link to porting guides from the source docs

* Add list of porting guides to overview

* Add new solid solver to Nutils adapter overview page (#325)

* Geometric Multiscale Documentation (#205)

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>

* Update submodules

* Ignore tooling in images

* Update xmlreference to v3.0.0

* Cleanup porting guide API+Config (#317)

Co-authored-by: Ishaan Desai <ishaandesai@gmail.com>
Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>

* Update the rust bindings page

* Fix link

* Make dt in readData mandatory in documentation (#257)

* v3: Make dt mandatory in documentation

* Update read data functions to use relativeReadTime.

* change precice_dt to preciceDt and solver_dt to solverDt to follow
conventions.

* Add missing.

* Shorten a bit.

* Minor follow-up for #258.

* Add figure.

* Remove outdated note.

* Use dt properly.

* Remove unnecessary pdf.

* Partial update of documentation w.r.t breaking changes.

* Redice diff.

* Redice diff.

* Reduce diff.

* Fix some more breaking changes.

* Add how dt is computed. See #257.

* Apply suggestions from code review

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>

* Remove unneeded.

* Add pointer to interpolation section.

* Fix format.

* Add advice on precice/precice#1866.

* Apply suggestions from code review

* Update some minor, obviously outdated points.

* Update png.

* Update pages/docs/couple-your-code/couple-your-code-mesh-and-data-access.md

Co-authored-by: Frédéric Simonis <simonisfrederic@gmail.com>

* Update heading.

* Moved changes to #326.

* Update pages/docs/couple-your-code/couple-your-code-mesh-and-data-access.md

* Update pages/docs/couple-your-code/couple-your-code-waveform.md

---------

Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>
Co-authored-by: Frédéric Simonis <simonisfrederic@gmail.com>
Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>

* Update submodules

* Remove v3 tip from roadmap

* Add rust bindings to installation overview

* Update libprecice2 to libprecice3 on Ubuntu packages page

* Update header names in installation-linking

* Update porting information on configuration overview page

* Update several things on configuration basics page

* Update configuration communication page

* Revert tweaks in configuration basics page

Previously, I changed from precice to participant as name for the preCICE handle and I used "Force" instead of "Forces" to be consistent with the tutorial guidelines. But both changes would also need updates on many other pages. Thus, I left them for the moment for consistency.

* Refer to QN default values on acceleration configuration

* Update porting info and mention Rust on couple your code start

* Change global dim to mesh dim in step 3

* Remove dim handling from step 6

* Fix getDataDim arguments in step 3

* Fix dim handling in step 9

* Fix dim in direct access

* Fix and tweak porting guide

* Make linter happy

* Make linter happier

* Add missing tutorials in the sidebar and overview (#334)

* Add partitioned-pipe-two-phase to sidebar and overview

* Add oscillator overlap in sidebar and overview

* Add partitioned-heat-conduction-overlap to the sidebar and overview

* Add volume-coupled-flow to sidebar and overview

* Add two-scale-heat-conduction to sidebar and overview

* Update pages/tutorials/tutorials.md

Co-authored-by: Ishaan Desai <ishaan.desai@ipvs.uni-stuttgart.de>

* Update pages/tutorials/tutorials.md

---------

Co-authored-by: Benjamin Uekermann <benjamin.uekermann@gmail.com>
Co-authored-by: Ishaan Desai <ishaan.desai@ipvs.uni-stuttgart.de>

* Update bindings overview picture (#333)

* Update bindings overview picture

* Move arrow of Python bindings to precice.hpp

* Add Rust bindings

---------

Co-authored-by: Benjamin Rodenberg <benjamin.rodenberg@in.tum.de>
Co-authored-by: Gerasimos Chourdakis <chourdak@in.tum.de>
Co-authored-by: Benjamin Rodenberg <benjamin.rodenberg@cit.tum.de>
Co-authored-by: Frédéric Simonis <simonisfrederic@gmail.com>
Co-authored-by: David Schneider <david.schneider@ipvs.uni-stuttgart.de>
Co-authored-by: carme-hp <71499004+carme-hp@users.noreply.github.com>
Co-authored-by: June <94080048+Fujikawas@users.noreply.github.com>
Co-authored-by: homspons <ge35joj>
Co-authored-by: Ishaan Desai <ishaandesai@gmail.com>
Co-authored-by: Elia Zonta <99761626+ezonta@users.noreply.github.com>
Co-authored-by: precice-bot <info@precice.org>
Co-authored-by: Ishaan Desai <ishaan.desai@ipvs.uni-stuttgart.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug preCICE does not behave the way we want and we should look into it (and fix it if possible)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants