Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

verify/pass/arrays/{merge,selection_sort}.rs timeout issue #819

Open
Pointerbender opened this issue Jan 7, 2022 · 10 comments
Open

verify/pass/arrays/{merge,selection_sort}.rs timeout issue #819

Pointerbender opened this issue Jan 7, 2022 · 10 comments

Comments

@Pointerbender
Copy link
Contributor

Pointerbender commented Jan 7, 2022

On commit 7e259b8 from PR #817 the test case prusti-tests/tests/verify/pass/arrays/selection_sort.rs is reliably failing due to a timeout error during the CI run on Ubuntu. This timeout issue is not reproducible on the Windows or Mac CI runs, nor on my local Ubuntu machine, where the test case is verifying okay. @vakaras pointed out that such a performance issue could very well be due to matching loop.

There are a couple of things we could try to investigate this, per the instructions from @fpoli:

Typically such large performance variations are caused by matching loops. @fpoli If I am not mistaken you had used AxiomProfiler recently, would you mind adding some brief instructions on how to use it?

I didn't; my logs were too big (my timeout was 30 minutes) and the tool kept crashing on Linux. However, this failing test in Prusti seems much smaller than that. The readme has instructions on how to run the tool and there is a working Docker image that can be used even from Mac. To use the profiler, see here; they link to the paper that describes the tool. The "Help" menu of the tool also contains useful instructions.

Alternatively, Z3's smt.qi.profile=true flag is very promising. Its output is explained here and with a bit of processing I used it to see how many times each quantifier is instantiated. You need to obtain first from Silicon the smt2 file with the commands sent to Z3, making sure that you can reproduce the performance issue just by running Z3 on it.

After tracking down and fixing the performance issue, we should re-enable the prusti-tests/tests/verify/pass/arrays/selection_sort.rs test case, as it was ignored for now until this is resolved.

@Pointerbender
Copy link
Contributor Author

I also observed selection_sort.rs failing spuriously in the wild on Mac OS X when fetching the latest changes on master from upstream to my fork of prusti-dev. This particular commit doesn't have the FxHash changes yet, so it's probably not reproducible there: https://github.com/Pointerbender/prusti-dev/runs/4737295553?check_suite_focus=true#step:8:707

  • [Prusti: verification error] loop invariant might not hold after a loop iteration that preserves the loop condition.
  • thread 'main' panicked at 'Prusti failed to finish within 180 seconds. It finished in 526.228013589s.', prusti/src/driver.rs:227:9

@fpoli
Copy link
Member

fpoli commented Jan 7, 2022

The same for pass/arrays/merge.rs on Mac: Prusti failed to finish within 500 seconds. It finished in 521.787225005s. (job)

@fpoli
Copy link
Member

fpoli commented Jan 7, 2022

Where is the timeout set?
Edit: here.

@Pointerbender
Copy link
Contributor Author

I have a running theory of what might be happening: I think the fix_quantifiers and/or the optimize_folding passes might be altering the user-specified quantifiers in a way that makes it hard for Viper to infer the correct triggers (since the user-specified quantifiers in the test case don't have explicit triggers, Viper will attempt to infer these automatically).

Take for example the post-condition on line 65:

#[requires(i < 10)]
#[requires(min < 10)]
#[requires(i <= min)]
#[requires(forall(|k1: usize, k2: usize| (0 <= k1 && k1 < k2 && k2 < i) ==> a[k1] <= a[k2]))]
#[requires(forall(|k1: usize, k2: usize| (0 <= k1 && k1 < i && i <= k2 && k2 < 10) ==> a[k1] <= a[k2]))]
#[ensures(forall(|k1: usize| (0 <= k1 && k1 < 10 && k1 != i && k1 != min) ==> (a[k1] == old(a[k1]))))]
#[ensures(a[i] == old(a[min]))]
#[ensures(a[min] == old(a[i]))]
#[ensures(forall(|k1: usize, k2: usize| (0 <= k1 && k1 < k2 && k2 <= i) ==> a[k1] <= a[k2]))]
#[ensures(forall(|k1: usize, k2: usize| (0 <= k1 && k1 < i && i <= k2 && k2 < 10) ==> a[k1] <= a[k2]))]
fn set_a(a: &mut [i32; 10], i: usize, min: usize) {
let a_i = a[i];
let a_min = a[min];
a[i] = a_min;
a[min] = a_i;
}

If I run the test in debug mode with the fix_quantifiers and/or the optimize_folding optimizations disabled, the whole test compilation finishes successful in ~58 seconds, of which the Viper verification of method selection_sort takes less than 20 seconds (in release mode the entire compilation takes 22 seconds of which selection_sort takes only 11-12 seconds). The post-condition #[ensures(forall(|k1: usize| (0 <= k1 && k1 < 10 && k1 != i && k1 != min) ==> (a[k1] == old(a[k1]))))] is translated into the Viper code:

forall _2_quant_101: Int ::
    0 <= _2_quant_101 && _2_quant_101 <= 18446744073709551615
    ==>
      0 <= _2_quant_101 && (
        _2_quant_101 < 10 && (
          !(_2_quant_101 == old[l38](_89))
          && !(_2_quant_101 == old[l38](_90))
        )
      ) ==>
        _2_quant_101 < 10 && (
          _2_quant_101 < 10
          && read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(old[l38](_87.val_ref)), _2_quant_101)
          == old[l38](read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(_87.val_ref), _2_quant_101))
        )

This is already somewhat tricky, because we don't know for sure (without profiling it first) whether Viper will infer the triggers for both:

{ read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(old[l38](_87.val_ref)), _2_quant_101) }
{ read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(_87.val_ref), _2_quant_101) }

If I manually add both triggers in the dumped Viper file, it verifies fast in Viper IDE in about 13 seconds on a cold start. If I only add the latter trigger then Viper IDE takes much longer to verify (it timed out after 300 seconds). So at least empirically it looks like Viper correctly infers both triggers in this case (profiling will have to tell us for sure).

It gets a lot worse when we enable the 2 optimizations again, now the Viper verification of method selection_sort (through Prusti, which uses a fixed Z3 random seed) takes almost 120 seconds! The same post-condition now turns into the following Viper code:

inhale (let _LET_41 == (old[l38](_87.val_ref)) in (let _LET_40 == (old[l38](_90)) in (let _LET_39 == (old[l38](_89)) in (
forall _2_quant_101: Int ::
    0 <= _2_quant_101
    && _2_quant_101 <= 18446744073709551615
    ==>
      0 <= _2_quant_101
      && (
        _2_quant_101 < 10 && (
          !(_2_quant_101 == _LET_39)
          && !(_2_quant_101 == _LET_40)
        )
      ) ==>
        _2_quant_101 < 10 && (
          _2_quant_101 < 10
          && read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(_LET_41), _2_quant_101)
          == old[l38](read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(_87.val_ref), _2_quant_101))
        )
// .....

Which triggers could Viper possibly choose here out of the possibilities:

//  #[ensures(forall(|k1: usize| (0 <= k1 && k1 < 10 && k1 != i && k1 != min) ==> (a[k1] == old(a[k1]))))] 
forall _2_quant_101: Int ::
    { read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(_LET_41), _2_quant_101) }
    { read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(old[l38](_87.val_ref)), _2_quant_101) }
    { read$Snap$Array$10$i32$__$TY$__(snap$__$TY$__Snap$Array$10$i32$(_87.val_ref), _2_quant_101) }

Would the choice matter a lot for performance? Would this also matter in Viper assert and exhale statements, or only for inhale statements?

(I didn't do any profiling yet, I was just trying to eyeball it and then noticed the performance difference when disabling those optimizations).

@fpoli
Copy link
Member

fpoli commented Jan 7, 2022

It seems to me that the transformation from the first to the second quantifier that you reported is doing a lot of unnecessary stuff. I don't see why we would need to move those old expressions out of the quantifier. If I recall correctly those passes are a workaround for an incompleteness of Viper when the old expression is used to identify some acc(..) permission inside a quantifier (e.g. forall ... :: unfolding acc(P(old[..](..))) in ...). But when the type of the Viper expression is a value (Int or domain instance) we shouldn't be able to hit that incompleteness, so the transformation shouldn't be needed.

If triggers are the cause, I think the proper fix for that example would be to add the triggers {a[k1]}{old(a[k1])} on line 65, so that Prusti should generate the first two triggers that you wrote. However, this might still be blocked by #812.

Which triggers could Viper possibly choose here out of the possibilities:

It's always hard to answer this. I opened viperproject/silicon#586.

Would the choice matter a lot for performance?

The first two should be equivalent because the verifier should know _LET_41 == old[l38](_87.val_ref), but the last one is not. So, the choice should still matter, but it depends a lot on what's in the rest of the program.

Would this also matter in Viper assert and exhale statements, or only for inhale statements?

Yes, because assert and exhale assume the property when the check succeeds.

@vakaras
Copy link
Contributor

vakaras commented Jan 7, 2022

I think the fix_quantifiers and/or the optimize_folding passes might be altering the user-specified quantifiers in a way that makes it hard for Viper to infer the correct triggers

Just a heads-up: these passes should be gone once I finish my refactorings.

(since the user-specified quantifiers in the test case don't have explicit triggers, Viper will attempt to infer these automatically)

From our Viper experience, we concluded that the user should always write the triggers because inferring the correct ones is impossible. So, if it is clear what triggers to write, we should just add them to the example.

@Pointerbender
Copy link
Contributor Author

Pointerbender commented Jan 7, 2022

In that case I think the best way forward to fix this performance issue is to add triggers to the test case once #812 is resolved and @vakaras' refactorings land. I think that should solve it and then we won't need to do the profiling.

In another comment I alluded to an idea about detecting potential matching loops automatically. I will post a more detailed description later, but at a very high level I'm thinking of two things:

  • Emit a compiler warning when a potential matching loop is detected (this would be a step before running the Viper program). This will also warn the user if the matching loop is not actually triggered.
  • Let Prusti infer the default triggers when the user omits these, so that it's always clear which triggers are used by looking at the Viper program (in lieu of viperproject/silicon#586). These triggers might not always be complete, but that is the price to pay for users when they choose to omit them :P If the auto-inferred trigger(s) could lead to a (potential) matching loop, this is a compiler error that tells the user to add another quantified variable and/or specify the triggers explicitly manually.

@vakaras
Copy link
Contributor

vakaras commented Jan 8, 2022

In that case I think the best way forward to fix this performance issue is to add triggers to the test case once #812 is resolved and @vakaras' refactorings land. I think that should solve it and then we won't need to do the profiling.

I completely agree.

In another comment I alluded to an idea about detecting potential matching loops automatically. I will post a more detailed description later, but at a very high level I'm thinking of two things:

  • Emit a compiler warning when a potential matching loop is detected (this would be a step before running the Viper program). This will also warn the user if the matching loop is not actually triggered.

  • Let Prusti infer the default triggers when the user omits these, so that it's always clear which triggers are used by looking at the Viper program (in lieu of viperproject/silicon#586). These triggers might not always be complete, but that is the price to pay for users when they choose to omit them :P If the auto-inferred trigger(s) could lead to a (potential) matching loop, this is a compiler error that tells the user to add another quantified variable and/or specify the triggers explicitly manually.

This would definitely improve the user experience. One more idea would be when the user omits triggers, show a warning and in the warning's note tell what triggers were inferred so that the user has some starting point.

@Aurel300 Aurel300 changed the title prusti-tests/tests/verify/pass/arrays/selection_sort.rs timeout issue verify/pass/arrays/{merge,selection_sort}.rs timeout issue Jan 17, 2022
Pointerbender added a commit to Pointerbender/prusti-dev that referenced this issue Feb 3, 2022
bors bot added a commit that referenced this issue Feb 3, 2022
848: partial fix for #819 r=vakaras a=Pointerbender

In issue #819 the `prusti-tests/tests/verify/pass/arrays/selection_sort.rs` test was entirely disabled in anticipation of some future fixes. Those future fixes have not landed yet, but PRs #831 and #844 opened a window for a temporary work-around. I was curious if I could get it to work and was able to achieve a stable test run for `selection_sort` (consistently finishes in under 60 seconds on my PC). 

The changes I made were to selectively disable the `fix_quantifiers` and `optimize_folding` optimization passes and use wrapper functions for array accesses instead of using array indices directly (those are blocked by #812 for now).

I don't consider this a proper fix for #819, but I think it could still be a nice regression test and now that its execution time has stabilized I think it's safe to turn back on. Let's keep #819 open to track implementing a proper fix in the future.

Co-authored-by: Pointerbender <pointerbender@gmail.com>
bors bot added a commit that referenced this issue Feb 17, 2022
858: add support for array/slice accesses in quantifier triggers r=fpoli a=Pointerbender

This PR adds support for using slice/array access in quantifier triggers (issue #812). These are encoded including bound checks, but the bound checks are not needed (nor allowed) inside the triggers. The fix/hack applied here removes those bound checks from the triggers.

This seems to fix the encoding problem, but it doesn't appear to be a full solution for `prusti-tests/tests/verify/pass/arrays/selection_sort.rs` (see #819). If I replace all the triggers there with direct array accesses (but not the lookup and the assignment statements in the loop body wrapped in `get`/`set`), then the verification time for this test jumps from 1 minute to 4 minutes on my PC. If I also replace the lookup and assign statements in the loop bodies with direct array accesses, then the verification times out. I suspect this is a matter of adding a few dummy assert statements here and there, but as long as #796 hasn't landed yet this does not seem to improve the situation (it still times out, even with dummy assertions atm). I tried adding `body_invariant`s instead of assert statements, but this also still results in timeouts. For these reasons, I decided not to include the updated version of `selection_sort.rs`. Using a custom pure lookup function such as `get` still yields superior performance over interactions with `lookup_pure` array/slice accesses, even with the fix from this PR, unfortunately :)

The test `prusti-tests/tests/verify/pass/issues/issue-812-4.rs` is currently ignored because it triggers a `todo!()` in the new vir high encoder here:

https://github.com/viperproject/prusti-dev/blob/66c90d5dae9040eb68b2f0d2689faa480b10e8d6/prusti-viper/src/encoder/mir/pure/specifications/encoder_high.rs#L26-L35

I included the test anyway so that it can be enabled later when that part is implemented during the refactoring.

I'd like to investigate deeper what is the best way to trigger the quantifiers (which are snap encoded) from a Viper method body (where Rust statements are not snap encoded, unless routed through a pure Rust function), but I thought I'd first do a check-in first to see if the proposed fix is heading in the right direction and inquire about if there are any other alternatives than #796 or custom pure lookup functions to work-around the performance issues that seem to be caused by the mixed snap/non-snap encoding :) 

Thanks!





Co-authored-by: Pointerbender <pointerbender@gmail.com>
Co-authored-by: Federico Poli <federpoli@gmail.com>
Pointerbender added a commit to Pointerbender/prusti-dev that referenced this issue Feb 19, 2022
bors bot added a commit that referenced this issue Feb 19, 2022
876: use array indices as quantifier triggers for `selection_sort` test r=vakaras a=Pointerbender

Now that there is support for array/slice indices in quantifier triggers (#812), I updated the `selection_sort.rs` test case to make use of this. This results in a ~15 second faster verification time 🎉 increasing performance further per #819.

Prusti can't verify this test yet when using array assignments instead of `set()`, so there are still 2 calls to `set()` in the test with a FIXME to address these later. Will post a separate comment with some ideas about what that would take.

Co-authored-by: Pointerbender <pointerbender@gmail.com>
@Pointerbender
Copy link
Contributor Author

With PR #876 the selection_sort.rs was sped up a little further, but when playing around with quantifier triggers I discovered that code like this doesn't work fully yet:

while i < a.len() {
        // ... snipped code and loop body invariants from `selection_sort.rs`
        let a_i = a[i];
        let a_min = a[min];
        a[i] = a_min;
        a[min] = a_i;
}

But this does work currently:

while i < a.len() {
        // ... snipped code and loop body invariants from `selection_sort.rs`
        let a_i = a[i];
        let a_min = a[min];
        set(a, i,  a_min);
        set(a, min, a_i);
}

#[requires(0 <= i && i < 10)]
#[ensures(forall(|j: usize| (0 <= j && j < 10 && j != old(i)) ==> (a[j] == old(a[j])), triggers=[(a[j],)]))]
#[ensures(a[old(i)] == old(v))]
fn set(a: &mut [i32; 10], i: usize, v: i32) {
    a[i] = v;
}

I suspect that this is because in the first example the array assignments (= these do not use snap encoding) are not triggering the quantifiers from the loop body invariants (= these use snap encoding). In the second example the correct triggering assertions are contained in the post-conditions of set. These are snap encoded and thus trigger the quantifiers from the loop body invariant.

At the Viper level, this could probably be fixed by adding a call to snap$__$TY$__Snap$Array$10$i32$ after the last array assignment a[min] = a_i; at the end of the loop:

function snap$__$TY$__Snap$Array$10$i32$(self: Ref): Snap$Array$10$i32
  requires acc(Array$10$i32(self), read$())
  ensures (forall i: Int ::
         { read$Snap$Array$10$i32$__$TY$__(result, i) }
         { lookup_pure__$TY$__Array$10$i32$i32$(self, i) }
         0 <= i && i < 10 ==>
                  read$Snap$Array$10$i32$__$TY$__(result, i)
                  == lookup_pure__$TY$__Array$10$i32$i32$(self, i))
{
  cons$Snap$Array$10$i32$__$TY$__(seq_collect$Array$10$i32$__$TY$__(self, 0))
}

The function snap$__$TY$__Snap$Array$10$i32$ has the needed post-conditions to propagate the array assignment into the snap domain in order to satisfy the loop body invariants. Even though the loop body invariants already are making a call to snap$__$TY$__Snap$Array$10$i32$ at the Viper level, the problem seems to be that the array assignments are not preserved between loops (at the snap encoding level, at least), so on entering the next iteration of the loop it fails verification on the first body invariant which depends on the array a.

I was pondering on what it would take to be able to write the first example and have Prusti verify it okay. Some possibilities are:

  1. Wait for Use snapshot encoding for Rust assertions #796 which addresses snap encoding of Rust assertions. Then we can add (debug_)assert! statements at the end of the loop which will invoke snap$__$TY$__Snap$Array$10$i32$. This one is currently blocked by the refactoring.
  2. We could update the quantifier triggers for array/slice assignment to also trigger on { read$Snap$Array$10$i32$__$TY$__(X, i) }, currently those only trigger on { lookup_pure__$TY$__Array$10$i32$i32$(X, i). This would be nice for the end-user experience, because less Rust assertions would be needed. However, this may decrease overall performance because of the extra trigger and larger Viper programs when snap encoding is not needed for the verification.
  3. Find another way to preserve information about the array assignment between loop iterations. Johannes also suggested this option in his master thesis in section "6.1 Future Work".

Would there be any merit to already start investigating one of these options before the refactoring lands? Are there any other interesting options that are not listed above? I would be interested in taking on exploring this further :)

Thanks!

@vakaras
Copy link
Contributor

vakaras commented Feb 19, 2022

Thank you for investigating this! I suspect that the second option would be quite easy to implement and hopefully does not introduce too serious performance problems. So, I personally would vote for it.

Pointerbender added a commit to Pointerbender/prusti-dev that referenced this issue Feb 28, 2022
…iperproject#877.

remove auxiliary helper function for `selection_sort.rs` viperproject#819.
added regression test for viperproject#877.
bors bot added a commit that referenced this issue Mar 12, 2022
888: use `old` in trigger for direct array assignments r=vakaras a=Pointerbender

This PR experiments with addressing 2 things:

1. It implements the suggested work-around for #877 together with a regression test (the test suite indicated no regressions locally).
2. The `selection_sort.rs` test case was updated to use this fix in order to remove the auxiliary `set` function (#819).

Some notes apply to the approach chosen to make `selection_sort.rs` verify:

* A "dummy" `body_invariant!(a[0] <= a[i]);` was added to the outer and inner loops in order to correctly trigger the quantifiers.
* A change was made in `prusti-viper/src/encoder/foldunfold/mod.rs` to prevent Prusti from running into an encoder error when the `old` expresion was only present in the trigger, but not in the quantifier body. Without this change, `selection_sort.rs` will complain about a fold-unfold error. Since access predicates are not needed when the `old` expression is only present in the trigger, this should be sound (but please do double check this during the peer review).
* Without the auxiliary `set` function, the verification time roughly doubled, but it is still well within the `-Pverification_deadline=180` compiler parameter. Execution time now hovers around ~2 minutes on my PC.

If the chosen approach sounds good, then this PR can be merged. I'm still a bit suspicious about why the `set` function yields better performance, I could look into this deeper with some guidance (possibly in a separate PR if that makes more sense). For now I came up empty-handed for what could be causing this performance regression (if this happens to be related to how Viper gives hints to Z3 based on the Viper/Silver syntax, then my current experience may not be of much help, but am willing to learn more!).

Thanks!

p.s. Issue #877 can be left open for now in case we want to file a bug report with the Viper back-end developers to investigate the `old` trigger issue. I posted a ~30 lines example there that hopefully is small enough to investigate.

Co-authored-by: Pointerbender <pointerbender@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants