CRIU: Add support for dynamic debug interpreter transition on restore #17642

Open
tajila opened this issue Jun 22, 2023 · 51 comments
Labels
comp:jit comp:vm criu Used to track CRIU snapshot related work

Comments

@tajila
Contributor

tajila commented Jun 22, 2023

Background

We currently have three interpreters: normal, CRIU, and debug. Ideally, we would like to get to a position where we have only two interpreters: normal and debug. The CRIU interpreter was added because the normal interpreter was missing capabilities (method enter/exit checks) needed to support serviceability features, such as Java method tracing, dynamically upon restore.

Goal

Detect a request to run with the debug interpreter, then exit the normal interpreter and continue in the debug interpreter. If we can achieve this, we gain:

  • Remove the CRIU interpreter
  • Cheaper support for serviceability features on restore
  • Support -Xint on restore
  • Support java debugging on restore

Challenges

Places to detect change:

  • interpreter entry from native (callin) - easy
  • interpreter return from native (any native, not just JNI) - lots of places to check
  • interpreter code (bytecode interpreter loop) - piggy back on async checks
  • jit code - decompile while still in safe point exclusive
  • jit helpers - ??
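The "interpreter entry from native (callin)" case above is the easy one because the interpreter loop is chosen fresh on every entry, so a flag check suffices. A minimal sketch of that idea follows; the flag, interpreter loops, and method names here are hypothetical stand-ins, not actual OpenJ9 identifiers:

```java
// Hypothetical sketch: detecting a pending debug-interpreter request at callin.
public class InterpreterDispatch {
    // Illustrative flag set on restore when debug capabilities are requested.
    static volatile boolean debugInterpreterRequested = false;

    interface Interpreter { String run(String method); }

    static final Interpreter NORMAL = m -> "normal:" + m;
    static final Interpreter DEBUG  = m -> "debug:" + m; // performs method enter/exit checks

    // Interpreter entry from native (callin): the easy detection point, since
    // the interpreter loop is selected on every entry from native code.
    static String enterFromNative(String method) {
        Interpreter loop = debugInterpreterRequested ? DEBUG : NORMAL;
        return loop.run(method);
    }
}
```

The harder cases in the list (returns from native, JIT'd code, JIT helpers) cannot be handled this way because execution is already inside a loop or compiled body when the request arrives.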
@tajila tajila added comp:vm comp:jit criu Used to track CRIU snapshot related work labels Jun 22, 2023
@dsouzai
Contributor

dsouzai commented Jun 22, 2023

Based on a discussion with @vijaysun-omr, we came up with a few possible ways forward.

1. Disable all compiled code

This is relatively straightforward to do; in fact, this is what we currently do for -Xtrace/-Xrs. However, the problem is that this does not guarantee that no JIT'd code will execute; any JIT'd code on the stack will continue to execute until a new invocation, at which point execution will transfer to the interpreter.

2. Fail the checkpoint and start a new JVM in default mode

This is probably less of an option for the JVM and more for applications; an application can be configured to handle the failure and instead start a new JVM in default mode. This would not maintain Dev/Prod parity, but it is a fallback option that would at the very least guarantee functionality from a Java application user's point of view.

3. Generate code pre-checkpoint as if the JVM is running in FSD mode

Generating code as if the JVM is in FSD mode means running in Involuntary OSR Mode. This means any yield point can be a place where the VM triggers the transition of a thread from JIT'd code to the interpreter. The downside of this approach is that FSD compliant JIT code is around 30% slower. However, this may not matter too much for first response; for steady state throughput, these FSD bodies can be generated with GCR trees to force recompilation post restore.

An important subtlety here is that if debug is not enabled post-restore but redefinition is still possible, the code cache will have some method bodies that support involuntary OSR (i.e. those that were generated pre-checkpoint) and the rest that support voluntary OSR. As such, the VM will need to check a (yet to exist) flag in the body's metadata to determine what type of OSR was used. When redefinition needs to occur, the VM will need to check, at a yield point, if the body was compiled to support involuntary OSR, and if so, decompile it regardless of the type of yield point; otherwise, normal Voluntary OSR mechanics apply.
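The per-body check described above could look roughly like the following sketch; `OsrKind`, `BodyMetadata`, and `mustDecompileNow` are invented names for illustration, not real JIT metadata fields:

```java
// Hypothetical sketch of the "yet to exist" per-body OSR-kind flag.
public class OsrPolicy {
    enum OsrKind { INVOLUNTARY, VOLUNTARY } // pre-checkpoint FSD vs. post-restore bodies

    static final class BodyMetadata {
        final OsrKind osrKind;
        BodyMetadata(OsrKind kind) { osrKind = kind; }
    }

    // At a yield point during a redefinition event: an involuntary-OSR body must
    // be decompiled regardless of the kind of yield point; a voluntary-OSR body
    // is decompiled only at a designated OSR transition point (normal HCR mechanics).
    static boolean mustDecompileNow(BodyMetadata body, boolean yieldIsOsrTransitionPoint) {
        if (body.osrKind == OsrPolicy.OsrKind.INVOLUNTARY) {
            return true;
        }
        return yieldIsOsrTransitionPoint;
    }
}
```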

4. Choose Async Checkpoints that, in Voluntary OSR Mode allow redefinition to occur, and add OSR transitions there

If option 3 is too expensive, another approach is to run in a suboptimal Voluntary OSR Mode. Rather than run the Fear Analysis to minimize the OSR transition points, we force the transition points to be the exact set of yield points that are used to ensure that redefinition occurs; while this set is larger than what would result from an optimal OSR analysis, it is still likely smaller than the set of points in option 3.

However, an important caveat here is that any yield point that is not used to ensure that redefinition occurs must be ignored by the VM for the purpose of checkpointing; the thread should be allowed to continue execution until it hits one of these yield points that is also a transition point (it is guaranteed that the thread will not execute indefinitely before reaching such a point).

Another caveat is that we will need to add Voluntary OSR support for AOT (#4849).
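The first caveat above can be simulated in miniature: a thread keeps running past ordinary yield points and may only be halted for checkpoint at a yield point that is also a transition point. All names below are hypothetical:

```java
// Toy simulation of option 4's checkpoint rule: only yield points that are also
// redefinition/OSR transition points may halt a thread for checkpoint.
public class CheckpointYield {
    static boolean mayHaltForCheckpoint(boolean isTransitionPoint) {
        return isTransitionPoint;
    }

    // Step a thread through successive yield points; return how many yields
    // occur before the thread can be halted for the checkpoint.
    static int yieldsUntilHalt(boolean[] isTransitionPoint) {
        for (int i = 0; i < isTransitionPoint.length; i++) {
            if (mayHaltForCheckpoint(isTransitionPoint[i])) {
                return i + 1;
            }
        }
        return -1; // per the stated guarantee, a transition point is always reached
    }
}
```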

@dsouzai
Contributor

dsouzai commented Jun 22, 2023

I am going to start investigating the perf impact of option 3 first. Specifically, I will generate two builds where,

  1. The JIT generates FSD code all the time
  2. The JIT generates FSD code only pre-checkpoint

@tajila
Contributor Author

tajila commented Jun 22, 2023

FYI @gacholio

@tajila
Contributor Author

tajila commented Jun 22, 2023

any yield point that is not used to ensure that redefinition occurs must be ignored by the VM for the purpose of checkpointing; the thread should be allowed to continue execution until it hits one of these yield points that is also a transition point

I'm not too familiar with this detail; how do we differentiate this in the VM? There are two main mechanisms we use: exclusive and safepoint exclusive. @gacholio thoughts?

@gacholio
Contributor

My impression from discussion with Tobi is that we would just discard all the compiled code if debug was enabled on restore. This avoids any number of difficult issues. The checkpoint code uses safepoint exclusive, so all threads will certainly be at an OSR point.

@tajila
Contributor Author

tajila commented Jun 22, 2023

@gacholio that is captured in Irwin's cases 3 and 4. From my understanding, what Irwin is saying is that the JIT either needs to be in FSD mode (non-default) or Voluntary OSR (default) mode for us to decompile the JIT frames on the stack.

The checkpoint code uses safepoint exclusive, so all threads will certainly be at an OSR point.

To me this sounds like we could then use case 4, which is the cheaper option.

@gacholio
Contributor

To me this sounds like we could then use case 4, which is the cheaper option.

The OSR I'm talking about is I believe involuntary, in that we force it on all threads (it's not induced by a failed check in the compiled code). Does involuntary require FSD? I didn't think so.

@dsouzai
Contributor

dsouzai commented Jun 22, 2023

Does involuntary require FSD?

FSD involves involuntary OSR; normal HCR enabled mode uses voluntary OSR.

@gacholio
Contributor

So either we need to start in involuntary mode always (or at least if we want to support the possibility of debug) or add guards at every OSR point to check for the switch (maybe this can be done via the assumptions mechanism?).

@dsouzai
Contributor

dsouzai commented Jun 22, 2023

add guards at every OSR point to check for the switch (maybe this can be done via the assumptions mechanism?).

Well, once the guards are patched it will always transition to the VM. As such, once we enter into debug mode, the entire code cache might as well be discarded (same with the AVL trees). However, if we don't enter debug mode, the code quality should be better than with involuntary osr mode.

Also, with this approach, at the time when the VM wants to stop threads to prepare for checkpoint, if the thread hits some other yield point that isn't an OSR transition point, it needs to be allowed to return back to running JIT'd code; it's only in involuntary osr mode that all yield points are OSR transition points. That's why if we can get away with involuntary osr mode pre-checkpoint, that would be the simplest approach to take.
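The "once the guards are patched it will always transition to the VM" behaviour is effectively a one-way switch, which is why the code cache might as well be discarded at that point. A toy sketch (names are illustrative, not real JIT code):

```java
// Toy model of a patchable OSR guard: a nop (fast path) until the runtime
// patches it, after which every execution takes the slow path to the VM.
public class OsrGuard {
    static volatile boolean patched = false;

    static String execute() {
        if (patched) {
            // Slow path: transition to the interpreter; the compiled body is
            // effectively dead once this is permanent.
            return "transition-to-interpreter";
        }
        return "fast-jitted-path";
    }
}
```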

@tajila
Contributor Author

tajila commented Jun 22, 2023

if the thread hits some other yield point that isn't an OSR transition point, it needs to be allowed to return back to running JIT'd code;

This is the part that is challenging. I'm not sure how we detect this.

@gacholio
Contributor

gacholio commented Jun 22, 2023

As such, once we enter into debug mode, the entire code cache might as well be discarded

I believe we will be reinitializing the send targets for all methods when we restore, which has the effect of abandoning all of the compiled code (by which I mean the interpreter will never invoke it again), so normal CCR should be able to discard the old method bodies once every running invocation has OSRed back to the interpreter.

@gacholio
Contributor

This is the part that is challenging. I'm not sure how we detect this.

Let's not do this - it's essentially another layer of exclusive on top of safepoint, which would be completely unmanageable (I'd already like to see some proof that safepoint is valuable given how many problems it has had).

@dsouzai
Contributor

dsouzai commented Jun 22, 2023

This is the part that is challenging. I'm not sure how we detect this.

@vijaysun-omr could elaborate more on this perhaps, but he did mention that there are only very specific bytecodes that matter for the purpose of (in a normal run) ensuring that we yield to allow a redefinition event (for example, if we're in a loop with no monents/invocations, we need to ensure that we don't loop indefinitely).

If there's some way to identify at the yield point / transition point what the bytecode is supposed to do, we would be able to distinguish between normal yield points and OSR transition points. Of course, the critical point here is that the set of OSR transition points must be the set of yield points that are necessary to ensure a redefinition event. It may also be that when we transition via OSR, we end up in a different place than when we yield via a yield point, so that too could be a distinguishing factor.

That said, I don't know if what I just described is absolutely accurate, so I'll let Vijay clarify.

@vijaysun-omr
Contributor

I am under the impression that under our present default HCR implementation, the VM only allows actual class redefinition to occur at certain yield points, and my understanding is that those yield points are 1) async checks 2) method calls (probably via stack overflow check) and 3) monitor enter.

If this is not how the VM is doing class redefinition, then please clarify. If this is how the VM is doing class redefinition, then I don't understand what more is needed in order to support option 4 in Irwin's post.

@gacholio
Contributor

Redefinition can occur at any place that releases VM access. These would include:

  • stack overflow
  • method entry event (FSD only)
  • async check
  • method call
  • monitor enter
  • field events (FSD only)
  • many JIT helpers (resolve helpers in particular)
  • OOL object allocation (not if safepoint is enabled)
  • method exit event (FSD only)

With some exceptions, if you call out from compiled code, that's a redef point (some JIT helpers will never release VM access, so we'll need to be very careful in future if we change a helper and the JIT has assumed it will not release VM access).

The only practical solution for compiled code is to discard it entirely on restore (i.e. post decompiles for every compiled frame in every thread). This will naturally result in the debug interpreter being invoked after the decompiles.

Safepoint HCR means that object allocation is not an OSR/decompile point (the checkpoint code gets that kind of access if necessary).

The requirement is that we have an OSR block at all of the possible locations that a method could be paused (by safepoint exclusive). I'd rather not rely on guards to accomplish this since it would be very hard to distinguish which points will rely on the guard fail and which need to be forced into OSR.

When we restore, we will mark all frames in all stacks for decompile, and reset all method send targets back to their default (count and compile in the JIT case). Eventually, the obsolete compiled code will be unreferenced and able to be reclaimed.
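The restore sequence just described can be sketched as follows; `Frame`, `Method`, and the string send targets are hypothetical stand-ins for the real VM structures, not OpenJ9 code:

```java
import java.util.List;

// Hypothetical sketch of the restore plan: mark every compiled frame on every
// stack for decompile, and reset all method send targets to count-and-compile,
// leaving the obsolete bodies unreferenced once the decompiles complete.
public class RestoreTransition {
    enum FrameKind { INTERPRETED, JITTED }

    static final class Frame {
        final FrameKind kind;
        boolean markedForDecompile;
        Frame(FrameKind kind) { this.kind = kind; }
    }

    static final class Method {
        String sendTarget = "jitted-body"; // illustrative; real targets are code pointers
    }

    static void restoreAll(List<List<Frame>> stacks, List<Method> methods) {
        for (List<Frame> stack : stacks) {
            for (Frame f : stack) {
                if (f.kind == FrameKind.JITTED) {
                    f.markedForDecompile = true; // posted decompile for this frame
                }
            }
        }
        for (Method m : methods) {
            m.sendTarget = "count-and-compile"; // default JIT send target after reset
        }
    }
}
```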

@vijaysun-omr
Contributor

vijaysun-omr commented Jun 23, 2023

That list of program points in compiled code from @gacholio where class redefinition may occur (ignoring FSD for the moment) is what we used to have, until (as I understand it) some more OSR changes were made to the design a few years ago. The basis of this understanding is this code:

The code under the if-condition I pasted only checks for calls, async checks, and monitor enters as spots where it needs to arrange for OSR transitions ("post execution OSR" there means it will set up the OSR transition after those operations are done and we return back to the JITed code): https://github.com/eclipse/omr/blob/2d5ac63fbe881f0af035ef2732b22f85eb3893dd/compiler/compile/OMRCompilation.cpp#L637

There is also this comment that alludes to what that code does:

// HCR in the new world is only allowed to happen at three kinds of async points:

There must have been some VM code added to ensure we only redefine at those 3 points since the JIT is not in charge of where class redefinition occurs. The point of debate being this category which the above JIT code does not seem to consider anymore as a place where redefinition is possible:

  • many JIT helpers (resolve helpers in particular)
  • OOL object allocation (not if safepoint is enabled)

@gacholio
Contributor

changes were done to the design a few years ago

You are likely referring to safepoint OSR, which only eliminates object allocation from the list of HCR points:

// Check NextGenHCR is supported by the VM
if (!(javaVM->extendedRuntimeFlags & J9_EXTENDED_RUNTIME_OSR_SAFE_POINT) ||
    (*vmHooks)->J9HookDisable(vmHooks, J9HOOK_VM_OBJECT_ALLOCATE_INSTRUMENTABLE) || disableHCR)
   {
   self()->setOption(TR_DisableNextGenHCR);
   }

Looking at the code, in HCR (not FSD) mode, the VM does not force decompile anywhere - it calls jitClassesRedefined with a list of modified classes/methods so the JIT can patch what it needs to.

So, I suppose it's up to the JIT to determine where HCR checks need to be inserted to ensure correctness.

One thing I think we've all forgotten (and I've just remembered) is that HCR does not affect existing frames on the stack. The requirement is that all new method invocations target the most current version of the method.

This may mean that existing HCR/OSR is not sufficient to accomplish what's needed here as we will be unable to simply discard the code cache like we do for FSD (extended) HCR.

@dsouzai
Contributor

dsouzai commented Jun 26, 2023

There are two different concepts at play here:

  • Points at which redefinition can occur
  • Points at which decompilation can occur

In the case of FSD, i.e. when we use Involuntary OSR, the sets of these points end up being the same from the point of view of the JIT because all those yield points mentioned by Gac are decompilation points.

In the case of default HCR, i.e. when we use Voluntary OSR, from the point of view of the JIT, redefinition and decompilation points are not necessarily the same. In general, a thread yields to the VM to allow a STW redefinition event to occur, and then the thread continues executing until it reaches a decompilation point. The only yield points that could be redefinition points are, as Vijay mentioned, asynccheck, calls, and monents. This can be seen here:

https://github.com/eclipse/omr/blob/0c448df41bbd5978cb22dac0fe117febf78010d7/compiler/compile/OMRCompilation.cpp#L637-L665

the selected if branch above is what runs by default.

What Option 4 in #17642 (comment) proposes is to essentially make the set of redefinition points (from the JIT's pov in Voluntary OSR mode) also the set of decompilation points. This can be implemented in two ways:

  1. The Involuntary OSR Mechanism: At these yield points, the VM triggers decompilation
  2. The Voluntary OSR Mechanism: As with HCR, the VM allows the thread to return to executing the compiled body; unlike HCR though, the code generated to trigger a decompilation is emitted right after the yield point.

1 is obviously the cleaner approach, but 2 may be more practical in terms of being able to reuse non-FSD infrastructure.

At any rate, the question of what are redefinition points and what are decompilation points is an orthogonal concern to Option 4 above, which banks on the fact that we must already be able to distinguish between the two for HCR.

All that said, if the assumption that redefinition cannot occur outside of asynccheck, calls, and monents is wrong, then HCR has a longstanding bug independent of the CRIU feature. As far as the JIT is concerned right now, it generates code assuming that redefinition can only occur at these three types of yield points. The code comment linked in #17642 (comment) was first added around May 2016. @gacholio do you know what VM changes were added around that time frame that might explain why that comment exists?
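The two implementation mechanisms described above (1. the VM decompiles at the yield point; 2. the thread returns and code emitted right after the yield point triggers the decompile) can be contrasted in a toy sketch, where a flag stands in for the restore-time decision and all names are hypothetical:

```java
// Toy contrast of the two proposed decompilation mechanisms.
public class DecompileMechanisms {
    static volatile boolean restoreRequestedDebug = false;

    // 1. Involuntary OSR mechanism: the VM itself triggers the decompilation
    //    while the thread is stopped at the yield point.
    static String yieldInvoluntary() {
        return restoreRequestedDebug ? "decompiled-by-vm" : "resume-jitted";
    }

    // 2. Voluntary OSR mechanism: the VM lets the thread return to the compiled
    //    body, but code emitted immediately after the yield point checks the
    //    flag and triggers the decompilation itself.
    static String yieldVoluntary() {
        // ...VM yield completes, control returns to the compiled body...
        if (restoreRequestedDebug) {
            return "self-decompile"; // emitted check right after the yield point
        }
        return "resume-jitted";
    }
}
```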

@gacholio
Contributor

All that said, if the assumption that redefinition cannot occur outside of asynccheck, calls, and monents is wrong, then HCR has a longstanding bug independent of the CRIU feature.

Classically, HCR could occur any time VM access can be released. That includes all of the places (and possibly more) that I detailed above.

The only HCR change I can think of is the safepoint OSR (which I think you refer to as nextGenHCR). This disallows HCR at object allocation points.

When the HCR occurs, the VM does not add any decompilations - it reports the modified classes/methods so the JIT can do the appropriate patching (presumably invalidating calls to any potentially-replaced methods). As stated above, there's no need to decompile when the thread resumes - it's fine to wait until a new method invocation is going to take place (even then, if you know that the invoked method has not been replaced, you can just go ahead and invoke it).

@gacholio
Contributor

It's tempting to use voluntary OSR to let the decompiles trickle in as the compiled code detects the restore, but this won't work properly in the debugger (an obvious example is that the debugger would not be able to query locals in frames that remain compiled without FSD).

I think the only way this will work is to make every escape point (except allocation points in next gen) from the compiled code into an OSR point, and do the force decompile (involuntary) on restore.

@DanHeidinga
Member

It's tempting to use voluntary OSR to let the decompiles trickle in as the compiled code detects the restore, but this won't work properly in the debugger (an obvious example is that the debugger would not be able to query locals in frames that remain compiled without FSD).

@gacholio that sounds a lot like Graeme's @SelectiveDebug technology from many many years ago. Is that a reasonable approach to build off where existing frames are marked in some way to indicate they can't be debugged (and use the correct stackmapper) and new invocations are debuggable?

How valid this is depends on the user requirements but it seems like a reasonable position to me.

@gacholio
Contributor

@SelectiveDebug technology

I don't see the correlation, and I would have to say no to building on top of 20-year-old abandoned tech (I doubt there's even a mention of it left in the codebase). It also does not address my above concern about locals.

@DanHeidinga
Member

We don't want to reuse the @SelectiveDebug tech, but the idea of allowing a mix of debuggable and non-debuggable frames is worth considering. The locals in non-debuggable frames would simply be unavailable - I believe there's an existing JVMTI error (JVMTI_ERROR_OPAQUE_FRAME) to return from the locals-related queries that covers this behaviour.

@dsouzai
Contributor

dsouzai commented Jun 28, 2023

The only HCR change I can think of is the safepoint OSR (which I think you refer to as nextGenHCR). This disallows HCR at object allocation points.

After talking to @jdmpapin and Vijay, I believe that the three types of yield points I mentioned above do cover most of what is handled by safepoints. However, it may be that the resolve helpers are not handled; we'll have to take a look and see if we do handle it in some other way; either way we would have to make them an explicit OSR point.

I think the only way this will work is to make every escape point (except allocation points in next gen) from the compiled code into an OSR point, and do the force decompile (involuntary) on restore.

Yeah that sounds right. Actually, additionally we need to make these points also the only points that a thread can yield to allow a checkpoint. Essentially, in Option 4, we need to have the set of Redefinition Points (Escape Points/HCR Points), the set of OSR Points (Involuntary OSR Transition Points), and the set of Checkpoint Points be the same set of points.

Overall though, I do agree that if FSD compliant code pre-checkpoint is sufficient then we should just stick to that.

@dsouzai
Contributor

dsouzai commented Jul 5, 2023

I launched some perf runs to measure the impact of generating FSD compliant code. I ran the pingperf and restcrud apps; as the names suggest, pingperf is a simple OpenLiberty app that responds to a request with a response, whereas restcrud queries a postgres db and returns the results.

I had 3 builds:

  1. Baseline
  2. FSD Always - the JIT always generates FSD compliant code, even post-restore
  3. FSD Pre-checkpoint - the JIT only generates FSD compliant code pre-checkpoint

pingperf

| Build | Startup Slowdown | First Response Slowdown |
| --- | --- | --- |
| FSD Always | 5% | 4% |
| FSD Pre-checkpoint | 4% | 3% |

restcrud

| Build | Startup Slowdown | First Response Slowdown |
| --- | --- | --- |
| FSD Always | 2.5% | 15% |
| FSD Pre-checkpoint | 2.5% | 2% |

From the looks of things, the FSD approach (i.e. Option 3) appears to be sufficient to enable debug post-restore.

That said, there are some things that we need to address.

  1. If debug is not enabled post-restore, then in order to support "normal" HCR redefinition, at any yield point (not just the safepoints mentioned above), the VM will need to check a (yet to be defined) flag in the method's metadata to see if it is a FSD body; if it is, the VM will have to trigger an OSR transition. This is because FSD bodies do not have any OSR guards. Essentially, we have never been in a situation where we have Involuntary and Voluntary OSR method bodies at the same time.
  2. I have to ensure that throughput is not affected. I did some initial throughput runs, and it turns out that the FSD Pre-checkpoint build is not much better than the FSD Always build, which is ~40% worse than baseline. Part of this comes from the fact that the FSD compliant bodies do not get recompiled (because up until this point, we never had the need for FSD bodies to exist except if we knew for certain debug was enabled). However, this doesn't explain the entire gap, and so I need to investigate further; it may be that there are other optimizations that get disabled under FSD that I missed in the post-restore options processing where I reset the FSD flag.

gacholio added a commit to gacholio/openj9 that referenced this issue Feb 6, 2024
Add a new private flag which instructs the interpreter to exit and
re-invoke itself. This will be used by CRIU when a restored image
requests debug capabilities (by changing the interpreter entry point to
the debug interpreter).

Related: eclipse-openj9#17642

Signed-off-by: Graham Chapman <graham_chapman@ca.ibm.com>
singh264 added a commit to singh264/openj9 that referenced this issue Mar 5, 2024
Support transition to debug interpreter on restore when
the transition is requested with an env var file or an
options file.

Issue: eclipse-openj9#17642
Co-authored-by: Tobi Ajila <atobia@ca.ibm.com>
Signed-off-by: Amarpreet Singh <Amarpreet.A.Singh@ibm.com>
@singh264
Contributor

singh264 commented Jun 5, 2024

What is the current status of this issue?

@dsouzai
Contributor

dsouzai commented Jun 6, 2024

The compiler work is being tracked here #18866 (it does include some VM pre-requisites). For the most part, the compiler functional work is done, but we still need to reduce the footprint gap caused by generating FSD pre-checkpoint.

@singh264
Contributor

singh264 commented Jun 6, 2024

Is now a good time to address the VM pre-requisites, given that we still need to reduce the footprint gap caused by generating FSD pre-checkpoint?

@dsouzai
Contributor

dsouzai commented Jun 6, 2024

The footprint gap and the VM pre-requisites are independent; the work to reduce the footprint gap is not going to be impacted by the necessary VM changes.

That said you should probably coordinate with @JasonFengJ9 since I believe he's working on the VM side debug on restore work.

@singh264
Contributor

singh264 commented Jun 6, 2024

@JasonFengJ9 how can I potentially assist with the debug on restore work?

@JasonFengJ9
Member

The first openj9 portion of the debug on restore work was

The corresponding extension repo PR (initially opened by Mike Z., now with my changes) is awaiting review

I have a draft PR for the second openj9 PR which is being tuned according to Irwin's perf results, the ETA is next week or so.

There are quite a few other CRIU open issues, please talk to @tajila for a suitable task.

@singh264
Contributor

singh264 commented Jun 6, 2024

A suitable task was discussed after I talked to @tajila.

@singh264
Contributor

singh264 commented Jul 9, 2024

How can I contribute to the task?

@tajila
Contributor Author

tajila commented Jul 9, 2024

@singh264 I've assigned #19835 to you.
