Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRIU supports Java debugger via the restore option file #19754

Merged
merged 1 commit into from
Sep 18, 2024

Conversation

JasonFengJ9
Copy link
Member

@JasonFengJ9 JasonFengJ9 commented Jun 24, 2024

CRIU supports Java debugger via the restore option file

Debugger support related code are guarded with isDebugOnRestoreEnabled();
For -Xint mode, do not disable JVMTI capabilities required for JDWP debugger before checkpoint, add a capability can_access_local_variables;
For JIT mode, add JVMTI capabilities required for JDWP debugger before checkpoint;
Hooked J9HOOK_VM_PREPARING_FOR_RESTORE event to determine if a JDWP agent is specified in the restore option file, also hooked J9HOOK_VM_CRIU_RESTORE to load the agent libraries;
Refactored the agent library creation from J9VMInitArgs;
Added some trace points.

Signed-off-by: Jason Feng fengj@ca.ibm.com

Closes #16959

@JasonFengJ9 JasonFengJ9 added comp:vm criu Used to track CRIU snapshot related work labels Jun 24, 2024
@JasonFengJ9
Copy link
Member Author

FYI @tajila

@JasonFengJ9
Copy link
Member Author

The CRIU JDWP debugger via the restore option file requires both extension repo PR ibmruntimes/openj9-openjdk-jdk#591 and its backports (in progress) and this PR.
However, this PR alone won't break any existing builds such as Java 8 that won't be backported.

@JasonFengJ9 JasonFengJ9 marked this pull request as ready for review August 22, 2024 18:47
@JasonFengJ9
Copy link
Member Author

FYI @dsouzai
This PR has the following messages post-restore when no JDWP agent is specified in the restore option file.

JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
JVMJITM043W AOT load and compilation disabled post restore.

which is expected to be resolved via a JIT change like the patch you provided.

Note: this JDWP debugger support via the option file is only available with -XX:+DebugOnRestore.

@dsouzai
Copy link
Contributor

dsouzai commented Aug 22, 2024

I opened #20047, which is essentially the patch I had given you.

@dsouzai
Copy link
Contributor

dsouzai commented Aug 23, 2024

I was reading over #18866, specifically the VM Coordination section, and wanted to check if Points 1 and 2 are addressed, namely:

  1. Decompile immediately on restore if a user enables debug post-restore
  2. If debug is not enabled post-restore, but normal HCR is possible, then the VM needs to ensure that stacks with FSD bodies get decompiled.

Regarding 1, although currently (i.e. with this PR and #20047) we will invalidate all compiled code if debug is enabled post-restore, it is still possible for JIT'd code to execute, namely:

  • JNI Thunks
  • Failed proactive compilation

For a failed proactive compilation, we will fall back to using the FSD body which means that a thread will continue to execute that body. Therefore, the VM needs to decompile immediately on restore if debug is enabled. Once we have the ability to reuse the FSD code we generate pre-checkpoint this won't be necessary, but for now I believe we need this to make debug fully functional.

In the case of JNI Thunks, I'm working on a PR for that.

I think once we address all of this, debug on restore should be completely functional (if not optimal). FYI @TobiAjila @vijaysun-omr

@dsouzai
Copy link
Contributor

dsouzai commented Aug 23, 2024

For a failed proactive compilation, we will fall back to using the FSD body which means that a thread will continue to execute that body. Therefore, the VM needs to decompile immediately on restore if debug is enabled. Once we have the ability to reuse the FSD code we generate pre-checkpoint this won't be necessary, but for now I believe we need this to make debug fully functional.

Hm actually... I guess it's not actually a problem if it's running FSD code because that is already set up to deal with debug events. So I guess I just need to address the JNI Thunk issue; the rest is covered by the method body invalidation that already occurs.

I'll think about this some more, but now I don't believe we need to decompile immediately anymore heh.

@dsouzai
Copy link
Contributor

dsouzai commented Sep 3, 2024

In the case of JNI Thunks, I'm working on a PR for that.

Opened #20108.

@JasonFengJ9
Copy link
Member Author

Rebased to resovle the merge conflict.

runtime/jvmti/jvmtiHook.c Outdated Show resolved Hide resolved
@tajila
Copy link
Contributor

tajila commented Sep 11, 2024

@JasonFengJ9 How exatly did you enable debug on restore? whe I tried -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y in the options file it failed with:

JVMJITM043W AOT load and compilation disabled post restore.
post restore hook
JVMJ9VM007E Command-line option unrecognised: -Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y
Exception in thread "main" org.eclipse.openj9.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:534)
	at Debug.checkPointJVM(Debug.java:32)
	at Debug.main(Debug.java:16)
Caused by: openj9.internal.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:997)
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:530)
	... 2 more

@JasonFengJ9
Copy link
Member Author

@tajila I was using -agentlib:jdwp=transport=dt_socket,server=y,suspend=y, will look into -Xrunjdwp option.

@tajila
Copy link
Contributor

tajila commented Sep 11, 2024

same issue

JVMJITM043W AOT load and compilation disabled post restore.
post restore hook
JVMJ9VM007E Command-line option unrecognised: -agentlib:jdwp=transport=dt_socket,server=y,suspend=y
Exception in thread "main" org.eclipse.openj9.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:534)
	at Debug.checkPointJVM(Debug.java:32)
	at Debug.main(Debug.java:16)
Caused by: openj9.internal.criu.JVMRestoreException: The JVM could not enable all the restore options specified
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
	at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:997)
	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:530)
	... 2 more

@tajila
Copy link
Contributor

tajila commented Sep 11, 2024

@JasonFengJ9 We will need to add a docs issue for this as well

@JasonFengJ9
Copy link
Member Author

-Xrunjdwp:transport=dt_socket,address=9999,server=y,suspend=y

Added support of -Xrunjdwp.

@tajila this is ready for another look.

@tajila
Copy link
Contributor

tajila commented Sep 13, 2024

jenkins test sanity xlinux jdk17

@tajila
Copy link
Contributor

tajila commented Sep 13, 2024

jenkins test sanity alinux64 jdk21

@tajila
Copy link
Contributor

tajila commented Sep 13, 2024

jenkins test sanity win jdk8

@JasonFengJ9
Copy link
Member Author

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_x86-64_linux_Personal_testList_0/540/consoleFull

16:40:07  ---SUMMARY OF FAILED TESTS---
16:40:07  cma001
16:40:07  -----------------------------
16:40:07  
16:40:07  -----------------------------------
16:40:07  cmdLineTester_jvmtitests_1_FAILED

Looking at this PR test failure.

Debugger support related code are guarded with
isDebugOnRestoreEnabled();
For -Xint mode, do not disable JVMTI capabilities required for JDWP
debugger before checkpoint, add a capability can_access_local_variables;
For JIT mode, add JVMTI capabilities required for JDWP debugger before
checkpoint;
Hooked J9HOOK_VM_PREPARING_FOR_RESTORE event to determine if a JDWP
agent is specified in the restore option file, also hooked
J9HOOK_VM_CRIU_RESTORE to load the agent libraries;
Support -agentlib/-agentpath and -Xrunjdwp;
Refactored the agent library creation from J9VMInitArgs;
Added some trace points;
Disable criu_jitPostRestore.xml failure conditions until the JIT PR is
merged.

Signed-off-by: Jason Feng <fengj@ca.ibm.com>
@JasonFengJ9
Copy link
Member Author

Fixed the PR test failure due to an earlier break for undecorated JDWP agent processing while multiple agents were present.

In addition, disabled following failure conditions within criu_jitPostRestore.xml:

		<!-- output type="failure" caseSensitive="yes" regex="no">Some or all compiled code in the code cache invalidated post restore.</output>
		<output type="failure" caseSensitive="yes" regex="no">JIT compilation disabled post restore.</output -->

Verified manually that these failure conditions can re-enabled with

Also moved DisposeEnvironment(jvmti_env) into jvmtiHookVMRestoreStartAgent() which is the last part of the cleanup if there was no JDWP agent specified. It releases VM access and hence can't be invoked within criuDisableHooks() from J9HOOK_VM_PREPARING_FOR_RESTORE.

@tajila this is ready for another look.

@tajila
Copy link
Contributor

tajila commented Sep 18, 2024

jenkins test sanity win jdk8

@tajila
Copy link
Contributor

tajila commented Sep 18, 2024

jenkins test sanity alinux64 jdk21

@tajila
Copy link
Contributor

tajila commented Sep 18, 2024

jenkins test sanity xlinux64 jdk17

@tajila
Copy link
Contributor

tajila commented Sep 18, 2024

jenkins test sanity xlinux jdk17

@tajila tajila merged commit ebe8718 into eclipse-openj9:master Sep 18, 2024
9 checks passed
@JasonFengJ9 JasonFengJ9 deleted the jdwpoptionfile branch September 18, 2024 22:09
dsouzai added a commit to dsouzai/openj9 that referenced this pull request Sep 19, 2024
Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
dsouzai added a commit to dsouzai/openj9 that referenced this pull request Sep 20, 2024
Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
luke-li-2003 pushed a commit to luke-li-2003/openj9 that referenced this pull request Sep 25, 2024
Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:vm criu Used to track CRIU snapshot related work doc:externals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to re-enable disabled hooks post-restore
3 participants