Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel subshells (JEP91) implementation #1249

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

ianthomas23
Copy link
Collaborator

@ianthomas23 ianthomas23 commented Jun 13, 2024

This is the implementation of the kernel subshells JEP (jupyter/enhancement-proposals#91). It follows the latest commit (1f1ad3d) with the addition of a %subshell magic command that is useful for debugging. To try this out I have a JupyterLab branch that talks to this branch and is most easily tried out using https://mybinder.org/v2/gh/ianthomas23/jupyterlab/jep91_demo?urlpath=lab; once the mybinder instance has started, open the subshell_demo_notebook.ipynb and follow the instructions therein.

The idea is that this is mergeable as it is now, it is backward compatible in that it does not break any existing use of ipykernel (subject to CI confirmation). There are some ramifications of the protocol additions (outlined below) that will need addressing eventually, but I consider these future work that can be in separate PRs.

Outline of changes

  1. The parent subshell (i.e. the main shell) runs in the main thread.
  2. Each new subshell runs in a separate thread.
  3. There is a new thread that deals with all communication on the shell channel, previously this was performed in the main thread.
  4. Communication between the shell channel thread and other threads is performed using ZMQ inproc pair sockets, which are essentially shared memory and avoid the use of thread synchronisation primitives.
  5. Incoming shell messages are handled by the shell channel thread which extracts the subshell_id from the message and passes it on to the correct subshell.
  6. Subshells are created and deleted via messages sent on the control channel. These are passed to the shell channel thread via inproc pair sockets so that the SubshellManager in the shell channel thread is responsible for subshell lifetimes.

Example scenario

Here is an example of the communication between threads when running a long task in the parent subshell (main thread) and whilst this is running a child subshell is created, used, and deleted.

sequenceDiagram
    participant client as Client
    participant control as Control thread
    participant shell as Shell channel thread
    participant main as Main thread

    client->>+shell: Execute request (main shell)
    shell->>-main: Execute request (inproc)
    activate main

    client->>+control: Create subshell request
    control->>-shell: Create subshell request (inproc)
    activate shell
    create participant subshell as Subshell thread
    shell-->>subshell: Create subshell thread

    shell->>control: Create subshell reply (inproc)
    deactivate shell
    activate control
    control->>-client: Create subshell reply

    client->>+shell: Execute request (subshell)
    shell->>-subshell: Execute request (inproc)
    activate subshell

    subshell->>shell: Execute reply (inproc)
    deactivate subshell
    activate shell
    shell->>-client: Execute reply (subshell)

    client->>+control: Delete subshell request
    control->>-shell: Delete subshell request (inproc)
    activate shell
    destroy subshell
    shell-->>subshell: Delete subshell thread

    shell->>control: Delete subshell reply (inproc)
    deactivate shell
    activate control
    control->>-client: Delete subshell reply

    main->>shell: Execute reply (inproc)
    deactivate main
    activate shell
    shell->>-client: Execute reply (main shell)
Loading

Future work

ipykernel

  1. Shell channel thread deserialises the whole some of the message to get the subshell_id. Ideally it would only deserialise the header. May need changes in Jupyter Client.
  2. Signalling a subshell to stop uses a threading.Event following the existing anyio implementation which requires an extra thread per Event. It would be nice if this could be changed so a subshell is a single thread not two.
  3. Execution count. Should either be a separate count per subshell or a single count for a kernel. Needs a decision and changes in IPython as is currently not atomic.
  4. History. Related to item 2 above.
  5. input() on more than one subshell at the same time run but do not store correctly.
  6. Debugger use needs investigating.
  7. Busy/idle status needs investigating. Should there, as now, be separate status for each subshell, or the concept of kernel (i.e. any subshell) busy status? This issue is much wider than subshells as it includes status of the control channel, and how Jupyter Server should track status (Improve the busy/idle execution state tracking for kernels. jupyter-server/jupyter_server#1429).
  8. Use of display hooks for e.g. Matplotlib. Should these be on the parent subshell, or child subshells too?

JupyterLab

The JupyterLab branch I am using to demo this isn't really intended to be merged. But if it was, it needs:

  1. Check kernel_info to see if subshells are supported.
  2. Delete subshell when close a subshell's ConsolePanel.
  3. Report subshell IDs in tree view?
  4. Display of subshell busy/idle status.

(Edited for clarity)

@ianthomas23
Copy link
Collaborator Author

The kernel subshells JEP has been accepted and as this is the reference implementation for that I am asking for someone to review this.

This was referenced Sep 10, 2024
Comment on lines 128 to 134
while True:
for socket, _ in self._poller.poll(0):
msg = await socket.recv_multipart(copy=False)
self._shell_socket.send_multipart(msg)

# Yield to other tasks.
await sleep(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop leads to 100% CPU usage, we need to find another way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. On my macOS dev machine this is fine, but I've confirmed that it is a problem on Linux.

Copy link
Collaborator Author

@ianthomas23 ianthomas23 Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the implementation in 97e3e91. I've removed the sleep(0) and poll(0) calls and now use an async zmq Poller in an anyio task. When a subshell is created or deleted this task is cancelled using an anyio.Event and rescheduled with the updated list of subshells. With this I no longer see 100% CPU load on Linux or macOS.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the implementation of this piece of code again. My use of await poller.poll() turned out to cause problems on python < 3.10 and on Windows. Now I am avoiding use of zmq.Poller and instead using a separate anyio task for each subshell to listen for reply messages and send them out to the client via the shell channel. This seems to be more robust on older python and Windows. I've also replaced the use of an anyio.Event with a memory object stream instead (essentially an anyio async queue).

@ianthomas23
Copy link
Collaborator Author

Local tests are showing occasional errors in tests/test_message_spec.py::test_execute_stop_on_error which I need to look at.

pyproject.toml Outdated
@@ -188,6 +188,7 @@ filterwarnings= [
# https://github.com/python-trio/trio/issues/3053
"ignore:The `hash` argument is deprecated in favor of `unsafe_hash` and will be removed in or after August 2025.",
]
asyncio_default_fixture_loop_scope = "function"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comes from pytest-asyncio, but don't we use AnyIO's pytest plugin?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does come from pytest-asyncio, and evidently it is used via some means as there are warnings before this change and not after. But I am going to revert this change as this option is not available for the minimum dependencies run, and I can tolerate a warning.

@ianthomas23
Copy link
Collaborator Author

I had missed a direct use of the shell_socket to send an abort_reply message when it should have been sent via the SubshellManager. With that fixed, all the local CI is now passing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants