Scheduler should not be considered idle while a client submits new work #8876

Open
hendrikmakait opened this issue Sep 19, 2024 · 1 comment

Comments

@hendrikmakait
Member

Describe the issue:

I have seen several instances where a cluster with an idle timeout shut down because the client took an excessive amount of time to submit new work. In these cases, the scheduler should not have shut down but rather anticipated that new work would arrive shortly.

As far as I can tell, we can address this in two steps:

  1. We should not consider the scheduler idle while Scheduler.update_graph executes. This method is the main entry point for submitting new work to the cluster, and it can take a while when encountering large or complex task graphs. As a result, a cluster can shut down while the scheduler is already preparing future work.
  2. We should not consider the scheduler idle while a client submits new work. This is more complex. One possible solution would be for the client to announce to the scheduler that it is about to submit work. The scheduler would then have to ensure that such an announcement doesn't keep it from becoming idle longer than necessary, i.e., it needs to handle submission attempts and client timeouts.
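
To make the two steps concrete, here is a rough sketch of what such an idle check could look like. This is a toy model, not the actual Scheduler code: `IdleTracker`, `processing_graph`, `announce_submission`, and `intent_ttl` are hypothetical names made up for illustration. Step 1 is modelled as a counter of graphs currently being processed, step 2 as per-client announcements that expire on their own so that a crashed or stalled client cannot keep the cluster alive forever.

```python
import time
from contextlib import contextmanager


class IdleTracker:
    """Toy model of an idle check; NOT the real distributed Scheduler."""

    def __init__(self, idle_timeout, intent_ttl, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds of inactivity before "idle"
        self.intent_ttl = intent_ttl      # hypothetical: how long an announcement stays valid
        self.clock = clock
        self.last_activity = clock()
        self.graphs_in_flight = 0         # step 1: update_graph-like calls in progress
        self.intents = {}                 # step 2: client id -> expiry timestamp

    @contextmanager
    def processing_graph(self, client=None):
        """Wrap graph processing (what update_graph would do)."""
        # The arrival of the graph fulfils the client's announcement.
        self.intents.pop(client, None)
        self.graphs_in_flight += 1
        try:
            yield
        finally:
            self.graphs_in_flight -= 1
            self.last_activity = self.clock()

    def announce_submission(self, client):
        """Client says it is about to submit work; expires after intent_ttl."""
        self.intents[client] = self.clock() + self.intent_ttl

    def remove_client(self, client):
        """A disconnecting client must not keep the scheduler busy."""
        self.intents.pop(client, None)

    def is_idle(self):
        now = self.clock()
        # Drop announcements whose client never followed through in time.
        self.intents = {c: t for c, t in self.intents.items() if t > now}
        if self.graphs_in_flight or self.intents:
            self.last_activity = now
            return False
        return now - self.last_activity >= self.idle_timeout
```

The expiry on announcements is the part that keeps the scheduler from blocking idleness longer than necessary. What a sensible default for that timeout would be is an open question.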
@jacobtomlinson
Member

This reminds me of a change we talked about making a few years ago to add a "pre-submit" call on the scheduler. The client would call it immediately before calling update_graph to register its intent to send a graph.

The main use of this would be to add a graph submission line between the client and scheduler to the pew-pew-pew plot.
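
For illustration, the call order could look roughly like the sketch below. `ToyScheduler` is a stand-in, and `pre_submit` is a hypothetical handler name, not an existing scheduler method. The sketch also assumes that the intent is registered before the client builds the graph, so that slow client-side preparation is covered as well.

```python
import asyncio


class ToyScheduler:
    """Stand-in for a scheduler; `pre_submit` is a hypothetical handler."""

    def __init__(self):
        self.events = []

    async def pre_submit(self, client):
        # Register the client's intent to send a graph.
        self.events.append(("pre_submit", client))

    async def update_graph(self, client, graph):
        # Receive the graph; here we only record its size.
        self.events.append(("update_graph", client, len(graph)))


async def submit(scheduler, client, build_graph):
    await scheduler.pre_submit(client)  # announce first
    graph = build_graph()               # potentially slow on the client
    await scheduler.update_graph(client, graph)


scheduler = ToyScheduler()
asyncio.run(submit(scheduler, "client-1", lambda: {"x": 1, "y": 2}))
```

The two recorded events would give the start and end points of a graph submission line between client and scheduler.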
