Fix revoking tasks on custom queues #1352
Custom queues have a different vhost, so we need to use an appropriate Celery configuration with that vhost included in the broker URL.
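A minimal sketch of the idea, not the code in this PR: with an AMQP broker the vhost is the path component of the broker URL, so revoking a task on a custom queue means talking to the broker through a URL that carries that queue's vhost. The helper names `broker_url_for_vhost` and `revoke_on_custom_queue`, and the example URL, are hypothetical; only `Celery(broker=...)` and `app.control.revoke()` are real Celery calls.

```python
from urllib.parse import quote, urlsplit


def broker_url_for_vhost(base_broker_url: str, vhost: str) -> str:
    """Return the broker URL with its path replaced by the given vhost.

    Hypothetical helper: keeps scheme, credentials, host and port, and
    percent-encodes the vhost (including any "/") so it stays one path segment.
    """
    parts = urlsplit(base_broker_url)
    return f"{parts.scheme}://{parts.netloc}/{quote(vhost, safe='')}"


def revoke_on_custom_queue(task_id: str, base_broker_url: str, vhost: str) -> None:
    """Send a revoke for task_id through the broker vhost of a custom queue.

    Hypothetical helper. Celery is imported lazily so the URL helper above
    can be used and tested without Celery installed.
    """
    from celery import Celery

    # A throwaway app bound to the queue's vhost rather than the default one.
    app = Celery(broker=broker_url_for_vhost(base_broker_url, vhost),
                 set_as_current=False)
    # Without terminate=True this only stops tasks that have not started yet.
    app.control.revoke(task_id)


if __name__ == "__main__":
    # Assumed example values, for illustration only.
    print(broker_url_for_vhost("pyamqp://guest:guest@rabbit:5672//", "queue-vhost-1"))
```

The point of the sketch is that a revoke sent through the default vhost never reaches workers listening on a different vhost, which matches the symptom described in this PR.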
I assigned Ihsaan as reviewer, as you mentioned him in your message. @ihsaan-ullah If you won't review Chris' PRs, please let me know.
Any movement on this?
If you submit a run to a custom queue and cancel it in the middle of the scoring stage, you still get the output, meaning that the run was not cancelled. The next time you send a run to the same queue, you'd get:
I can take a look at this, it looks like the original value for |
@liviust I chatted with @ihsaan-ullah, and the fact that the running submission is not cancelled is expected behavior (a missing feature). The submission will only be cancelled if it hasn't started running yet. I am not able to recreate the error you are seeing with a custom queue. Is it consistent? If so, can you provide a more detailed description of the steps?
@cjh1 The error still persists on the develop and main branches.
Here are the steps to reproduce:
From this point forward, all submissions will get stuck with that error. If you don't cancel the submission, everything works as expected. If you do cancel it, it still computes the results and returns them, but you will get the above error.
It could be linked to this: #1445.
I am afraid I am still unable to recreate this in my dev environment. When I cancel the submission, I see the following in the worker:
This seems to show the submission being cancelled.
@liviust Could you try adding the following debug patch to your setup, so we can figure out what is going on?
If you can apply that and then provide the logs for the site_worker after you have run through the cancel steps, that might help us find the cause.
Hi @cjh1,
There are no logs in the worker.
@liviust Thanks, that is helpful.
@Didayolo @liviust Could you try the following patch?
Sure. I have updated the
If the two behaviors (a and b) in step 2 are expected, then it seems that the patch fixed the issue.
Thanks @liviust for testing this out. As far as I understand, (a and b) in step 2 are the current expected behavior; @Didayolo and @ihsaan-ullah can confirm this.
From the code, I don't understand why a scoring submission cannot be cancelled while a running submission can. Previously I believed that a submission cannot be cancelled once it has been sent to a worker, but the code, and liviust's comments, say otherwise: codabench/src/apps/competitions/models.py Line 665 in 82fd30f
There is a commented line
I think this should be checked.
Sorry for the confusion, I thought this was a comment for a TODO.
Thank you very much for your help. I'll test this patch and incorporate it. I confirm that this is the current expected behavior (being able to cancel submissions only before they start being computed). Once a worker is working, it does not listen to any cancellation signal. That would be a nice feature for the future (see #872).
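To illustrate the behavior described above, here is a toy model in plain Python. It is not Codabench or Celery code, and `ToyWorker` is a hypothetical name. It only shows why a cancellation that arrives before a task starts is honored, while one that arrives mid-run has no effect if the worker only checks its revoked list at pick-up time.

```python
class ToyWorker:
    """Toy model of a worker that checks for cancellation only at pick-up."""

    def __init__(self):
        self.revoked = set()  # task ids that were asked to be cancelled

    def revoke(self, task_id):
        # Record the request; nothing interrupts a task that is already running.
        self.revoked.add(task_id)

    def run(self, task_id, steps):
        # The only cancellation check happens before the task starts.
        if task_id in self.revoked:
            return "cancelled"
        for step in steps:
            step()  # a revoke arriving here is recorded but not acted on
        return "finished"


if __name__ == "__main__":
    worker = ToyWorker()

    # Cancelled before it starts: the task is discarded.
    worker.revoke("submission-1")
    print(worker.run("submission-1", steps=[]))

    # Cancelled while running: the task still completes.
    print(worker.run("submission-2", steps=[lambda: worker.revoke("submission-2")]))
```

Interrupting a task that is already running would need an additional mechanism, which is the missing feature tracked in #872.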
@Didayolo Should I push a PR with that change?
I see you have already raised a PR.
What is strange now is that I encounter the same cancellation behavior on the default queue as well. If I remember correctly, one could cancel a submission on the default queue at any stage. I have also tested this on the Codabench website using the default queue, and it cancelled successfully, but on the develop and master branches it doesn't. It behaves the same as the custom queue.
@liviust What do you mean it cancels successfully? You are able to interrupt a running submission? https://codabench.org/ is using the latest |
@ mention of reviewers
@ihsaan-ullah
A brief description of the purpose of the changes contained in this PR.
Currently, cancelling a submission running on a custom queue doesn't work. Custom queues have a different vhost, so we need to use an appropriate Celery configuration with that vhost included in the broker URL.
A checklist for hand testing
Checklist