
Compute worker - docker pull image + execution time limit crash #1161

Merged 1 commit from compute_worker_crash_handled into develop on Sep 21, 2023

Conversation

@ihsaan-ullah (Collaborator) commented Sep 17, 2023

@ mention of reviewers

@Didayolo

A brief description of the purpose of the changes contained in this PR.

Now users can see the following error logs:

  1. docker image pull failure logs in ingestion std_err
  2. execution time limit logs in ingestion or scoring std_err
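
As a rough illustration of what the new logs cover, here is a minimal sketch (not the actual compute_worker.py code) of how a worker can catch these two failure modes and write them to the submission's std_err; the helper `write_std_err` and the exact log wording are assumptions:

```python
# Minimal sketch of surfacing the two crash scenarios in std_err.
# `write_std_err` and the log messages are hypothetical, not the PR's code.
import subprocess
import docker

def write_std_err(std_err_path, message):
    # Append the error to the std_err file that is uploaded with the results.
    with open(std_err_path, "a") as f:
        f.write(message + "\n")

def run_step(image, command, std_err_path, time_limit):
    client = docker.from_env()
    try:
        client.images.pull(image)                    # 1. docker image pull
    except docker.errors.APIError as exc:
        write_std_err(std_err_path, f"Failed to pull docker image '{image}': {exc}")
        return False
    try:
        subprocess.run(command, timeout=time_limit)  # 2. run with a time limit
    except subprocess.TimeoutExpired:
        write_std_err(std_err_path, f"Execution time limit of {time_limit}s exceeded")
        return False
    return True
```

In the real worker these messages end up in the ingestion or scoring std_err, as shown in the screenshots below.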

Screenshots

Docker image pull
(two screenshots omitted)

Execution time limit Ingestion
(two screenshots omitted)

Execution time limit Scoring
(two screenshots omitted)

Issues this PR resolves

  • No error logs when the compute worker crashes

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCi tests are passing
  • Ready to merge

@ihsaan-ullah linked an issue Sep 17, 2023 that may be closed by this pull request
@Didayolo (Collaborator)

That is awesome!

In case of another problem (not docker pull or time limit), is the error log transmitted too?

@Didayolo added the P1 (High priority, but NOT a current blocker) label Sep 18, 2023
@ihsaan-ullah (Collaborator, Author)

> That is awesome!
>
> In case of another problem (not docker pull or time limit), is the error log transmitted too?

I cannot think of any other problem that may occur outside the running of the submission. The docker pull happens before the submission runs, and the execution time limit check is also outside the submission execution. But if there is any other scenario, I can check it.

@Didayolo (Collaborator)

I think those were the main scenarios indeed. Other scenarios are basically bugs inside the code of the compute worker, for instance the BadZipFile error that we have fixed already. We can just assume there is no other bug as long as nothing is reported.

@ihsaan-ullah (Collaborator, Author)

> I think those were the main scenarios indeed. Other scenarios are basically bugs inside the code of the compute worker, for instance the BadZipFile error that we have fixed already. We can just assume there is no other bug as long as nothing is reported.

Yes, I agree. If there is any other bug, it can now be fixed in less time, since things seem to be working.

@Didayolo (Collaborator) commented Sep 18, 2023

I suggest testing this PR in this way:

  • Build and push the compute worker docker image to codalab/competitions-v2-compute-worker:test (note the test tag); see the sketch at the end of this comment.
  • By the way, we can remove the image with the retries tag from DockerHub.
  • Update the compute workers of the default queue of https://codabench-test.lri.fr/ with this new docker image.
  • Try different submissions on codabench-test, exceeding the execution time limit or not, etc.

This procedure can be used for future developments too.
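
For reference, a minimal sketch of the build-and-push step as a small Python helper; the script name, the assumption that the compute worker's Dockerfile is in the current directory, and that you are already logged in to DockerHub are not part of this PR:

```python
# build_and_push_test_worker.py -- hypothetical helper, not part of this PR.
import subprocess

IMAGE = "codalab/competitions-v2-compute-worker:test"

def build_and_push(context_dir: str = ".") -> None:
    # Build the compute worker image with the `test` tag, then push it.
    subprocess.run(["docker", "build", "-t", IMAGE, context_dir], check=True)
    subprocess.run(["docker", "push", IMAGE], check=True)

if __name__ == "__main__":
    build_and_push()
```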

@Didayolo self-assigned this Sep 18, 2023
@ihsaan-ullah (Collaborator, Author) commented Sep 18, 2023

> Push the compute worker docker to codalab/competitions-v2-compute-worker:test (note the test tag).

Why do we need to push the compute worker docker image?

@Didayolo (Collaborator)

> Why do we need to push the compute worker docker image?

To reflect the changes made to compute_worker.py, we need to rebuild the docker image and push it to DockerHub.

@Didayolo (Collaborator) commented Sep 21, 2023

@ihsaan-ullah

I updated the compute worker associated with the codabench-test server with your changes.

We need to make submissions to test this PR: https://codabench-test.lri.fr/

Seems to be working fine!

@Didayolo merged commit 899e388 into develop on Sep 21, 2023
1 check passed
@Didayolo deleted the compute_worker_crash_handled branch on September 21, 2023 at 13:34
Labels
P1 High priority, but NOT a current blocker
Development

Successfully merging this pull request may close these issues:

  • No error logs when the compute worker crashes